Claude Code Bug Breaks Caching on Billing Topics

What It Is

A critical bug in Claude Code’s standalone binary causes prompt caching to fail catastrophically when conversations contain specific billing-related strings. The binary performs an undocumented string replacement on cch=00000 before transmitting API requests. When this exact sequence appears in chat history—common when discussing billing details or examining Claude Code’s source code—the replacement corrupts cache prefixes. Instead of reusing cached prompts, every request triggers a complete cache rebuild, multiplying API costs by 10-20x.
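The failure mode can be illustrated with a small simulation. This is not the binary's actual code; it assumes (one plausible reading of the report) that the sentinel is rewritten to a per-request value, so any literal occurrence of cch=00000 in history produces a different wire prompt on every request and the prefix cache never matches.

```python
# Illustrative simulation only, not Claude Code's implementation.
# Assumption: the sentinel is replaced with a per-request value, so a
# conversation containing the literal string hashes differently each time.
import hashlib
import itertools

SENTINEL = "cch=00000"
_nonce = itertools.count(1)

def sentinel_rewrite(prompt: str) -> str:
    # Hypothetical stand-in for the undocumented native-layer replacement.
    return prompt.replace(SENTINEL, f"cch={next(_nonce):05d}")

def prefix_key(prompt: str) -> str:
    return hashlib.sha256(prompt.encode()).hexdigest()

seen: set[str] = set()

def send(prompt: str) -> str:
    """Model a prefix cache keyed on the transmitted (post-rewrite) prompt."""
    key = prefix_key(sentinel_rewrite(prompt))
    if key in seen:
        return "cache hit"
    seen.add(key)
    return "cache miss: full rebuild"

clean = "user: summarize this file"
tainted = clean + "\nuser: my bill shows cch=00000, what is that?"

print(send(clean), send(clean))      # miss, then hit: stable prefix
print(send(tainted), send(tainted))  # miss, then miss again: sentinel mutates
```

The clean conversation is cached after its first request; the tainted one rebuilds on every request because its transmitted bytes never repeat.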

The replacement occurs at the native layer within a custom Bun fork, making it invisible to JavaScript debugging tools. Developers inspecting requests from application code or client-side logs won’t see the substitution, because it is applied in native code immediately before the request is encrypted and transmitted. This explains why some users report unexpectedly high bills despite seemingly normal usage patterns.

The issue is documented at https://github.com/anthropics/claude-code/issues/40524, where affected developers have shared their experiences with sudden cost spikes.

Why It Matters

Prompt caching represents one of the most significant cost optimizations in modern LLM applications. When functioning correctly, caching allows subsequent requests to reuse previously processed context, reducing both latency and token consumption. Breaking this mechanism doesn’t just increase costs—it fundamentally changes the economics of interactive AI development.
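A back-of-envelope calculation shows how rebuilding the cache on every request inflates costs. The ratios below follow Anthropic's published prompt-caching pricing at the time of writing (cache reads at roughly 10% of the base input rate, cache writes at a roughly 25% premium); the base rate and token counts are made-up example numbers, not measurements.

```python
# Assumed pricing ratios: cache read ~10% of base input rate,
# cache write ~125%. Base rate and token counts are illustrative.
BASE = 3.00 / 1_000_000          # assumed $ per input token ($3/MTok)
READ, WRITE = 0.10 * BASE, 1.25 * BASE

context_tokens = 150_000         # stable conversation prefix
requests = 100                   # requests over the session

# Healthy caching: one cache write, then cheap reads of the same prefix.
healthy = context_tokens * WRITE + (requests - 1) * context_tokens * READ

# Broken caching: every request pays the full cache-write rate again.
broken = requests * context_tokens * WRITE

print(f"healthy: ${healthy:.2f}  broken: ${broken:.2f}  "
      f"ratio: {broken / healthy:.1f}x")
```

Under these assumptions the broken path costs roughly an order of magnitude more, consistent with the 10-20x range reported by affected users.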

For teams building with Claude Code, this bug creates a perverse incentive structure. Conversations about cost optimization, billing analysis, or financial planning—precisely the topics where cost awareness matters most—become the most expensive to conduct. Developers reviewing Claude Code’s implementation details to understand its behavior inadvertently trigger the bug by examining code samples containing the sentinel string.

The hidden nature of the corruption makes diagnosis nearly impossible without insider knowledge. Standard debugging approaches fail because the string replacement happens below the application layer. Teams might attribute cost increases to usage patterns, model selection, or API changes rather than a binary-level bug. This opacity undermines trust in cost monitoring and makes capacity planning unreliable.

Getting Started

The immediate workaround avoids the problematic binary entirely. Instead of installing via the official script at https://claude.ai/install.sh, developers should run Claude Code through npx:
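A minimal invocation, assuming the published npm package name @anthropic-ai/claude-code (verify against the current official docs):

```shell
# Launch Claude Code through npx on the stock Node.js runtime,
# bypassing the standalone Bun-based binary (package name assumed).
npx @anthropic-ai/claude-code
```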

This command uses the standard Node.js runtime rather than the custom Bun fork containing the string replacement logic. The npx approach provides identical functionality without the caching corruption.

For teams already using the standalone binary, migration requires uninstalling the existing version and switching to the npx method. Check current installations with which claude-code and remove any binaries found in system paths. Verify the fix by monitoring API costs during conversations that previously triggered high bills—discussions about billing, code reviews of Claude Code internals, or any content containing the cch=00000 sequence.
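To find conversations that may have been triggering the bug, a quick audit can scan saved transcripts for the sentinel string. The transcript location here ("~/.claude") is an assumption; point the function at wherever your chat history is actually stored.

```python
# Audit sketch: find transcript files containing the sentinel string.
from pathlib import Path

SENTINEL = "cch=00000"

def affected_transcripts(root: str) -> list[Path]:
    """Return transcript files under `root` that contain the sentinel."""
    hits = []
    for path in Path(root).expanduser().rglob("*.json*"):
        try:
            if SENTINEL in path.read_text(errors="ignore"):
                hits.append(path)
        except OSError:
            continue  # skip unreadable files
    return hits

# "~/.claude" is an assumed transcript location; adjust for your setup.
for p in affected_transcripts("~/.claude"):
    print("sentinel found in", p)
```

Any file this flags corresponds to a conversation whose cache prefix would have been corrupted under the standalone binary.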

Track the official fix progress at https://github.com/anthropics/claude-code/issues/40524. The issue tracker provides updates on patches and permanent solutions.

Context

This bug highlights broader challenges in LLM tooling transparency. Many AI development tools operate as black boxes, with proprietary binaries handling sensitive operations like API communication. When these tools modify requests invisibly, debugging becomes archaeological work rather than systematic analysis.

Alternative approaches to running Claude Code include building from source or using containerized versions where the entire execution environment remains auditable. While more complex to set up, these methods provide visibility into every layer of the stack. Teams with strict cost controls or compliance requirements might prefer this transparency over convenience.

The incident also demonstrates why prompt caching implementations need careful testing across diverse conversation content. Edge cases involving specific string patterns, especially those related to internal tooling or billing systems, rarely appear in standard test suites. Comprehensive testing should include conversations about the tool itself, meta-discussions about costs, and code samples from the tool’s own repository.
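One way to encode that recommendation is a round-trip regression test: whatever serialization the client performs, user content must come back byte-for-byte intact, including strings drawn from the tool's own internals. The build_request_body function below is a hypothetical placeholder for the code under test.

```python
# Regression-test sketch: client-side request building must not mutate
# user content. `build_request_body` is a hypothetical stand-in for the
# real serialization code under test.
import json

def build_request_body(messages: list[dict]) -> bytes:
    # Placeholder serializer; swap in the real request pipeline here.
    return json.dumps({"messages": messages}).encode()

# Include strings about the tool itself, billing, and its own source.
ADVERSARIAL = [
    "plain text",
    "billing sentinel: cch=00000",
    "source excerpt: if (s.includes('cch=00000')) { ... }",
]

def test_content_round_trips():
    for text in ADVERSARIAL:
        body = json.loads(build_request_body([{"role": "user", "content": text}]))
        assert body["messages"][0]["content"] == text, f"mutated: {text!r}"

test_content_round_trips()
print("all round-trip checks passed")
```

A hidden transformation like the one described above would fail this test on the second and third inputs, surfacing the bug long before it reached a bill.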

For developers building similar tools, this serves as a cautionary tale about hidden transformations in request pipelines. Any modification to user content before API transmission should be explicit, documented, and ideally configurable. Silent replacements, even for seemingly benign purposes, create debugging nightmares and erode user trust.