SDK Prompt Cache Invalidation Fix: Up to 12x Input Token Cost Reduction
Claude Code 2.1.72 resolves a long-standing prompt cache invalidation bug in SDK query() calls that was causing input tokens to be re-sent on every request instead of being served from cache. The fix can reduce input token costs by up to 12x for developers building on the Claude Agent SDK. This is particularly impactful for agentic workflows where long system prompts and conversation history are repeatedly included across many turns.
Sources & Mentions
5 external resources covering this update:
- Claude Code v2.1.62: Server-Side KV Cache Stale Context Regression (P1) (GitHub)
- Bug: Prompt Caching Not Working - Rapid Token Usage Depletion (GitHub)
- Anthropic Just Fixed the Biggest Hidden Cost in AI Agents (Automatic Prompt Caching) (Medium)
- Claude Code rate limits reset after prompt caching bug drains usage faster than normal (Piunika Web)
- Claude Code by Anthropic - Release Notes - March 2026 Latest Updates (Releasebot)
A Silent Cost Multiplier, Now Fixed
For developers building applications on top of Claude Code's SDK query() API, prompt caching is one of the most powerful levers for controlling costs: cached input tokens cost a fraction of what uncached ones do. However, v2.1.72 reveals that this caching was quietly broken. A prompt cache invalidation bug was causing query() calls to re-transmit the full input on every invocation rather than reading from the cache. The result was artificially inflated token bills, sometimes by as much as a factor of 12.
What Was Happening
Prompt caching works by storing portions of the input (typically the system prompt and earlier conversation turns) so that subsequent requests only pay for new tokens. When cache invalidation fires incorrectly, treating a cacheable request as if it were brand new, the entire prefix is re-sent at full cost. In long agentic sessions with large system prompts, this compounds rapidly across hundreds of turns.
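The compounding effect can be sketched with a toy cost model. Everything below is an illustrative assumption, not taken from the release notes: the multipliers (cache writes at 1.25x the base input price, reads at 0.1x) mirror commonly published caching discounts, and `session_cost` is a hypothetical helper, not an SDK function.

```python
# Toy model: input-token cost of a multi-turn session, with working
# caching vs. the invalidation bug. All prices are illustrative.

def session_cost(turns: int, prefix_tokens: int, new_tokens_per_turn: int,
                 caching_works: bool,
                 base=1.0, write_mult=1.25, read_mult=0.10) -> float:
    """Total input-token cost units across a session."""
    cost = 0.0
    cached = 0  # tokens currently served from cache
    for turn in range(turns):
        prompt = prefix_tokens + turn * new_tokens_per_turn
        if caching_works:
            cost += cached * base * read_mult              # cached prefix: cheap read
            cost += (prompt - cached) * base * write_mult  # new suffix: cache write
            cached = prompt                                # suffix joins the cache
        else:
            # invalidation bug: whole prompt re-billed at full price every turn
            cost += prompt * base
    return cost

broken = session_cost(200, 20_000, 500, caching_works=False)
fixed = session_cost(200, 20_000, 500, caching_works=True)
print(f"broken/fixed cost ratio: {broken / fixed:.1f}x")  # → roughly 9x in this model
```

With a 20k-token system prompt over 200 turns, the broken path costs roughly 9x more than the working one under these assumptions; the exact multiple (Anthropic cites up to 12x) depends on prefix size, turn count, and pricing tier.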
The bug specifically affected the query() entry point used by SDK consumers, including Claude Code Remote and any application built on top of the agent SDK. Interactive REPL users were less exposed, but programmatic integrations bore the full brunt.
The Impact
Anthropic's fix in v2.1.72 restores correct cache behavior for SDK query() calls. For workloads that previously suffered from the bug, the correction brings input token costs back in line with what prompt caching is designed to deliver: up to a 12x reduction compared to the broken state. On a long Opus session that might otherwise cost tens of dollars in input tokens, the savings are material.
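A back-of-envelope calculation makes the dollar figures concrete. The per-million-token prices below are illustrative assumptions chosen to match the usual shape of cache pricing (writes slightly above the base input rate, reads at a deep discount), not quoted Anthropic rates.

```python
# Back-of-envelope: a static 50k-token prefix re-sent over 100 turns,
# at assumed prices of $15/MTok base input, $18.75/MTok cache write,
# and $1.50/MTok cache read.

MTOK = 1_000_000
BASE, WRITE, READ = 15.00, 18.75, 1.50  # $ per million input tokens (assumed)

prefix, turns = 50_000, 100

broken = turns * prefix / MTOK * BASE       # prefix billed at full price every turn
fixed = (prefix / MTOK * WRITE              # one cache write up front...
         + (turns - 1) * prefix / MTOK * READ)  # ...then cheap reads

print(f"broken: ${broken:.2f}  fixed: ${fixed:.2f}  ratio: {broken / fixed:.1f}x")
# → broken: $75.00  fixed: $8.36  ratio: 9.0x
```

Even under these modest assumptions, the bug turns an ~$8 session into a ~$75 one, which is consistent with the "tens of dollars" scale described above.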
Additional Cache Improvements
The same release also improves compaction to preserve images in the summarizer request, enabling prompt cache reuse for faster and cheaper compaction passes. Together, these changes meaningfully lower the cost floor for intensive SDK-based deployments.