SDK Prompt Cache Invalidation Fix: Up to 12x Input Token Cost Reduction

Claude Code

Claude Code 2.1.72 resolves a long-standing prompt cache invalidation bug in SDK query() calls that was causing input tokens to be re-sent on every request instead of being served from cache. The fix can reduce input token costs by up to 12x for developers building on the Claude Agent SDK. This is particularly impactful for agentic workflows where long system prompts and conversation history are repeatedly included across many turns.


A Silent Cost Multiplier, Now Fixed

For developers building applications on top of Claude Code's SDK query() API, prompt caching is one of the most powerful levers for controlling costs: cached input tokens are billed at a fraction of the uncached rate. Until v2.1.72, however, this caching was quietly broken: a prompt cache invalidation bug caused query() calls to re-transmit the full input on every invocation rather than reading from the cache. The result was artificially inflated token bills, sometimes by as much as a factor of 12.
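The pricing arithmetic shows why a broken cache compounds so quickly. The sketch below uses illustrative rate multipliers that are assumptions, not quoted prices (cache reads at roughly a tenth of the base input rate, cache writes at a modest premium), along with made-up session parameters:

```python
# Illustrative cost model for a multi-turn session with a large, stable prefix.
# Rate multipliers are assumptions: cache reads ~0.1x the base input rate,
# cache writes ~1.25x. All figures are in arbitrary cost units per token.
BASE_RATE = 1.0
READ_RATE = 0.1 * BASE_RATE
WRITE_RATE = 1.25 * BASE_RATE

def session_cost(prefix_tokens, new_tokens_per_turn, turns, caching_works):
    cost = 0.0
    for turn in range(turns):
        if caching_works:
            if turn == 0:
                cost += prefix_tokens * WRITE_RATE  # first turn writes the cache
            else:
                cost += prefix_tokens * READ_RATE   # later turns read it back
        else:
            cost += prefix_tokens * BASE_RATE       # bug: prefix re-sent at full price
        cost += new_tokens_per_turn * BASE_RATE     # new tokens always cost full price
    return cost

# Hypothetical session: 50k-token stable prefix, 500 new tokens/turn, 100 turns.
broken = session_cost(50_000, 500, turns=100, caching_works=False)
fixed = session_cost(50_000, 500, turns=100, caching_works=True)
print(f"broken: {broken:,.0f}  fixed: {fixed:,.0f}  ratio: {broken / fixed:.1f}x")
```

Even this simple model, where the prefix never grows, puts the broken state at several times the cost of a working cache; the larger the prefix relative to per-turn new tokens, the closer the ratio climbs toward the headline multiple.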

What Was Happening

Prompt caching works by storing portions of the input (typically the system prompt and earlier conversation turns) so that subsequent requests pay full price only for new tokens. When cache invalidation fires incorrectly, treating a cacheable request as if it were brand new, the entire prefix is re-sent at full cost. In long agentic sessions with large system prompts, this compounds rapidly across hundreds of turns.
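The mechanism can be sketched with a toy model: a cache keyed on the exact request prefix, where only tokens past the longest cached prefix are billed. A spurious invalidation is equivalent to every lookup missing. The names and structure here are illustrative, not the SDK's internals:

```python
# Toy prefix cache: maps a tuple of prompt segments to the token count it
# covers. Billing charges full price only for tokens past the cached prefix.
def billable_tokens(cache, segments, invalidation_bug=False):
    """segments: list of (text, token_count) pairs, oldest first."""
    covered = 0
    if not invalidation_bug:
        # Find the longest prefix of `segments` already in the cache.
        for i in range(len(segments), 0, -1):
            key = tuple(text for text, _ in segments[:i])
            if key in cache:
                covered = cache[key]
                break
    # Store the full prompt as a cache entry for the next turn.
    total = sum(count for _, count in segments)
    cache[tuple(text for text, _ in segments)] = total
    return total - covered

cache = {}
turn1 = [("system prompt", 10_000), ("user: hi", 20)]
turn2 = turn1 + [("assistant: hello", 15), ("user: next task", 25)]

print(billable_tokens(cache, turn1))                      # first turn: everything is new
print(billable_tokens(cache, turn2))                      # cache hit: only the new tail
print(billable_tokens({}, turn2, invalidation_bug=True))  # bug: full prompt re-billed
```

With a working cache, the second turn bills only the handful of new tokens; with the bug, the 10,000-token system prompt is billed again on every turn.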

The bug specifically affected the query() entry point used by SDK consumers, including Claude Code Remote and any application built on top of the agent SDK. Interactive REPL users were less exposed, but programmatic integrations bore the full brunt.

The Impact

Anthropic's fix in v2.1.72 restores correct cache behavior for SDK query() calls. For workloads that previously suffered from the bug, the correction brings input token costs back in line with what prompt caching is designed to deliver: up to a 12x reduction compared to the broken state. On a long Opus session that might otherwise cost tens of dollars in input tokens, the savings are material.
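A back-of-the-envelope dollar estimate makes the scale concrete. The rates below are assumptions for an Opus-class model (base input around $15 per million tokens, cache reads at a tenth of that), and the session shape is invented; the one-time cache-write premium is ignored for simplicity:

```python
# Assumed Opus-class pricing; treat these numbers as illustrative only.
INPUT_PER_MTOK = 15.00        # $ per million uncached input tokens
CACHED_READ_PER_MTOK = 1.50   # $ per million cached input tokens (assumed 0.1x)

prefix_tokens = 100_000  # hypothetical system prompt + accumulated history
turns = 200              # hypothetical long agentic session

# Broken cache: the full prefix is re-billed at the base rate every turn.
broken_usd = turns * prefix_tokens / 1e6 * INPUT_PER_MTOK
# Working cache: the prefix is served as cache reads (one-time write omitted).
fixed_usd = turns * prefix_tokens / 1e6 * CACHED_READ_PER_MTOK

print(f"broken: ${broken_usd:.2f}  fixed: ~${fixed_usd:.2f}")
```

Under these assumptions the broken state costs hundreds of dollars in prefix tokens alone versus tens with caching working, consistent with the "tens of dollars" sessions described above.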

Additional Cache Improvements

The same release also improves compaction to preserve images in the summarizer request, enabling prompt cache reuse for faster and cheaper compaction passes. Together, these changes meaningfully lower the cost floor for intensive SDK-based deployments.