Claude Code Restores 1-Hour Prompt Cache TTL After Months of Cost Inflation


Claude Code version 2.1.129 restores the 1-hour prompt cache TTL that had been reduced in prior releases, reversing a change that significantly increased costs for users running long or repeated sessions. The shortened cache forced token re-processing and higher API spend on every run. With this release, Anthropic reinstates the full 60-minute window, allowing cached context to persist across a wider range of workflows without incurring repeat input charges. Developers relying on sub-agent patterns and multi-step pipelines should see noticeable cost relief immediately.


What Changed

Claude Code version 2.1.129 restores the prompt cache TTL to 1 hour, reversing a reduction that had been in place for several months and that many users experienced as a stealth cost increase.

The prompt cache is a mechanism that lets Claude reuse previously processed context instead of recomputing it from scratch on every call. When the TTL is longer, cached context survives between separate runs of Claude Code, meaning users working iteratively across a project don't pay to re-process large system prompts or file contents repeatedly.
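Conceptually, the TTL behaves like a sliding expiry on a prefix-keyed cache: a call within the window is a cheap hit, and a call after expiry re-fills the cache at full cost. A minimal Python sketch of that behavior, assuming sliding-window refresh semantics (the class and its methods are illustrative, not Anthropic's actual implementation or API):

```python
import time

class PromptCache:
    """Toy model of a prompt cache keyed by the prompt prefix,
    with a TTL that is refreshed on every hit (an assumption)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # prefix hash -> expiry timestamp

    def lookup(self, prefix, now=None):
        """Return True on a cache hit; on hit or miss, (re)set the
        entry's expiry one TTL into the future."""
        now = time.time() if now is None else now
        expiry = self.entries.get(hash(prefix))
        hit = expiry is not None and now < expiry
        self.entries[hash(prefix)] = now + self.ttl
        return hit

short = PromptCache(ttl_seconds=5 * 60)   # the reduced window
long = PromptCache(ttl_seconds=60 * 60)   # the restored window

# Two Claude Code invocations 20 minutes apart over the same context:
short.lookup("system prompt + repo files", now=0)
long.lookup("system prompt + repo files", now=0)
print(short.lookup("system prompt + repo files", now=20 * 60))  # False: expired
print(long.lookup("system prompt + repo files", now=20 * 60))   # True: still cached
```

The second pair of lookups shows the difference users felt: with a short TTL, a 20-minute gap between runs means the next call starts cold, while the 60-minute window keeps the context warm.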

Why This Matters

The TTL reduction had a compounding effect on cost. Users running Claude Code repeatedly throughout the day (the normal workflow for anyone using sub-agents, CI pipelines, or iterative coding sessions) were re-paying for context on each invocation that would otherwise have been cached.

For teams running Claude Code at scale, the difference between a 5-minute cache and a 60-minute cache can be substantial. A developer who iterates on a large codebase across a 2-hour working session might make dozens of Claude calls. With a short TTL, most of those calls start cold. With the restored 1-hour TTL, the first call of each hour pays for the cache fill, and subsequent calls within that window are significantly cheaper.
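The arithmetic above can be made concrete with a back-of-envelope estimate. The price and multipliers below are illustrative assumptions, not quoted Anthropic rates (actual cache-write and cache-read pricing varies by model and TTL), and `session_cost` is a hypothetical helper:

```python
# Illustrative, assumed pricing ratios -- not quoted Anthropic rates.
BASE_PER_TOKEN = 3.00 / 1_000_000  # assumed $ per uncached input token
WRITE_MULT = 1.25                  # assumed cache-write premium
READ_MULT = 0.10                   # assumed cache-read discount

def session_cost(prefix_tokens, total_calls, ttl_minutes, gap_minutes):
    """Cost of (re)processing a shared prompt prefix across a session
    where calls arrive every `gap_minutes`. A call hits the cache only
    if the previous call was within `ttl_minutes`; otherwise it pays
    the cache-write rate again."""
    hits = total_calls - 1 if gap_minutes < ttl_minutes else 0
    writes = total_calls - hits
    return prefix_tokens * BASE_PER_TOKEN * (writes * WRITE_MULT + hits * READ_MULT)

# 30 calls roughly 8 minutes apart on a 100k-token prefix:
cold = session_cost(100_000, 30, ttl_minutes=5, gap_minutes=8)   # every call starts cold
warm = session_cost(100_000, 30, ttl_minutes=60, gap_minutes=8)  # one fill, 29 cheap reads
print(f"5-min TTL:  ${cold:.2f}")   # roughly $11
print(f"60-min TTL: ${warm:.2f}")   # roughly $1
```

Under these assumed ratios the session is close to an order of magnitude cheaper with the restored window: whenever the gap between calls exceeds the TTL, every invocation pays the full cache-fill cost, which is exactly the compounding effect described above.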

Context: Why Was the TTL Reduced?

Anthropic temporarily shortened the cache TTL as part of infrastructure adjustments during a period of rapid capacity scaling. The reduction was not announced prominently, which is why many users discovered it only after noticing elevated costs. The restoration in v2.1.129 is a direct response to user feedback and monitoring data showing elevated spend patterns.

Who Is Affected

This change benefits all Claude Code users, but the impact is most pronounced for:

  • Developers running long coding sessions with large context windows
  • Teams using sub-agent workflows where multiple agents share a common system prompt
  • CI/CD integrations that invoke Claude Code repeatedly against the same codebase
  • Anyone using the 1M-token context window who had been paying full re-processing costs on each call

Practical Impact

No action is required on the user side. The 1-hour TTL is restored automatically in v2.1.129. Users who update will immediately benefit from lower costs on any session that involves multiple Claude Code invocations within a 60-minute window.