Claude Code Restores 1-Hour Prompt Cache TTL After Months of Cost Inflation


Claude Code version 2.1.129 restores the 1-hour prompt cache TTL that had been reduced in prior releases, reversing a change that significantly increased costs for users running long or repeated sessions. The shortened cache forced token re-processing and higher API spend on every run. With this release, Anthropic reinstates the full 60-minute window, allowing cached context to persist across a wider range of workflows without incurring repeat input charges. Developers relying on sub-agent patterns and multi-step pipelines should see noticeable cost relief immediately.


What Changed

Claude Code version 2.1.129 restores the prompt cache TTL to 1 hour, reversing a reduction that had been in place for several months and that many users experienced as a stealth cost increase.

The prompt cache is a mechanism that lets Claude reuse previously processed context instead of recomputing it from scratch on every call. When the TTL is longer, cached context survives between separate runs of Claude Code, meaning users working iteratively across a project don't pay to re-process large system prompts or file contents repeatedly.
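Conceptually, the TTL behaves like a sliding expiry on a prefix-keyed cache: a call within the window is a cheap hit, and a call after expiry re-fills the cache at full cost. A minimal Python sketch of that behavior, assuming sliding-window refresh semantics (the class and its methods are illustrative, not Anthropic's actual implementation or API):

```python
import time

class PromptCache:
    """Toy model of a prompt cache keyed by the prompt prefix,
    with a TTL that is refreshed on every hit (an assumption)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # prefix hash -> expiry timestamp

    def lookup(self, prefix, now=None):
        """Return True on a cache hit; on hit or miss, (re)set the
        entry's expiry one TTL into the future."""
        now = time.time() if now is None else now
        expiry = self.entries.get(hash(prefix))
        hit = expiry is not None and now < expiry
        self.entries[hash(prefix)] = now + self.ttl
        return hit

short = PromptCache(ttl_seconds=5 * 60)   # the reduced window
long = PromptCache(ttl_seconds=60 * 60)   # the restored window

# Two Claude Code invocations 20 minutes apart over the same context:
short.lookup("system prompt + repo files", now=0)
long.lookup("system prompt + repo files", now=0)
print(short.lookup("system prompt + repo files", now=20 * 60))  # False: expired
print(long.lookup("system prompt + repo files", now=20 * 60))   # True: still cached
```

The second pair of lookups shows the difference users felt: with a short TTL, a 20-minute gap between runs means the next call starts cold, while the 60-minute window keeps the context warm.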

Why This Matters

The TTL reduction had a compounding effect on cost. Users running Claude Code repeatedly throughout the day (the normal workflow for anyone using sub-agents, CI pipelines, or iterative coding sessions) were re-paying for context on each invocation that would otherwise have been cached.

For teams running Claude Code at scale, the difference between a 5-minute cache and a 60-minute cache can be substantial. A developer who iterates on a large codebase across a 2-hour working session might make dozens of Claude calls. With a short TTL, most of those calls start cold. With the restored 1-hour TTL, the first call of each hour pays for the cache fill, and subsequent calls within that window are significantly cheaper.
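The arithmetic above can be made concrete with a back-of-envelope estimate. The price and multipliers below are illustrative assumptions, not quoted Anthropic rates (actual cache-write and cache-read pricing varies by model and TTL), and `session_cost` is a hypothetical helper:

```python
# Illustrative, assumed pricing ratios -- not quoted Anthropic rates.
BASE_PER_TOKEN = 3.00 / 1_000_000  # assumed $ per uncached input token
WRITE_MULT = 1.25                  # assumed cache-write premium
READ_MULT = 0.10                   # assumed cache-read discount

def session_cost(prefix_tokens, total_calls, ttl_minutes, gap_minutes):
    """Cost of (re)processing a shared prompt prefix across a session
    where calls arrive every `gap_minutes`. A call hits the cache only
    if the previous call was within `ttl_minutes`; otherwise it pays
    the cache-write rate again."""
    hits = total_calls - 1 if gap_minutes < ttl_minutes else 0
    writes = total_calls - hits
    return prefix_tokens * BASE_PER_TOKEN * (writes * WRITE_MULT + hits * READ_MULT)

# 30 calls roughly 8 minutes apart on a 100k-token prefix:
cold = session_cost(100_000, 30, ttl_minutes=5, gap_minutes=8)   # every call starts cold
warm = session_cost(100_000, 30, ttl_minutes=60, gap_minutes=8)  # one fill, 29 cheap reads
print(f"5-min TTL:  ${cold:.2f}")   # roughly $11
print(f"60-min TTL: ${warm:.2f}")   # roughly $1
```

Under these assumed ratios the session is close to an order of magnitude cheaper with the restored window: whenever the gap between calls exceeds the TTL, every invocation pays the full cache-fill cost, which is exactly the compounding effect described above.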

Context: Why Was the TTL Reduced?

Anthropic temporarily shortened the cache TTL as part of infrastructure adjustments during a period of rapid capacity scaling. The reduction was not announced prominently, which is why many users discovered it only after noticing elevated costs. The restoration in v2.1.129 is a direct response to user feedback and monitoring data showing elevated spend patterns.

Who Is Affected

This change benefits all Claude Code users, but the impact is most pronounced for:

  • Developers running long coding sessions with large context windows
  • Teams using sub-agent workflows where multiple agents share a common system prompt
  • CI/CD integrations that invoke Claude Code repeatedly against the same codebase
  • Anyone using the 1M-token context window who had been paying full re-processing costs on each call

Practical Impact

No action is required on the user side. The 1-hour TTL is restored automatically in v2.1.129. Users who update will immediately benefit from lower costs on any session that involves multiple Claude Code invocations within a 60-minute window.