Claude Code /cost: Per-Model and Cache-Hit Breakdown
Claude Code v2.1.92 adds a per-model and cache-hit breakdown to the /cost command, giving subscription users granular visibility into how many tokens each model consumed and how many of those tokens were served from the prompt cache. Previously, /cost showed only aggregate totals, leaving users without a clear picture of which models drove spending in multi-model sessions. The change is particularly valuable as Claude Code increasingly supports parallel subagent workflows in which multiple models may run in a single session.
Per-Model Cost Breakdown
Before v2.1.92, /cost displayed only aggregate figures: total input tokens, total output tokens, and an estimated cost. That view was useful as a rough guide but fell short for power users running multi-model sessions. When Opus and Sonnet both contributed to a session, it gave no way to determine which model drove the bulk of spending.
What the New Breakdown Shows
With v2.1.92, invoking /cost now surfaces a structured table that breaks spending down by model. For each model active in the session, users can see:
- Input tokens consumed
- Output tokens generated
- Cache-hit tokens (tokens served from the prompt cache)
- Cache-miss tokens (tokens processed fresh)
This granularity transforms /cost from a passive summary into an actionable diagnostic tool.
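To make the shape of the per-model data concrete, here is a minimal Python sketch of a record holding the four fields listed above, plus a plain-text renderer. It is not Claude Code's implementation or its actual output format: the `ModelUsage` class, the `render_breakdown` function, the column layout, and every number are hypothetical, and the sample data assumes that cache-hit and cache-miss tokens together make up the input tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelUsage:
    """Hypothetical record for one model's row; field names are illustrative."""
    model: str
    input_tokens: int
    output_tokens: int
    cache_hit_tokens: int
    cache_miss_tokens: int


def render_breakdown(rows: list[ModelUsage]) -> str:
    """Render one line per model, followed by an aggregate totals line."""
    lines = [
        f"{'model':<10}{'input':>10}{'output':>10}{'cache hit':>12}{'cache miss':>12}"
    ]
    for r in rows:
        lines.append(
            f"{r.model:<10}{r.input_tokens:>10}{r.output_tokens:>10}"
            f"{r.cache_hit_tokens:>12}{r.cache_miss_tokens:>12}"
        )
    # The totals line is all the older aggregate-only view could convey.
    lines.append(
        f"{'total':<10}"
        f"{sum(r.input_tokens for r in rows):>10}"
        f"{sum(r.output_tokens for r in rows):>10}"
        f"{sum(r.cache_hit_tokens for r in rows):>12}"
        f"{sum(r.cache_miss_tokens for r in rows):>12}"
    )
    return "\n".join(lines)


# Made-up numbers for illustration only.
session = [
    ModelUsage("opus", 120_000, 8_000, 90_000, 30_000),
    ModelUsage("sonnet", 400_000, 25_000, 340_000, 60_000),
]
print(render_breakdown(session))
```

Keeping one record per model means the aggregate line can always be derived from the rows, while the reverse is not possible, which is the gap the per-model view closes.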
Prompt Cache Visibility
The cache-hit data surfaces a dimension of cost that was previously invisible. Prompt caching is one of the most effective ways to reduce API costs in long or repetitive sessions, but until now users had no direct way to verify whether their session structure was actually benefiting from it.
With the new breakdown, users can immediately see the ratio of cache hits to cache misses for each model. A session with poor cache utilization will show a disproportionately high share of cache-miss tokens, signaling that the session could be restructured to improve efficiency.
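The hit-to-miss comparison described above reduces to a simple ratio. The sketch below computes it per model and flags low utilization; the `cache_hit_ratio` helper, the `LOW_UTILIZATION` threshold of 50%, and the token counts are all assumptions chosen for illustration, not values or logic taken from Claude Code.

```python
def cache_hit_ratio(cache_hit_tokens: int, cache_miss_tokens: int) -> float:
    """Fraction of cache-eligible tokens that were served from the prompt cache."""
    total = cache_hit_tokens + cache_miss_tokens
    if total == 0:
        return 0.0  # no tokens recorded, so there is nothing to measure
    return cache_hit_tokens / total


# Hypothetical (cache_hit_tokens, cache_miss_tokens) pairs per model.
usage = {
    "opus": (90_000, 30_000),
    "sonnet": (40_000, 360_000),
}

LOW_UTILIZATION = 0.5  # arbitrary threshold, chosen only for this example

for model, (hit, miss) in usage.items():
    ratio = cache_hit_ratio(hit, miss)
    flag = "  <- low cache utilization" if ratio < LOW_UTILIZATION else ""
    print(f"{model}: {ratio:.0%} cache hits{flag}")
```

With these made-up numbers, the second model would be flagged: most of its tokens were processed fresh, which is the pattern that suggests restructuring the session.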
Cache Expiry Hint for Pro Users
Pro subscription users returning to an existing session now also receive a cache expiry hint. When resuming a session after the prompt cache has partially or fully expired, Claude Code surfaces an estimate of how many tokens the next turn will send uncached β giving users a heads-up before they incur the cost of a full re-cache.
Multi-Agent Attribution
The per-model breakdown is especially well-timed given Claude Code's expanding support for parallel subagent workflows. In an orchestrated session where a primary Opus agent spawns multiple Sonnet subagents, the old aggregate cost view made it impossible to understand the cost composition.
With v2.1.92, each model's contribution is individually attributed. Users running mixed-model setups can immediately see whether, for example, 80% of session cost came from the orchestrator or from the subagents β and adjust their architecture accordingly.
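As a rough illustration of that attribution arithmetic, the following Python sketch converts per-model token counts into each model's share of total session cost. Everything in it is hypothetical: the `PRICES` table holds placeholder values rather than real pricing, the role names and token counts are invented, and the calculation ignores any separate pricing for cached tokens to keep the example short.

```python
# Placeholder prices in USD per million tokens. These are NOT real prices.
PRICES = {
    "orchestrator": {"input": 10.0, "output": 50.0},
    "subagents": {"input": 2.0, "output": 10.0},
}

# Hypothetical token counts for one mixed-model session.
USAGE = {
    "orchestrator": {"input": 400_000, "output": 40_000},
    "subagents": {"input": 500_000, "output": 100_000},
}


def model_cost(tokens: dict[str, int], prices: dict[str, float]) -> float:
    """Cost of one model: tokens times price per million, summed over token kinds."""
    return sum(tokens[kind] * prices[kind] / 1_000_000 for kind in tokens)


def cost_shares(
    usage: dict[str, dict[str, int]], prices: dict[str, dict[str, float]]
) -> dict[str, float]:
    """Each model's fraction of the total session cost."""
    costs = {name: model_cost(tokens, prices[name]) for name, tokens in usage.items()}
    total = sum(costs.values())
    return {name: (cost / total if total else 0.0) for name, cost in costs.items()}


for name, share in cost_shares(USAGE, PRICES).items():
    print(f"{name}: {share:.0%} of session cost")
```

A share skewed heavily toward one role is the kind of signal that might prompt moving work between the orchestrator and its subagents.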
No Configuration Required
The expanded /cost output is automatic for all subscription users on v2.1.92 and later. No new flags, no settings to toggle β the richer breakdown appears by default whenever /cost is invoked.