Windsurf: Per-Response Token Breakdown in Cascade Chat

Windsurf

Windsurf added per-response token usage visibility to the Cascade chat panel, giving users a real-time breakdown of input tokens, output tokens, and cached input tokens for every AI response. The release also enhances context window transparency by surfacing when the prompt cache expires, helping developers better understand cache utilization. These additions complement the quota-based billing system Windsurf launched in March 2026, making it easier to track how individual requests consume quota allowances.

Key Takeaways

  • Per-response token breakdown, including input, output, and cached tokens, is now visible directly inside the Cascade chat panel for every AI response.
  • Cached input token visibility helps developers understand how prompt caching reduces their effective quota consumption on repeated context.
  • Context window cache expiry is now surfaced in the IDE, giving developers advance notice before a cache invalidates and costs increase.
  • Windsurf's quota system β€” launched March 19, 2026 β€” provides daily and weekly allowances measured in tokens, making per-response tracking more actionable than ever.
  • Windows terminal keyboard input fix resolves a quality-of-life issue affecting users on Windows who rely on the built-in Cascade terminal.
  • RHEL 8 devcontainer support is restored, addressing an enterprise-relevant regression for Linux users running Red Hat environments.

Sources & Mentions

1 external resource covering this update


Per-Response Token Visibility in Cascade

With version 1.9600.1032, Windsurf introduced token usage tracking directly into the response card of the Cascade chat panel. Developers can now see a granular breakdown of exactly how many input tokens, output tokens, and cached input tokens each AI response consumed β€” displayed inline without needing to navigate away from the editor.

This level of transparency is particularly valuable in the context of Windsurf's quota-based billing system, which replaced the earlier credit model in March 2026. Under the quota system, each plan includes daily and weekly usage allowances tied to token consumption. Prior to this update, users had limited visibility into how individual Cascade interactions consumed their quota β€” they could check aggregate usage through the settings panel or account page, but lacked per-response granularity.

Context Window Cache Expiry

The same update enhances the context window indicator to display when the prompt cache expires. Windsurf's Cascade leverages prompt caching to reduce token costs on repeated context β€” when the cache is active, cache-read tokens cost significantly less than standard input tokens. By surfacing cache expiry directly in the IDE, developers can now anticipate when a new cache will need to be built and plan their sessions accordingly to minimize token spend.

Bug Fixes

Version 1.9600.1032 also addressed several smaller issues:

  • Fixed keyboard input problems affecting the built-in terminal on Windows
  • Improved cost sorting for token-based plans in the model picker
  • Clarified which models require a Pro plan subscription
  • Repaired devcontainer support on RHEL 8

Mentioned onWindsurf Docs
Windsurf Cascade: Per-Response Token Tracking in Chat | Yet Another Changelog