Gemini CLI: Automatic Context Compression Service


Gemini CLI v0.38.0 introduces a Context Compression Service that automatically summarizes long conversation histories when context usage crosses a configurable threshold, preventing sessions from hitting token limits mid-task. The service compresses ephemeral conversation content down to 5–15% of its original length while leaving GEMINI.md system context and /memory add entries untouched. Developers can tune the trigger via the new experimental.generalistProfile setting, and the existing /compress slash command continues to work for manual control.


What Is the Context Compression Service?

Every Gemini CLI session builds a growing conversation history: tool calls, file reads, shell output, intermediate reasoning. Over time, that history consumes an increasing share of the model's context window, and sessions that run long enough eventually hit token limits and stall mid-task. Gemini CLI v0.38.0 addresses this directly with a new Context Compression Service: an automatic background mechanism that detects when context usage is climbing toward the limit and triggers a summarization pass before the session hits a wall.

The compression trigger is configurable via the new experimental.generalistProfile setting. When context usage crosses the configured threshold, the service invokes a summarization sub-agent that rewrites the conversation history into a compact representation, capturing the essential facts, decisions, and intermediate results without retaining the full verbatim exchange.
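The release notes name the setting but not its value schema, so the snippet below is only a hypothetical sketch of how a trigger threshold might be declared in the CLI's settings.json; the key names inside the profile are assumptions, and the real shape may differ.

```jsonc
// ~/.gemini/settings.json -- hypothetical sketch; the actual value schema
// for experimental.generalistProfile is not documented in this article.
{
  "experimental": {
    "generalistProfile": {
      // assumed key: fire a summarization pass once ~80% of the
      // context window is in use
      "compressionThreshold": 0.8
    }
  }
}
```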

What Gets Compressed and What Doesn't

The compression pass targets ephemeral conversation content: tool call sequences, raw file output, and multi-turn back-and-forth dialogue that has already served its purpose. This is the material that tends to dominate context budgets in long agent sessions but carries little forward value once its outcomes are captured.

Critically, two categories of content are explicitly excluded from compression:

  • GEMINI.md system context: project instructions, conventions, and persistent configuration defined in GEMINI.md files are left untouched. The compression service treats these as authoritative reference material that must remain verbatim.
  • /memory add entries: anything the user has explicitly committed to memory via the /memory add command is likewise preserved. These are intentional, user-curated facts, not ephemeral dialogue.

The result is a compressed context that retains the session's institutional knowledge while shedding the conversational bulk. According to the implementation, the service targets a compression ratio of 5–15% of the original content length for the ephemeral portions, a significant reduction that can extend a session's effective lifespan considerably.
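The arithmetic above is easy to sketch. The helper names and token counts below are hypothetical, not Gemini CLI internals; the example only illustrates how the 5–15% target ratio interacts with the protected (GEMINI.md and /memory) content, which is carried through unchanged.

```python
def compressed_size(ephemeral_tokens: int, ratio: float) -> int:
    """Estimate the ephemeral history's size after a summarization pass.

    The service targets 5-15% of the original length, per the article.
    """
    if not 0.05 <= ratio <= 0.15:
        raise ValueError("target ratio is 5-15% of the original length")
    return round(ephemeral_tokens * ratio)


def context_after_compression(protected_tokens: int,
                              ephemeral_tokens: int,
                              ratio: float = 0.10) -> int:
    """GEMINI.md context and /memory add entries are preserved verbatim;
    only the ephemeral conversation history is summarized."""
    return protected_tokens + compressed_size(ephemeral_tokens, ratio)


# A session with 8k tokens of protected context and 120k tokens of
# ephemeral history shrinks to roughly 20k tokens at a 10% ratio.
print(context_after_compression(8_000, 120_000))  # 20000
```

Note that the protected content sets a floor: no matter how aggressive the ratio, compressed context can never drop below the size of GEMINI.md plus memory entries.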

Manual Control: The /compress Command

For developers who prefer explicit control, the existing /compress slash command continues to work exactly as before. Invoking /compress triggers an immediate on-demand summarization pass, independent of the automatic threshold. This is useful when approaching a known context boundary before starting a complex sub-task, or when the automatic trigger hasn't fired yet but a manual compaction is desirable.

The automatic service and the manual command coexist: the service fires when the threshold is crossed, while /compress fires on demand. Both use the same underlying summarization mechanism.
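The relationship between the two entry points can be sketched as follows. This is a conceptual model only, with hypothetical names, not the Gemini CLI implementation: both the threshold check and the manual command route into one shared summarization step.

```python
from dataclasses import dataclass


@dataclass
class Session:
    context_limit: int   # model context window, in tokens
    threshold: float     # fraction of the window that triggers compression
    used_tokens: int = 0
    compressions: int = 0

    def summarize(self) -> None:
        """Shared summarization mechanism used by both entry points.

        Models the 5-15% target by keeping ~10% of current usage."""
        self.used_tokens = round(self.used_tokens * 0.10)
        self.compressions += 1

    def record_turn(self, tokens: int) -> None:
        """Automatic path: compress once usage crosses the threshold."""
        self.used_tokens += tokens
        if self.used_tokens >= self.threshold * self.context_limit:
            self.summarize()

    def compress_command(self) -> None:
        """Manual path: /compress fires on demand, ignoring the threshold."""
        self.summarize()


s = Session(context_limit=1_000_000, threshold=0.8)
s.record_turn(700_000)   # below threshold: nothing happens
s.record_turn(200_000)   # 900k >= 800k: automatic compression fires
print(s.used_tokens, s.compressions)  # 90000 1
s.compress_command()     # manual compaction, on demand
print(s.used_tokens, s.compressions)  # 9000 2
```

The design point the sketch captures: because both paths call the same summarize step, manual and automatic compaction cannot drift apart in behavior.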

Practical Impact for Long Agent Sessions

The Context Compression Service is most impactful in two categories of usage:

Long-running agent tasks: multi-step operations that span dozens of tool calls (code analysis, iterative refactoring, large file processing) previously required manual intervention or session restarts when context grew too large. With automatic compression, these sessions can continue without interruption.

Background and unattended tasks: sessions running overnight or in CI-like contexts benefit particularly, since there is no human present to notice a context warning and manually intervene.

Known Considerations

The compression is explicitly lossy by design. The summarization process captures the essential outcomes and decisions but does not preserve the full verbatim history of every tool call and exchange. Developers who need to audit the full sequence of operations in a session should rely on session logs rather than the in-context history after compression has occurred.

The experimental.generalistProfile setting that governs the trigger threshold is, as the name implies, experimental. Behavior may change in future releases as Google tunes the heuristics based on real-world usage patterns.