Gemini CLI: Run Gemma Locally with One Command


Gemini CLI v0.40.0 introduces the gemini gemma command suite, replacing a previously manual six-step process with a single gemini gemma setup command that downloads the LiteRT-LM binary, pulls a Gemma model, configures settings, and starts the local inference server automatically. Once running, Gemma acts as a local routing model, making task-dispatch decisions entirely on-device without sending any data to Google's cloud. A new /gemma slash command inside sessions shows the routing status at a glance. The feature is experimental and supports macOS, Windows, and Linux.


Gemini CLI Gets a One-Command Local Model Setup

Gemini CLI v0.40.0, released April 28, 2026, ships a new gemini gemma command suite that makes running a local Gemma model genuinely accessible for the first time. Previously, setting up local model routing required manually downloading the LiteRT-LM binary, configuring paths, starting the server, and wiring the routing settings, a six-step process that was largely undocumented and fragile. The new command collapses all of that into a single invocation.

The gemini gemma Command Suite

Running gemini gemma setup automates the entire setup sequence: it downloads the LiteRT-LM binary to ~/.gemini/bin/litert/, pulls the Gemma model weights, writes the necessary configuration, and starts the server. After that point, gemini gemma start and gemini gemma stop control the server lifecycle, gemini gemma status provides health diagnostics, and gemini gemma logs streams live or historical server output.
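The lifecycle described above maps onto a short sequence of commands. The transcript below is illustrative (output omitted) and uses only the subcommands named in the release:

```shell
# One-time setup: downloads the LiteRT-LM binary to ~/.gemini/bin/litert/,
# pulls the Gemma model weights, writes configuration, and starts the server.
gemini gemma setup

# Day-to-day lifecycle management.
gemini gemma status   # health diagnostics for the local server
gemini gemma stop     # shut the server down
gemini gemma start    # bring it back up later
gemini gemma logs     # stream live or historical server output
```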

What Local Routing Actually Does

When the local Gemma model is active, Gemini CLI uses it to decide which hosted model each request should be routed to, rather than making that routing decision via a cloud call. This means the dispatch layer runs entirely on-device. For tasks the routing layer handles, no data leaves the machine. The primary use case is developers who work in environments with strict data-residency requirements, intermittent internet access, or a preference for fully offline operation.
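To make the dispatch idea concrete, here is a deliberately toy sketch, not Gemini CLI's actual implementation: the point is only that the classification step runs locally, while the prompt is sent only to whichever hosted model the local decision selects. The heuristic and the model names are invented for illustration.

```python
def route_locally(prompt: str) -> str:
    """Toy stand-in for an on-device routing decision.

    A real routing model (such as local Gemma) would score the prompt
    with a small LLM; this sketch uses a crude length/code heuristic
    purely for illustration. The returned names are hypothetical.
    """
    if len(prompt) > 500 or "def " in prompt:
        return "hosted-model-large"
    return "hosted-model-small"

# The routing decision itself never leaves the machine; only the chosen
# hosted model (if any) subsequently receives the prompt.
target = route_locally("Summarize this README")
```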

Session Visibility via /gemma

A new /gemma slash command displays the current routing configuration and server health directly inside a CLI session, so developers can verify local routing is active without leaving the terminal or inspecting config files.

Configuration and Security Design

The feature uses a split-settings approach: workspace-scoped settings control whether local routing is enabled (safe to commit to a project repo), while user-scoped settings govern the binary paths (which remain user-restricted for security). Auto-start at CLI launch is supported but disabled by default. The feature is cross-platform and has been tested on macOS, Windows, and Linux.
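Gemini CLI keeps workspace settings in .gemini/settings.json inside the project and user settings in ~/.gemini/settings.json; a plausible shape for the split is sketched below. The specific keys are assumptions for illustration, not documented names, and the annotations use JSONC-style comments:

```jsonc
// .gemini/settings.json (workspace-scoped; safe to commit)
{
  "gemma": {
    "enabled": true,     // hypothetical key: turn local routing on
    "autoStart": false   // hypothetical key: auto-start is off by default
  }
}

// ~/.gemini/settings.json (user-scoped; stays user-restricted)
{
  "gemma": {
    "binaryPath": "~/.gemini/bin/litert/"   // hypothetical key
  }
}
```

Splitting enablement from binary paths means a team can commit the routing preference to a repo without any one developer's local filesystem layout leaking into shared configuration.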

What This Means for Gemini CLI Users

Local model routing has been a long-standing community request; the GitHub discussion tracking the feature (#5945) dates back to mid-2025. The gemini gemma command makes this viable for developers who need to process sensitive codebases without cloud API calls, or who want predictable costs by reducing the number of routing-related API round-trips. The feature remains experimental in v0.40.0, but the one-command setup substantially lowers the barrier compared to any prior approach.