Gemini CLI: Real-Time Voice Mode

Gemini CLI v0.41.0 introduces Real-Time Voice Mode, enabling developers to interact with the CLI using spoken language rather than typed commands. The feature leverages the Gemini Live API to stream audio input directly from the microphone and receive spoken responses, making hands-free terminal workflows possible for the first time. Voice Mode supports full tool use, meaning spoken requests can trigger file reads, shell commands, and MCP tool calls exactly as typed prompts do. Google also shipped companion improvements to audio output quality and wake-word detection in this release.

Real-Time Voice Mode Comes to Gemini CLI

Gemini CLI v0.41.0 marks a significant interface milestone: developers can now speak to the CLI instead of typing. The new Real-Time Voice Mode integrates the Gemini Live API directly into the terminal environment, streaming microphone audio to Gemini's models and playing spoken responses back through the system audio output.

How Voice Mode Works

Voice Mode is activated with the /voice command or the --voice flag at startup. Once active, Gemini CLI continuously listens for speech, processes it using Gemini's streaming audio pipeline, and responds audibly. The session remains interactive: users can interrupt mid-response, ask follow-up questions, or switch back to text input at any point.

Internally, Voice Mode uses the same agent loop as text-based interactions. This means spoken requests can invoke any tool available in the current session: reading files, running shell commands, calling MCP servers, or triggering Plan Mode tasks. A developer can say "show me the last ten lines of server.log" and hear the output read back, or ask "what functions are exported from utils.ts" and receive a spoken summary.
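The shared loop described above can be pictured roughly as follows. This is a toy sketch, not Gemini CLI's implementation: Prompt, TOOLS, plan_tool_call, and agent_turn are hypothetical names, the tools return placeholder strings, and a keyword rule stands in for the model's tool selection. The point is only that the input modality changes how the reply is rendered, not which tools can run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical tool registry: tool name -> callable taking one string argument.
# Real tools would touch the file system or a shell; these return placeholders.
TOOLS: Dict[str, Callable[[str], str]] = {
    "read_file": lambda path: f"<contents of {path}>",
    "run_shell": lambda cmd: f"<output of {cmd}>",
}


@dataclass
class Prompt:
    text: str       # typed text, or the transcript of a spoken request
    modality: str   # "text" or "voice"


def plan_tool_call(prompt_text: str) -> Tuple[str, str]:
    """Stand-in for the model's tool selection (a hypothetical keyword rule)."""
    if "log" in prompt_text:
        return "run_shell", "tail -n 10 server.log"
    return "read_file", "utils.ts"


def agent_turn(prompt: Prompt) -> str:
    """Run one turn. The same tools are reachable regardless of modality."""
    tool, arg = plan_tool_call(prompt.text)
    result = TOOLS[tool](arg)
    # Only the rendering differs: voice replies would be handed to TTS.
    prefix = "[spoken] " if prompt.modality == "voice" else ""
    return prefix + result


typed = agent_turn(Prompt("show me the last ten lines of server.log", "text"))
spoken = agent_turn(Prompt("show me the last ten lines of server.log", "voice"))
```

In this sketch both calls dispatch the same tool with the same argument; the voice turn differs only by the rendering prefix.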

Audio Quality and Wake-Word Detection

Alongside the core Voice Mode feature, Google shipped improvements to the audio output pipeline. Text-to-speech responses now use a higher-fidelity voice model with more natural prosody and reduced robotic artefacts. The update also introduces optional wake-word detection: when enabled, Gemini CLI listens passively and activates only when it hears a configurable trigger phrase, preventing accidental activation during normal conversation.
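The gating behaviour of a wake phrase can be illustrated with a small sketch. It assumes, purely for illustration, that gating is applied to text transcripts; the release notes do not say how detection is implemented, and a real detector would more likely operate on audio. make_gate and the phrase "hey gemini" are hypothetical.

```python
import re
from typing import Callable, Optional


def make_gate(wake_word: Optional[str]) -> Callable[[str], Optional[str]]:
    """Build a filter that returns the request to act on, or None to ignore it.

    With no wake word configured, every non-empty utterance passes through.
    """
    if not wake_word or not wake_word.strip():
        return lambda transcript: transcript.strip() or None

    # Tolerate variable spacing between the words of the phrase, and require
    # a word boundary after it so "hey geminix" does not trigger the gate.
    words = r"\s+".join(re.escape(w) for w in wake_word.split())
    pattern = re.compile(rf"\s*{words}\b[\s,.:!]*(.*)", re.IGNORECASE | re.DOTALL)

    def gate(transcript: str) -> Optional[str]:
        match = pattern.match(transcript)
        if match is None:
            return None  # ordinary conversation: stay passive
        return match.group(1).strip() or None  # phrase alone carries no request

    return gate


gate = make_gate("hey gemini")
request = gate("Hey Gemini, show me the last ten lines of server.log")
ignored = gate("did you see the game last night")
```

Here request holds the text after the trigger phrase, while ignored is None because the utterance never addressed the assistant.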

These improvements make Voice Mode practical for extended sessions, particularly in contexts where hands-free operation is valuable: pair programming out loud, reviewing code while pacing, or narrating a debugging session for recording purposes.

Tool Use and Plan Mode Compatibility

Voice Mode is fully compatible with Gemini CLI's existing tool ecosystem. Spoken prompts pass through the same intent-parsing and tool-dispatch pipeline as text prompts, so Voice Mode imposes no capability restrictions: Plan Mode, MCP tool calls, file system operations, and shell execution all work as they do for typed input.

Google notes that Voice Mode requires a microphone-enabled environment and an active internet connection for audio streaming. Offline or air-gapped environments are not supported in this release.

Configuration

Voice Mode settings are configurable via ~/.gemini/settings.json:

  • voiceMode.enabled: toggle Voice Mode at startup
  • voiceMode.wakeWord: set a custom wake phrase (default: disabled)
  • voiceMode.outputVolume: control TTS output volume (0.0–1.0)
  • voiceMode.language: set the preferred speech recognition locale
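As a rough illustration of these settings, the sketch below parses a sample configuration and checks the volume range. Several details are assumptions rather than documented behaviour: that the dotted names map to a nested "voiceMode" object in the JSON file, the sample values themselves, and the fallback volume of 1.0. load_voice_settings is a hypothetical helper, not part of Gemini CLI.

```python
import json

# Sample file contents. The nested layout and all values are assumptions.
EXAMPLE = """
{
  "voiceMode": {
    "enabled": true,
    "wakeWord": "hey gemini",
    "outputVolume": 0.8,
    "language": "en-US"
  }
}
"""


def load_voice_settings(raw: str) -> dict:
    """Parse the voiceMode block and validate outputVolume against 0.0-1.0."""
    voice = json.loads(raw).get("voiceMode", {})

    volume = voice.get("outputVolume", 1.0)  # fallback value is an assumption
    is_number = isinstance(volume, (int, float)) and not isinstance(volume, bool)
    if not is_number or not 0.0 <= volume <= 1.0:
        raise ValueError(f"outputVolume must be between 0.0 and 1.0, got {volume!r}")

    return {
        "enabled": bool(voice.get("enabled", False)),
        "wakeWord": voice.get("wakeWord"),   # None means wake word disabled
        "outputVolume": float(volume),
        "language": voice.get("language"),   # None means no preferred locale
    }


settings = load_voice_settings(EXAMPLE)
```

Rejecting an out-of-range volume up front gives a clearer error than clamping the value silently; either policy would be a reasonable choice for this sketch.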

The /voice command also accepts inline flags for session-level overrides without modifying the config file.
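The precedence this implies, with session-level values winning over the config file while leaving the file untouched, can be sketched as a simple merge. The flag syntax is not specified in the release notes, so overrides are modelled here as a plain dictionary; effective_settings and the sample values are hypothetical.

```python
def effective_settings(file_settings: dict, session_overrides: dict) -> dict:
    """Return file settings with any non-None session overrides applied on top.

    The input dictionaries are not modified, mirroring the idea that a
    session-level override never rewrites the config file.
    """
    merged = dict(file_settings)
    for key, value in session_overrides.items():
        if value is not None:  # None means "not overridden in this session"
            merged[key] = value
    return merged


from_file = {"outputVolume": 0.8, "language": "en-US", "wakeWord": None}
overrides = {"outputVolume": 0.3, "language": None}
active = effective_settings(from_file, overrides)
```

In this example only the volume changes for the session; the language and wake-word settings fall back to the values from the file.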