Gemini 3.1 Flash Live: Real-Time Audio Model for Developers


Google launched Gemini 3.1 Flash Live (gemini-3.1-flash-live-preview) on March 26, 2026. It is the latest audio-to-audio model designed for real-time voice and vision applications via the Live API. The model delivers lower latency, improved noise filtering, and significantly better instruction adherence than the previous 2.5 Flash Native Audio, while supporting more than 90 languages for multi-modal real-time conversations. Developers can tune the quality-latency tradeoff by setting a thinkingLevel parameter (minimal, low, medium, high), with the minimal setting achieving sub-second response times. Pricing is $0.35 per hour for audio input and $1.40 per hour for audio output, matching Gemini 2.5 Flash rates.


A New Real-Time Audio Model for the Live API

Google launched Gemini 3.1 Flash Live (model ID: gemini-3.1-flash-live-preview) on March 26, 2026. It is the latest addition to the Gemini 3.1 family and the first model in this generation specifically designed for real-time, low-latency voice and vision applications through the Live API.

The model is a direct successor to Gemini 2.5 Flash Native Audio. Google reports improvements in three areas that matter for conversational AI: lower end-to-end latency, better filtering of background noise and interruptions, and substantially improved adherence to system instructions and persona prompts. Multi-modal support is retained, allowing simultaneous audio and video input for real-time vision-based interactions.

Tunable Thinking Levels

One of the most developer-friendly additions in Gemini 3.1 Flash Live is the thinkingLevel parameter. Developers can now explicitly set how much reasoning the model applies before responding, choosing from four levels: minimal, low, medium, and high.

The minimal setting is optimized for speed: Google says it achieves sub-second response times, making it suitable for latency-sensitive applications such as voice assistants, real-time translators, and interactive agents. Higher thinking levels trade some latency for improved accuracy and coherence in complex, multi-turn conversations.
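
As a rough illustration of the choice described above, the sketch below validates a thinking level and bundles it with the model ID. Only the model ID, the parameter name thinkingLevel, and the four level names come from the announcement. The helper build_session_settings and the shape of the returned dictionary are hypothetical and are not taken from the Live API's request schema, so check the official documentation before adapting it.

```python
# Hypothetical helper: the dict layout is illustrative only and is NOT
# the Live API's actual request format.
MODEL_ID = "gemini-3.1-flash-live-preview"

# The four levels named in the announcement, from fastest to most deliberate.
THINKING_LEVELS = ("minimal", "low", "medium", "high")


def build_session_settings(thinking_level: str = "minimal") -> dict:
    """Validate the requested level and return an illustrative settings dict."""
    if thinking_level not in THINKING_LEVELS:
        raise ValueError(
            f"thinking_level must be one of {THINKING_LEVELS}, got {thinking_level!r}"
        )
    return {"model": MODEL_ID, "thinkingLevel": thinking_level}


# A latency-sensitive voice assistant would likely favor the fastest level.
voice_assistant = build_session_settings("minimal")

# A multi-turn support agent might accept extra latency for more reasoning.
support_agent = build_session_settings("high")
```

Validating the level on the client side, as this sketch does, surfaces a typo before a session is opened rather than after a request fails.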

Broad Language Support

Gemini 3.1 Flash Live supports more than 90 languages for both audio input and output. This makes it one of the widest-coverage real-time speech models available through a public API, opening the door for multilingual voice applications without requiring separate localization models.

Pricing

Pricing for Gemini 3.1 Flash Live matches the Gemini 2.5 Flash rates:

  • Audio input: $0.35 per hour
  • Audio output: $1.40 per hour
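
To make the hourly rates above concrete, here is a minimal cost-estimate sketch. The two rates are the ones listed. The function estimate_audio_cost is a hypothetical helper, and it assumes that billing scales linearly with audio duration, which the announcement does not confirm.

```python
# Rates from the pricing list, in US dollars per hour of audio.
AUDIO_INPUT_USD_PER_HOUR = 0.35
AUDIO_OUTPUT_USD_PER_HOUR = 1.40


def estimate_audio_cost(input_minutes: float, output_minutes: float) -> float:
    """Estimate session cost in USD, assuming simple pro-rata billing by duration."""
    if input_minutes < 0 or output_minutes < 0:
        raise ValueError("durations must be non-negative")
    input_cost = (input_minutes / 60) * AUDIO_INPUT_USD_PER_HOUR
    output_cost = (output_minutes / 60) * AUDIO_OUTPUT_USD_PER_HOUR
    return round(input_cost + output_cost, 4)


# Example: a call with 10 minutes of user speech and 5 minutes of model speech.
call_cost = estimate_audio_cost(input_minutes=10, output_minutes=5)
print(f"Estimated cost: ${call_cost:.4f}")
```

Under these assumptions, an hour of input plus an hour of output comes to $1.75, with output audio accounting for most of the cost.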

The model is available in paid preview through the Gemini API. Free-tier access is not available at launch.