Gemini 3.1 Flash Live: Real-Time Audio Model for Developers
Google launched Gemini 3.1 Flash Live (gemini-3.1-flash-live-preview) on March 26, 2026, its latest audio-to-audio model designed for real-time voice and vision applications via the Live API. The model delivers lower latency, improved noise filtering, and significantly better instruction adherence than the previous 2.5 Flash Native Audio, while supporting more than 90 languages for multi-modal real-time conversations. Developers can tune the quality-latency tradeoff with a thinkingLevel parameter (minimal, low, medium, or high); the minimal setting achieves sub-second response times. Pricing is $0.35 per hour for audio input and $1.40 per hour for audio output, matching Gemini 2.5 Flash rates.
Sources & Mentions
5 external resources covering this update:
- Gemini 3.1 Flash Live goes live for developers (The Decoder)
- Google releases Gemini 3.1 Flash Live, a new real-time audio AI model (9to5Google)
- Gemini 3.1 Flash Live: Real-time audio with thinking (Dev.to)
- Google Gemini 3.1 Flash Live API launched for developers (Android Authority)
- Google Releases Gemini 3.1 Flash Live Model (Search Engine Journal)
A New Real-Time Audio Model for the Live API
Google launched Gemini 3.1 Flash Live (model ID: gemini-3.1-flash-live-preview) on March 26, 2026, the latest addition to the Gemini 3.1 family and the first model in this generation specifically designed for real-time, low-latency voice and vision applications through the Live API.
The model is a direct successor to Gemini 2.5 Flash Native Audio, and Google reports meaningful improvements across every dimension developers care about in conversational AI: lower end-to-end latency, better filtering of background noise and interruptions, and substantially improved adherence to system instructions and persona prompts. Multi-modal support is retained, allowing simultaneous audio and video input for real-time vision-based interactions.
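A streaming session against the new model might be set up roughly as follows. This is a sketch, not official sample code: the model ID comes from the announcement, while the config field names and the async `client.aio.live.connect` entry point follow the conventions of the google-genai Python SDK and should be checked against the current SDK docs.

```python
# Sketch of a Live API session with Gemini 3.1 Flash Live.
# Only the model ID is from the announcement; config keys and SDK method
# names below are assumptions based on the google-genai SDK's conventions.
MODEL_ID = "gemini-3.1-flash-live-preview"


def build_live_config(voice_response: bool = True) -> dict:
    """Assemble a Live API config dict (field names assumed, not confirmed)."""
    return {
        # Audio-to-audio is the model's headline use case; text is a fallback.
        "response_modalities": ["AUDIO"] if voice_response else ["TEXT"],
        # The model is reported to follow persona prompts more reliably.
        "system_instruction": "You are a concise, friendly voice assistant.",
    }


async def run_session(client) -> None:
    # `client` would be genai.Client(api_key=...); it is passed in here so
    # the sketch stays importable without credentials.
    config = build_live_config()
    async with client.aio.live.connect(model=MODEL_ID, config=config) as session:
        # Stream microphone audio in and play model audio out (omitted).
        ...
```

The config-building step is separated out so the same dictionary can be reused across reconnects, which matters for long-lived voice sessions.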
Tunable Thinking Levels
One of the most developer-friendly additions in Gemini 3.1 Flash Live is the thinkingLevel parameter. Developers can now explicitly set how much reasoning the model applies before responding, choosing from four levels: minimal, low, medium, and high.
The minimal setting is optimized for pure speed: Google says it achieves sub-second response times, making it suitable for latency-sensitive applications such as voice assistants, real-time translators, and interactive agents. Higher thinking levels trade some latency for improved accuracy and coherence in complex, multi-turn conversations.
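In practice an application might pick a thinking level from its latency budget. The four level names are from the announcement; the camelCase `thinkingLevel` key, the helper names, and the millisecond thresholds below are illustrative assumptions, not documented behavior.

```python
# Hypothetical policy for choosing a thinkingLevel from a latency budget.
# Level names are from the announcement; thresholds here are made up.
THINKING_LEVELS = ("minimal", "low", "medium", "high")


def thinking_config(level: str) -> dict:
    """Validate a level and wrap it in a config fragment (key name assumed)."""
    if level not in THINKING_LEVELS:
        raise ValueError(f"thinkingLevel must be one of {THINKING_LEVELS}, got {level!r}")
    return {"thinkingLevel": level}


def pick_level(latency_budget_ms: int) -> str:
    """Sub-second budgets force `minimal`; looser budgets buy more reasoning."""
    if latency_budget_ms < 1000:
        return "minimal"
    if latency_budget_ms < 2000:
        return "low"
    if latency_budget_ms < 4000:
        return "medium"
    return "high"
```

For example, a live translator with a 800 ms budget would land on `minimal`, while an agent doing multi-step tool use could afford `high`.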
Broad Language Support
Gemini 3.1 Flash Live supports more than 90 languages for both audio input and output. This makes it one of the widest-coverage real-time speech models available through a public API, opening the door for multilingual voice applications without requiring separate localization models.
Pricing
Pricing for Gemini 3.1 Flash Live matches the Gemini 2.5 Flash rates:
- Audio input: $0.35 per hour
- Audio output: $1.40 per hour
The model is available in paid preview through the Gemini API. Free-tier access is not available at launch.
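The per-hour rates make session costs easy to estimate. A minimal sketch using the launch prices (the rates are from the announcement; the helper itself is ours):

```python
# Back-of-the-envelope cost estimator for a Live API audio session,
# using the published launch rates. Function and constant names are ours.
INPUT_RATE_PER_HOUR = 0.35   # USD per hour of audio input
OUTPUT_RATE_PER_HOUR = 1.40  # USD per hour of audio output


def session_cost(input_seconds: float, output_seconds: float) -> float:
    """Return the estimated USD cost of one session, rounded to 4 places."""
    hours_in = input_seconds / 3600
    hours_out = output_seconds / 3600
    return round(hours_in * INPUT_RATE_PER_HOUR + hours_out * OUTPUT_RATE_PER_HOUR, 4)


# A 10-minute conversation where the user speaks 6 minutes and the model 4:
print(session_cost(360, 240))  # → 0.1283
```

At these rates, an hour of continuous two-way audio (an hour in, an hour out) costs $1.75, which is the same as on Gemini 2.5 Flash.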