Gemini 3.1 Flash-Lite: Google's Fastest Model Reaches General Availability
Google released gemini-3.1-flash-lite as a generally available model on May 7, 2026, graduating it from the preview it entered on March 3. The model delivers 2.5x faster time-to-first-token and 45% higher throughput than 2.5 Flash-Lite, with a 1M token context window, up to 66K output tokens, and built-in thinking levels for configurable reasoning depth. Pricing is set at $0.25 per million input tokens and $1.50 per million output tokens. Developers using the companion gemini-3.1-flash-lite-preview model ID must migrate before its May 25, 2026 shutdown.
Sources & Mentions
5 external resources covering this update
Gemini 3.1 Flash-Lite: Built for intelligence at scale
Hacker News
Gemini 3.1 Flash-Lite is now generally available
Google Cloud Blog
Gemini 3.1 Flash-Lite Preview: Intelligence, Performance & Price Analysis
Artificial Analysis
Gemini 3.1 Flash-Lite is the fast help you need if you're a dev with complex data
Android Central
Jeff Dean announces Gemini 3.1 Flash-Lite
X (Twitter)
Gemini 3.1 Flash-Lite Reaches General Availability
Google officially promoted gemini-3.1-flash-lite to general availability on May 7, 2026. The model had been available in preview since March 3, 2026; the GA release signals production readiness with stability guarantees, full enterprise support via Vertex AI, and a fixed deprecation timeline for the preview endpoint.
Performance and Capabilities
Gemini 3.1 Flash-Lite is the fastest and most cost-efficient model in Google's Gemini 3 series, designed for high-volume, latency-sensitive workloads where throughput and cost-per-call are the primary constraints. Compared to the prior-generation 2.5 Flash-Lite:
- 2.5x faster time to first token (TTFT)
- 45% higher output throughput
- GPQA Diamond: 86.9%, competitive with significantly larger models
- 1M token context window with up to 66K output tokens per request
The model is fully multimodal, supporting text, image, and audio inputs as well as function calling. Thinking levels are built in, letting developers configure reasoning depth per request: minimal for near-instant classifier and routing responses, and higher levels for complex tasks such as UI generation, simulation, and multi-step agentic pipelines.
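The thinking-level knob is set per request. As a sketch, the snippet below builds the REST request body offline rather than calling the API; the field names (`generationConfig.thinkingConfig.thinkingLevel`) and the level values are assumptions modeled on the Gemini API's conventions, so verify them against the current API reference before use.

```python
# Build a generateContent request body with a configurable thinking level.
# NOTE: generationConfig.thinkingConfig.thinkingLevel is an assumed field
# layout based on the Gemini REST API; check current docs before relying on it.
def build_request(prompt: str, thinking_level: str = "minimal") -> dict:
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }

# Near-instant classification or routing: keep thinking minimal.
routing = build_request("Route this ticket: 'refund not received'", "minimal")

# Multi-step agentic work: allow deeper reasoning.
agentic = build_request("Plan a three-step data migration", "high")
```

Keeping the level in one helper like this makes it easy to dial reasoning depth up or down per call path instead of hard-coding it across a codebase.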
Pricing
Gemini 3.1 Flash-Lite is priced at $0.25 per million input tokens and $1.50 per million output tokens. That is a step up from the previous generation's $0.10/$0.40, alongside the substantial performance upgrade. Community benchmarks have noted that the model's improved instruction-following efficiency (it consumes fewer tokens per task) can offset the higher per-token rate in many workloads, though use cases with heavy reasoning settings may see sharply increased costs from extended thinking-token consumption.
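To reason about the trade-off concretely, here is a small cost helper using the published rates; the workload token counts are illustrative only, and whether token efficiency offsets the higher rates depends entirely on your actual traffic.

```python
# Per-million-token rates in USD: 3.1 rates from the GA announcement,
# 2.5 rates included for comparison.
PRICING = {
    "gemini-2.5-flash-lite": {"input": 0.10, "output": 0.40},
    "gemini-3.1-flash-lite": {"input": 0.25, "output": 1.50},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single call at the listed per-1M-token rates."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# Illustrative call: 2,000 input tokens, 800 output tokens.
print(f"2.5: ${cost_usd('gemini-2.5-flash-lite', 2000, 800):.6f}")
print(f"3.1: ${cost_usd('gemini-3.1-flash-lite', 2000, 800):.6f}")
```

Remember that thinking tokens bill as output tokens on most reasoning-capable models, so measure both visible output and thinking consumption before migrating cost-sensitive pipelines.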
Real-World Deployments
Google highlighted several production deployments at GA launch:
- Gladly: handling millions of customer-service calls weekly across SMS, WhatsApp, and Instagram, with p95 latency of ~1.8 seconds for full replies and sub-second p95 for classifiers and tool calls; ~60% lower costs versus comparable thinking-tier models
- OffDeal: powering real-time data lookups and email triage for AI agents in investment banking workflows
- Ramp: deployed for high-volume, latency-sensitive financial features
- JetBrains: integrated for code completion and agentic developer tooling in IntelliJ-based IDEs
- krea.ai: multimodal safety checks and prompt enhancement at creative scale
Migration: Preview Endpoint Shutdown
The gemini-3.1-flash-lite-preview model ID is being deprecated and will be fully shut down on May 25, 2026. The deprecation timeline:
| Date | Event |
|---|---|
| May 7, 2026 | gemini-3.1-flash-lite GA release |
| May 11, 2026 | gemini-3.1-flash-lite-preview begins deprecation |
| May 25, 2026 | gemini-3.1-flash-lite-preview fully shut down |
Developers must update all references from gemini-3.1-flash-lite-preview to gemini-3.1-flash-lite before May 25 to avoid service interruption.
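For codebases with many scattered references, a one-off sweep can handle the rename. The sketch below is illustrative (the file extensions and in-place rewrite strategy are choices, not a prescribed migration tool); review the diff before committing, and remember to update any environment variables or dashboard configs the script cannot see.

```python
# One-off migration sweep: rewrite the preview model ID to the GA ID
# across a source tree. Extensions scanned here are illustrative.
from pathlib import Path

OLD = "gemini-3.1-flash-lite-preview"
NEW = "gemini-3.1-flash-lite"

def migrate(root: str, exts=(".py", ".ts", ".json", ".yaml")) -> int:
    """Replace OLD with NEW in matching files; return count of files changed."""
    changed = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            text = path.read_text(encoding="utf-8")
            if OLD in text:
                path.write_text(text.replace(OLD, NEW), encoding="utf-8")
                changed += 1
    return changed
```

Because `str.replace` makes a single pass, the fact that the GA ID is a prefix of the preview ID is safe here: each occurrence of the longer preview string is replaced exactly once.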
Accessing the Model in Gemini CLI
Gemini CLI users can access the GA model immediately; no CLI update is required. To use it, pass --model gemini-3.1-flash-lite at the command line, or set the model field in ~/.gemini/settings.json. The model is also available directly via the Gemini API in Google AI Studio and via Vertex AI.
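For the settings-file route, a minimal fragment would look like the following; this assumes a top-level `model` string field as described above, so check your installed CLI version's settings schema if the key has moved or been nested.

```json
{
  "model": "gemini-3.1-flash-lite"
}
```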