Gemini 3.1 Flash-Lite: Google's Fastest Model Reaches General Availability

Google released gemini-3.1-flash-lite as a generally available model on May 7, 2026, graduating it from the preview it entered on March 3. The model delivers 2.5x faster time-to-first-token and 45% higher throughput than 2.5 Flash-Lite, with a 1M token context window, up to 66K output tokens, and built-in thinking levels for configurable reasoning depth. Pricing is set at $0.25 per million input tokens and $1.50 per million output tokens. Developers using the companion gemini-3.1-flash-lite-preview model ID must migrate before its May 25, 2026 shutdown.


Gemini 3.1 Flash-Lite Reaches General Availability

Google officially promoted gemini-3.1-flash-lite to general availability on May 7, 2026. The model had been available in preview since March 3, 2026; the GA release signals production readiness with stability guarantees, full enterprise support via Vertex AI, and a fixed deprecation timeline for the preview endpoint.

Performance and Capabilities

Gemini 3.1 Flash-Lite is the fastest and most cost-efficient model in Google's Gemini 3 series, designed for high-volume, latency-sensitive workloads where throughput and cost-per-call are the primary constraints. Compared to the prior-generation 2.5 Flash-Lite:

  • 2.5x faster time to first token (TTFT)
  • 45% higher output throughput
  • GPQA Diamond: 86.9% β€” competitive with significantly larger models
  • 1M token context window with up to 66K output tokens per request

The model is fully multimodal, supporting text, images, audio, and function calling. Thinking levels are built in, allowing developers to configure reasoning depth per request: minimal mode for near-instant classifier and routing responses, and higher levels for complex tasks such as UI generation, simulation, and multi-step agentic pipelines.
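The per-request reasoning depth described above can be sketched as a small routing helper. This is an illustrative sketch only: the `thinking_level` field name, the `"minimal"`/`"high"` labels, and the task categories are assumptions for demonstration, not a confirmed request schema, and no SDK is called here.

```python
# Hypothetical routing of requests by reasoning depth. Field names and
# level labels are illustrative assumptions, not a confirmed API schema.

FAST_TASKS = {"classification", "routing", "tool_call"}

def request_params(model: str, task: str) -> dict:
    """Build request parameters with a thinking level matched to the task."""
    level = "minimal" if task in FAST_TASKS else "high"
    return {"model": model, "thinking_level": level}

# A latency-sensitive classifier call stays at the lowest reasoning depth:
params = request_params("gemini-3.1-flash-lite", "routing")
```

The point of the pattern is that one model ID serves both the sub-second classifier path and the deeper agentic path, with only the reasoning depth varying per call.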

Pricing

Gemini 3.1 Flash-Lite is priced at $0.25 per million input tokens and $1.50 per million output tokens. While this represents a substantial performance upgrade over 2.5 Flash-Lite, it is also a price increase over the previous generation's $0.10/$0.40 rates. Community benchmarks have noted that the model's improved instruction-following efficiency, consuming fewer tokens per task, can offset the higher per-token rate in many workloads. Use cases with heavy reasoning settings, however, may see sharply increased costs due to extended thinking-token consumption.
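The per-token arithmetic above is easy to check with a small helper. The rates come from the article; the token counts in the workload comparison are illustrative assumptions.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             in_rate: float, out_rate: float) -> float:
    """Request cost in USD, with rates quoted in dollars per million tokens."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# One million input tokens at the 3.1 Flash-Lite input rate costs $0.25:
assert cost_usd(1_000_000, 0, 0.25, 1.50) == 0.25

# The same hypothetical workload under new and old per-token pricing:
new_price = cost_usd(800_000, 200_000, 0.25, 1.50)  # $0.50
old_price = cost_usd(800_000, 200_000, 0.10, 0.40)  # ~$0.16
```

Whether the upgrade is net cheaper for a given workload depends on how much the newer model reduces tokens consumed per task, which the helper makes straightforward to model.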

Real-World Deployments

Google highlighted several production deployments at GA launch:

  • Gladly β€” handling millions of customer-service calls weekly across SMS, WhatsApp, and Instagram, with p95 latency of ~1.8 seconds for full replies and sub-second p95 for classifiers and tool calls; ~60% lower costs versus comparable thinking-tier models
  • OffDeal β€” powering real-time data lookups and email triage for AI agents in investment banking workflows
  • Ramp β€” deployed for high-volume, latency-sensitive financial features
  • JetBrains β€” integrated for code completion and agentic developer tooling in IntelliJ-based IDEs
  • krea.ai β€” multimodal safety checks and prompt enhancement at creative scale

Migration: Preview Endpoint Shutdown

The gemini-3.1-flash-lite-preview model ID is being deprecated and will be fully shut down on May 25, 2026. The deprecation timeline:

  • May 7, 2026: gemini-3.1-flash-lite GA release
  • May 11, 2026: gemini-3.1-flash-lite-preview begins deprecation
  • May 25, 2026: gemini-3.1-flash-lite-preview fully shut down

Developers must update all references from gemini-3.1-flash-lite-preview to gemini-3.1-flash-lite before May 25 to avoid service interruption.
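A one-shot rewrite of the preview ID across a codebase can be sketched with the standard library. The file glob and directory layout are assumptions about a given project; run against version-controlled files so changes are reviewable.

```python
# Sketch: replace the deprecated preview model ID with the GA ID in a
# source tree. The "*.py" glob is an illustrative assumption.
from pathlib import Path

OLD_ID = "gemini-3.1-flash-lite-preview"
NEW_ID = "gemini-3.1-flash-lite"

def migrate(root: str, pattern: str = "*.py") -> int:
    """Rewrite the preview model ID in place; return the number of files changed."""
    changed = 0
    for path in Path(root).rglob(pattern):
        text = path.read_text(encoding="utf-8")
        if OLD_ID in text:
            path.write_text(text.replace(OLD_ID, NEW_ID), encoding="utf-8")
            changed += 1
    return changed
```

Because the GA ID is a prefix of the preview ID, a plain string replacement is safe here: every occurrence of the longer preview string collapses to the GA string.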

Accessing the Model in Gemini CLI

Gemini CLI users can access the GA model immediately β€” no CLI update required. To use it, pass --model gemini-3.1-flash-lite at the command line, or set the model field in ~/.gemini/settings.json. The model is also available directly via the Gemini API in Google AI Studio and via Vertex AI.
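For a persistent default, the model field described above can be set in the CLI settings file. This is a minimal sketch; the exact settings.json schema may differ across Gemini CLI versions, so check your installed version's configuration docs.

```json
{
  "model": "gemini-3.1-flash-lite"
}
```

With this in ~/.gemini/settings.json, the --model flag is only needed to override the default for a single invocation.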