Gemini 2.0 Flash & Flash-Lite Reach End of Life: API Calls Now Fail

Gemini CLIJun 1, 2026

Google shut down four Gemini 2.0 model endpoints on June 1, 2026: gemini-2.0-flash, gemini-2.0-flash-001, gemini-2.0-flash-lite, and gemini-2.0-flash-lite-001. Any application still calling these identifiers now receives an API error rather than a response. Google had signaled this deprecation since February 18, 2026, but the shutdown marks the moment where previously functioning integrations break without intervention. The recommended migration paths are gemini-3.5-flash for standard workloads and gemini-3.1-flash-lite for cost-sensitive applications that need a price-equivalent replacement.

Sources & Mentions

4 external resources covering this update

Gemini 2.0 Flash discontinuation date discussion

Google AI Developers Forum

Remove deprecated Gemini 2.0 models from defaultModels (LibreChat #12444)

GitHub

Gemini API deprecations page

Gemini API Docs

Release notes changelog

Gemini API Docs

End of Life for Gemini 2.0 Flash and Flash-Lite

As of June 1, 2026, Google has shut down four Gemini 2.0 API model endpoints. Developers who have not yet migrated their applications will now see API errors where they previously received model responses. This is not a gradual deprecation with fallback routing: calls to the retired identifiers return hard errors immediately.

Which Models Were Shut Down

The following four model IDs are now permanently offline across the Gemini API and Vertex AI:

gemini-2.0-flash
gemini-2.0-flash-001
gemini-2.0-flash-lite
gemini-2.0-flash-lite-001

Google first announced the deprecation of these models on February 18, 2026, giving developers roughly three and a half months to plan and execute the migration. The June 1 date was confirmed as the shutdown deadline, ending a period during which the endpoints continued to function despite being officially deprecated.

Why This Matters for Gemini CLI Users

Gemini CLI users who hard-coded model IDs in their configuration files, Agent Skills, or extensions pointing to any of the 2.0 Flash variants will encounter immediate failures when invoking those configurations. The same applies to any MCP server integrations, A2A remote agents, or headless scripts that specify a 2.0 Flash model explicitly.

Users relying on the Auto routing mode are unaffected, since Auto mode dynamically selects from active model endpoints and has already routed traffic away from the retired identifiers.

Migration Paths

The appropriate replacement depends on the use case and budget:

For gemini-2.0-flash users: The recommended replacement is gemini-3.5-flash, Google's current frontier model for agentic and coding tasks. However, this comes with a significant cost increase: Gemini 2.0 Flash was priced at $0.10 per million input tokens and $0.40 per million output tokens, while Gemini 3.5 Flash is priced at $1.50 per million input tokens and $9.00 per million output tokens. For teams running high-volume pipelines, gemini-3.1-flash-lite ($0.25 input / $1.50 output) or gemini-2.5-flash-lite ($0.10 input / $0.40 output) may be the more practical upgrade with no price change.

For gemini-2.0-flash-lite users: The direct equivalent is gemini-3.1-flash-lite, which reached general availability on May 7, 2026. It matches the $0.25/$1.50 pricing tier, delivers 2.5x faster time-to-first-token, and improves output quality on most classification and extraction workloads. For teams that need to stay at the exact $0.10/$0.40 price point, gemini-2.5-flash-lite is now the drop-in option.

The Migration Is Mostly a Name Swap

For straightforward integrations, updating the model identifier string in the application configuration or SDK call is typically all that is required. The Gemini API maintains a consistent request/response schema across model versions, so code changes beyond the model name are generally unnecessary unless a feature specific to 2.0 Flash was being used.

One caution for teams migrating to gemini-2.5-flash: the model supports a configurable thinking mode that can significantly inflate token costs if left unmanaged. Setting thinkingBudget: 0 disables thinking mode and keeps costs predictable for teams migrating from latency- or cost-sensitive 2.0 Flash workloads.

What Comes Next

Google has not announced any additional near-term model shutdowns following the June 1 retirement of the 2.0 Flash family. The current lineup of stable, production-recommended endpoints includes gemini-3.5-flash, gemini-3.1-flash-lite, gemini-3.1-pro, and the 2.5 Flash series.

Mentioned onGoogle AI Developers Forum GitHub Gemini API Docs Gemini API Docs