Gemini Omni Flash Enters Public Preview for Developers

Google released gemini-omni-flash-preview in public preview through the Gemini API, bringing native video generation and conversational video editing to developers for the first time. The model generates 3-10 second, 720p video clips from text prompts or animates still images, and lets developers refine outputs turn by turn in natural language via the Interactions API. Pricing is $0.10 per second of generated video, the same rate as Veo 3.1 Fast, which positions Omni Flash as a same-cost, faster-iteration alternative for video workflows.

Key Takeaways

  • The YouTube demo for developers frames Omni Flash primarily as an API-first release, walking through generating a clip from a text prompt and then editing it conversationally in the same session.
  • The demo highlights that continuity is preserved across edits: the model remembers prior scene state instead of regenerating from a blank slate each time.
  • Developer reactions in the demo focus on the $0.10/second pricing as competitive with Veo 3.1 Fast, but note the 720p ceiling as a tradeoff for speed and cost.
  • The video calls out that audio-reference uploads are not yet available in the API version, a gap developers should plan around during the preview period.
  • Conversational editing is the standout differentiator versus prior text-to-video tools, compressing the generate, review, regenerate loop into a single chat-style interaction.
  • Omni Flash marks Google's first entry in a broader "Omni" model family aimed at reasoning across mixed image, audio, video, and text inputs rather than treating them as separate pipelines.

A New Kind of Video Model Ships to the API

Google shipped the first model in its "Omni" family, gemini-omni-flash-preview, to the Gemini API on June 30, 2026. Unlike prior text-to-video releases, Omni Flash is built around conversational editing: instead of regenerating a clip from scratch when something needs to change, developers describe the edit in natural language and the model refines the existing video while preserving continuity.

Core Capabilities

Omni Flash accepts a mix of text, image, and video inputs and reasons across all of them rather than simply stitching outputs together. Using the Interactions API, developers can:

  • Generate 3-10 second videos at 720p from a text description
  • Animate a still image into a short clip
  • Conversationally edit and refine a previously generated video, adjusting elements like motion, framing, or scene details without starting over
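The limits stated above (3-10 seconds, 720p) can be checked on the client before any request is sent. The following is a minimal sketch of that idea only; the `ClipRequest` class, its fields, and the `validate` helper are hypothetical names introduced for illustration and are not part of the Gemini API or any Google SDK.

```python
from dataclasses import dataclass

# Limits taken from the article text; all names below are hypothetical.
MIN_SECONDS = 3
MAX_SECONDS = 10
MAX_HEIGHT = 720  # the stated 720p ceiling


@dataclass(frozen=True)
class ClipRequest:
    """Local description of a generation request (an assumption, not an SDK type)."""
    prompt: str
    seconds: float
    height: int = 720


def validate(req: ClipRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the stated limits."""
    problems = []
    if not req.prompt.strip():
        problems.append("prompt is empty")
    if not MIN_SECONDS <= req.seconds <= MAX_SECONDS:
        problems.append(f"seconds must be between {MIN_SECONDS} and {MAX_SECONDS}")
    if req.height > MAX_HEIGHT:
        problems.append(f"height exceeds {MAX_HEIGHT}p")
    return problems
```

Calling `validate(ClipRequest("a cat on a skateboard", 12))` would report the duration problem without spending any API budget. The real request shape should be taken from Google's Omni Flash guide.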

Pricing and Positioning

Omni Flash is priced at $0.10 per second of video output, the same rate as Veo 3.1 Fast, making it a direct, lower-friction alternative for teams that prioritize iteration speed over raw output quality or resolution. Omni Flash tops out at 720p, while Google's Veo tiers scale up to 4K, so it is positioned as a rapid-prototyping and chat-driven editing tool rather than a final-render pipeline.
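At the stated rate, a single clip costs between $0.30 (3 seconds) and $1.00 (10 seconds). The sketch below works through that arithmetic and extends it to an editing session. It assumes, without confirmation from the source, that every conversational edit produces a full-length clip billed at the same per-second rate; the function names are hypothetical.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.10")  # USD per second of output, from the article


def clip_cost(seconds: int) -> Decimal:
    """Cost in USD of one generated clip of the given length."""
    return PRICE_PER_SECOND * seconds


def session_cost(seconds: int, edits: int) -> Decimal:
    """Cost of one initial generation plus `edits` refinement turns.

    ASSUMPTION: each edit outputs a full clip of the same length and is
    billed at the same rate. The source does not state how edits are billed.
    """
    return clip_cost(seconds) * (1 + edits)


print(clip_cost(3))        # 0.30
print(clip_cost(10))       # 1.00
print(session_cost(8, 3))  # 3.20
```

`Decimal` is used instead of `float` so that currency amounts stay exact.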

Known Limitations

As a public preview, Omni Flash ships with acknowledged gaps: audio-reference uploads aren't yet supported through the API, character consistency can drift across scene changes, and video-to-video referencing isn't fully functional yet. Google has published a dedicated Gemini Omni Flash guide and model card for developers getting started.