Gemini API: File Search Now Supports Multimodal Retrieval

Gemini CLIMay 5, 2026

The Gemini API's File Search tool now supports multimodal retrieval, allowing developers to search across text, images, audio, and video files stored in the Files API using a single natural-language query. Previously limited to text-only retrieval, the updated File Search tool leverages Gemini's multimodal embedding models to index and retrieve content across modalities simultaneously. The update eliminates the need to maintain separate search indices for different file types, simplifying RAG pipelines that work with mixed-media document collections. Google is rolling out multimodal File Search in preview via the Gemini API and Vertex AI.

Sources & Mentions

5 external resources covering this update

File Search — Gemini API Documentation

Gemini API Docs

Gemini API File Search now supports images, audio, and video

Hacker News

Google expands Gemini API File Search to images, audio, and video

VentureBeat

Gemini API File Search now works across text, images, audio, video

Building multimodal RAG with Gemini File Search

Dev.to

File Search Gains Multimodal Retrieval Capabilities

The Gemini API's built-in File Search tool has been significantly expanded in a May 2026 update. File Search, which allows Gemini models to search through developer-uploaded files using natural language, now supports multimodal retrieval — meaning a single query can surface relevant results from text documents, images, audio recordings, and video files simultaneously.

What Changed

Until this update, File Search operated exclusively on text content. Images, audio, and video files uploaded via the Files API were accessible to Gemini models for direct analysis but could not be indexed for semantic search. Developers building retrieval-augmented generation (RAG) pipelines that included mixed-media content had to maintain separate retrieval systems for non-text assets.

The updated File Search tool uses Gemini's multimodal embedding models — the same technology underpinning Gemini Embedding 2 — to generate unified vector representations of content regardless of its modality. A query like "slides about transformer architecture from last quarter" can now retrieve a PDF presentation, an image of a whiteboard diagram, and a video recording of a lecture, ranked by semantic relevance across all three.

How It Works

Developers enable multimodal File Search by setting the file_search tool parameter in their API request. Files uploaded to the Files API are automatically indexed when the tool is active. The indexing process extracts semantic embeddings from text, image frames (for video), and audio transcriptions, storing them in a unified retrieval index scoped to the developer's project.

At query time, the model issues a search against this index and retrieves the top-k most relevant file segments, which are then passed as context into the generation turn. The number of retrieved segments and relevance thresholds are configurable via the file_search_config parameter.

Supported File Types

The following file types are supported for multimodal indexing in the preview release:

Text: PDF, TXT, HTML, Markdown, DOCX
Images: JPEG, PNG, GIF, WEBP, HEIC
Audio: MP3, WAV, FLAC, AAC, OGG
Video: MP4, MOV, AVI, WEBM (frame-level indexing at 1 fps)

Impact on RAG Pipelines

For developers building knowledge bases or document Q&A systems, multimodal File Search substantially simplifies the architecture. A single tool call now replaces what previously required orchestrating a text retrieval system, a separate image search index, and potentially a speech-to-text preprocessing step for audio content.

Google notes that multimodal retrieval works particularly well for enterprise use cases involving mixed-format documentation: technical manuals with embedded diagrams, recorded meeting libraries, training datasets with annotated images, and product catalogues that combine descriptions with photos.

Availability

Multimodal File Search is available in preview via the Gemini API for developers on paid tiers. It is also available through Vertex AI for enterprise customers. Pricing follows the standard Files API storage rates plus a retrieval cost per query; the exact per-query pricing for multimodal retrieval was not announced at preview launch.

The feature is accessible using the standard generateContent endpoint with the file_search tool included in the tools array — no SDK upgrade is required for developers already using Gemini API Python or Node.js SDKs on version 1.0 or later.

Mentioned onGemini API Docs Hacker News VentureBeat Reddit Dev.to