Google Releases Gemini Embedding 2: One Model for All Modalities

Google DeepMind released Gemini Embedding 2 (GE2), its first natively multimodal embedding model, which maps text, audio, video, and images into a single unified representation. It tops benchmarks for image retrieval, video search, multilingual text, and code, and is available now on the Gemini API and Vertex AI.

1 min read | agenticonsult Intelligence

Google DeepMind released Gemini Embedding 2 (GE2), its first natively multimodal embedding model, mapping text, audio, video, and image inputs into a single unified representation. GE2 tops benchmarks for image retrieval, video search, multilingual text, and code retrieval, and generalizes to niche domains it was not specifically trained on. The model is available immediately via the Gemini API and Vertex AI. Because all modalities share one embedding space, it enables cross-modal retrieval: for example, querying with an image to retrieve relevant video content.
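Cross-modal retrieval works because an image query and a set of video clips, once embedded into the same vector space, can be compared directly. The sketch below assumes that setup and scores relevance with cosine similarity, a common choice for embedding search (the article does not say which metric GE2 is tuned for). The vectors and clip names are hypothetical three-dimensional placeholders; real embeddings would come from the Gemini API or Vertex AI, whose client calls are not shown here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy vectors standing in for GE2 outputs; real embeddings
# are produced by the model and are much higher-dimensional.
image_query = [0.9, 0.1, 0.0]  # embedding of a query image
video_clips = {
    "clip_beach.mp4": [0.8, 0.2, 0.1],
    "clip_city.mp4": [0.1, 0.9, 0.3],
    "clip_forest.mp4": [0.2, 0.1, 0.95],
}


def rank_by_similarity(query: list[float], candidates: dict[str, list[float]]) -> list[str]:
    """Return candidate ids sorted from most to least similar to the query."""
    return sorted(candidates, key=lambda cid: cosine(query, candidates[cid]), reverse=True)


ranked = rank_by_similarity(image_query, video_clips)
print(ranked)
```

No image-to-video conversion happens anywhere: the image vector is scored against the video vectors as-is, which is only meaningful if one model produced both.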

Why It Matters

A single embedding model spanning all major modalities removes the need to run a separate embedding pipeline for each content type. That is directly relevant to any production system doing cross-modal retrieval-augmented generation (RAG) or knowledge-base search, since text, images, audio, and video can then be indexed and queried together.
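A minimal sketch of what "one pipeline" can look like on the retrieval side of a RAG system: a single in-memory index holding entries of every modality, searched through one code path. It assumes every vector comes from the same model and therefore shares one dimension. The `UnifiedIndex` class, file names, and vectors are hypothetical, and a production system would typically use a vector database rather than a Python list.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Entry:
    item_id: str
    modality: str  # "text", "image", "audio", or "video"
    vector: List[float]


@dataclass
class UnifiedIndex:
    """Hypothetical single index for all modalities, assuming one shared embedding space."""

    dim: int
    entries: List[Entry] = field(default_factory=list)

    def add(self, item_id: str, modality: str, vector: List[float]) -> None:
        # One model for every modality implies one vector dimension for every entry.
        if len(vector) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(vector)}")
        self.entries.append(Entry(item_id, modality, vector))

    def search(
        self, query: List[float], k: int = 3, modality: Optional[str] = None
    ) -> List[Tuple[str, str, float]]:
        """Return the top-k (item_id, modality, score) tuples, optionally for one modality."""
        if len(query) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(query)}")
        pool = [e for e in self.entries if modality is None or e.modality == modality]
        scored = [(e.item_id, e.modality, cosine(query, e.vector)) for e in pool]
        scored.sort(key=lambda t: t[2], reverse=True)
        return scored[:k]


# Hypothetical toy vectors standing in for GE2 embeddings of mixed content.
index = UnifiedIndex(dim=3)
index.add("faq.txt", "text", [0.7, 0.2, 0.1])
index.add("diagram.png", "image", [0.6, 0.3, 0.1])
index.add("support_call.wav", "audio", [0.1, 0.8, 0.2])
index.add("demo.mp4", "video", [0.1, 0.1, 0.9])

question = [0.65, 0.25, 0.1]  # hypothetical embedding of a text question
for item_id, modality, score in index.search(question, k=2):
    print(item_id, modality, round(score, 3))
```

The dimension check is where the shared space pays off: with separate per-modality models, vectors could differ in size and scale and would not be directly comparable, so each content type would need its own index and its own ranking logic.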

This breaking-news item was assembled from the cited primary source with AI assistance. It is intended for rapid situational awareness — refer to the original publication for the definitive statement.