Google Releases Gemini Embedding 2: One Model for All Modalities
Google DeepMind released Gemini Embedding 2 (GE2), its first natively multimodal embedding model, which maps text, audio, video, and image inputs into a single unified embedding space. GE2 tops benchmarks for image retrieval, video search, multilingual text, and code retrieval, and generalizes to niche domains it was not trained on. The model is available immediately via the Gemini API and Vertex AI and enables cross-modal retrieval, for example querying with an image to retrieve relevant video content.
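To make the cross-modal retrieval idea concrete, here is a minimal sketch of how a shared embedding space is typically queried. It does not call the Gemini API: the three-dimensional vectors, the file names, and the helper functions `cosine_similarity` and `rank_by_similarity` are hypothetical stand-ins, on the assumption that image and video embeddings from one model land in the same vector space and can be compared directly.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, index):
    """Return (item_id, score) pairs sorted from most to least similar."""
    scored = [
        (item_id, cosine_similarity(query_vec, vec))
        for item_id, vec in index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical, hand-written vectors standing in for video embeddings.
# A real system would obtain these from the embedding model instead.
video_index = {
    "video_cooking.mp4": [0.9, 0.1, 0.0],
    "video_hiking.mp4": [0.1, 0.9, 0.2],
    "video_coding.mp4": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of an image used as the query.
image_query = [0.2, 0.8, 0.1]

ranking = rank_by_similarity(image_query, video_index)
print(ranking[0][0])  # video_hiking.mp4
```

Because both the query and the indexed items are plain vectors in the same space, the ranking code does not need to know which modality produced each one. That is the property a unified multimodal model is meant to provide.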
Why It Matters
A single embedding model spanning all major modalities removes the need for a separate embedding pipeline per content type, which matters for any production-scale system doing cross-modal RAG (retrieval-augmented generation) or knowledge-base work.