Google Releases Gemma 4 12B: Encoder-Free Multimodal

Google has released Gemma 4 12B under Apache 2.0: a unified, encoder-free multimodal model with 256K context, native audio, image, and video support, and agentic reasoning. It runs in 16GB of VRAM and has day-0 support in Transformers, llama.cpp, and MLX.

agenticonsult Intelligence

Google has released Gemma 4 12B under Apache 2.0. It is a unified multimodal model with no separate encoder for any modality: text, image, audio, and video all flow through a single lightweight projection into the shared token space. The model offers a 256K-token context window, native tool calling, and agentic reasoning. It fits in 16GB of VRAM and has been demonstrated running on a 10-year-old Xeon CPU under LM Studio and Ollama.
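
As a rough sanity check on the 16GB figure, the sketch below estimates the memory needed for the weights alone of a 12-billion-parameter model at a few common precisions. The parameter count is taken from the "12B" in the model name, and the precision labels are illustrative assumptions. The source does not say which precision or quantization the 16GB figure refers to, and the estimate ignores KV cache, activations, and runtime overhead.

```python
# Back-of-the-envelope weight-memory estimate (weights only).
# Assumptions: 12e9 parameters (from the "12B" name); precisions are illustrative
# and are not taken from the source.

GIB = 2**30

PARAMS = 12e9
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}
VRAM_BUDGET_GIB = 16.0  # treating "16GB" as 16 GiB for simplicity


def weight_gib(params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return params * bytes_per_param / GIB


def fits_in_budget(params: float, bytes_per_param: float, budget_gib: float) -> bool:
    """True if the weights alone fit; ignores KV cache and activations."""
    return weight_gib(params, bytes_per_param) <= budget_gib


if __name__ == "__main__":
    for name, nbytes in BYTES_PER_PARAM.items():
        size = weight_gib(PARAMS, nbytes)
        fits = fits_in_budget(PARAMS, nbytes, VRAM_BUDGET_GIB)
        verdict = "fits" if fits else "does not fit"
        print(f"{name}: {size:.1f} GiB of weights, {verdict} in {VRAM_BUDGET_GIB:.0f} GiB")
```

Under these assumptions, 16-bit weights alone exceed 16 GiB, while 8-bit and 4-bit weights fit. That suggests the 16GB figure depends on a quantized build, and long-context use would add KV-cache memory on top.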

Why It Matters

Dropping the per-modality encoders helps make local agentic multimodal reasoning practical at laptop scale. That lowers the hardware floor for on-device, privacy-preserving AI agent deployments, including for teams without a GPU budget.

Primary source

Google

This breaking-news item was assembled from the cited primary source with AI assistance. It is intended for rapid situational awareness — refer to the original publication for the definitive statement.