Google Releases Gemma 4 12B: Encoder-Free Multimodal

Google has released Gemma 4 12B under the Apache 2.0 license. It is a unified multimodal model with no separate encoder for any modality: text, image, audio, and video all flow through a single lightweight projection into a shared token space. The model offers a 256K-token context window, native tool calling, and agentic reasoning. It fits in 16 GB of VRAM and has been demonstrated running on a 10-year-old Xeon CPU under LM Studio and Ollama.
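To make the encoder-free idea concrete, here is a toy sketch of how raw image patches could be mapped into the same token space as text by one linear projection, with no vision encoder in between. This is not Gemma's actual implementation: the dimensions, the patch size, and every name here (D_MODEL, PATCH, image_patches, build_sequence) are hypothetical, chosen only for illustration.

```python
import random

D_MODEL = 8   # hypothetical embedding width, far smaller than any real model
PATCH = 2     # hypothetical patch side length in pixels


def make_weight(n_in, n_out, rng):
    """Random n_in x n_out matrix standing in for learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]


def linear(vec, weight):
    """Project one flat feature vector into the token space (vec @ weight)."""
    return [
        sum(v * weight[i][j] for i, v in enumerate(vec))
        for j in range(len(weight[0]))
    ]


def image_patches(image, patch=PATCH):
    """Split a 2-D grayscale image (list of rows) into flattened patch blocks."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - patch + 1, patch):
        for c in range(0, cols - patch + 1, patch):
            patches.append(
                [image[r + dr][c + dc] for dr in range(patch) for dc in range(patch)]
            )
    return patches


def build_sequence(text_ids, image, embed_table, patch_proj):
    """Return one token sequence: text embeddings followed by projected patches."""
    text_tokens = [embed_table[i] for i in text_ids]
    # Assumption: raw pixels go straight through a single linear layer,
    # with no separate vision encoder in front of it.
    image_tokens = [linear(p, patch_proj) for p in image_patches(image)]
    return text_tokens + image_tokens


rng = random.Random(0)
embed_table = make_weight(16, D_MODEL, rng)            # toy 16-entry vocabulary
patch_proj = make_weight(PATCH * PATCH, D_MODEL, rng)  # the single projection
image = [[rng.random() for _ in range(4)] for _ in range(4)]  # 4x4 toy image

sequence = build_sequence([3, 7, 1], image, embed_table, patch_proj)
print(len(sequence), len(sequence[0]))
```

In this sketch, three text tokens and four image patches end up as seven vectors of the same width, so a single transformer stack could attend over them together. Audio frames or video patches could be handled in the same way under the same assumption.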

Why It Matters

The encoder-free architecture is what makes local, agentic multimodal reasoning practical at laptop scale. It directly lowers the hardware floor for on-device, privacy-preserving AI agent deployments without a GPU budget.