Google Gemma 4 E2B/E4B Enables Agent Skills on Edge Devices via LiteRT-LM
Google's Gemma 4 E2B and E4B are the first models to bundle function calling and thinking within a 2–4B effective-parameter footprint. Presented at AI Engineer by Google AI Edge tech lead Cormac Brick, the models run agent skills on Android, iOS, macOS, Linux, Windows, and IoT via the open-source LiteRT-LM runtime, using progressive-disclosure skill loading to preserve reasoning quality in constrained contexts. The models are Apache 2.0 licensed, and the companion Google AI Edge Gallery app is open source.
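Progressive disclosure means only a compact index of skill names and summaries occupies the model's base context, with a skill's full instructions injected on demand once the model selects it. A minimal sketch of the pattern (the skill names, summaries, and helper functions here are illustrative assumptions, not the LiteRT-LM API):

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    summary: str        # always in context: one cheap line per skill
    instructions: str   # loaded only when the model selects this skill

# Hypothetical example skills for illustration.
SKILLS = {
    "unit_convert": Skill(
        "unit_convert",
        "Convert values between measurement units.",
        "Parse the value and source/target units, apply the conversion "
        "factor, and return the result with the target unit attached.",
    ),
    "summarize": Skill(
        "summarize",
        "Summarize a document in a few sentences.",
        "Read the document, extract the key claims, and compress them "
        "into at most three sentences in the document's own register.",
    ),
}

def base_prompt() -> str:
    """Compact skill index shown to the model on every turn."""
    lines = [f"- {s.name}: {s.summary}" for s in SKILLS.values()]
    return "Available skills:\n" + "\n".join(lines)

def expand_skill(name: str) -> str:
    """Full instructions, injected only after the model picks a skill."""
    return SKILLS[name].instructions
```

The payoff on small-context edge models: the always-present index stays a few dozen tokens regardless of how many skills are registered, so reasoning quality is not crowded out by instructions the current task never uses.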
Why It Matters
On-device agentic AI just crossed a meaningful capability threshold. Models in the 2–4B range now support real skill workflows, not just summarization, enabling privacy-preserving, low-latency agent tasks without cloud calls. NPU acceleration on Qualcomm hardware delivers roughly 10x CPU throughput.
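Function calling is what turns these local models into agents: the model emits a structured tool-call request, the host app executes it, and the result is fed back. A minimal dispatch loop under assumed conventions (the JSON call format and tool names are hypothetical, not the LiteRT-LM or Gemma wire format):

```python
import json

def get_battery_level() -> str:
    # Stand-in for a real on-device query, e.g. an Android BatteryManager call.
    return "87%"

# Registry mapping tool names the model may emit to local implementations.
TOOLS = {"get_battery_level": get_battery_level}

def handle_model_output(output: str) -> str:
    """If the model emitted a JSON tool call, execute it locally;
    otherwise pass the plain-text response through unchanged."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return output  # ordinary text response, no tool call
    fn = TOOLS[call["name"]]
    return fn(**call.get("arguments", {}))
```

Because both the model and the tools run on-device, the entire request/execute/respond loop completes without any network round trip, which is where the privacy and latency benefits come from.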