Prism ML Ships Ternary Flux 2 Klein 4B: 7.7GB Collapsed to 1.2GB
Prism ML has released ternary (1.58-bit) and binary (1-bit) quantizations of Black Forest Labs' Flux 2 Klein 4B image model, shrinking the 7.7GB checkpoint to approximately 1.2GB (ternary) and below 1GB (binary). Prism ML claims 88–95% benchmark retention. Independent testing by Tim Carambat (AnythingLLM) on an M4 Pro found acceptable results on artistic prompts but significant degradation on text rendering, product mockups, and photorealistic scenes. The models are distributed for the MLX (Apple Silicon) and Gemite (CUDA/Windows/Linux) runtimes.
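As a rough sanity check on the headline numbers, the arithmetic can be sketched in a few lines of Python. The 7.7GB and 1.2GB sizes come from the release as reported above; the 16-bit precision of the original checkpoint is an assumption, not something Prism ML has confirmed, and the function names are illustrative only. The "1.58-bit" label itself is just log2(3), the information content of a three-valued weight.

```python
import math

# Sizes as reported in this brief (GB).
ORIGINAL_GB = 7.7
TERNARY_GB = 1.2

# Assumption (not stated in the source): the original checkpoint
# stores its weights at 16 bits each (e.g. bf16/fp16).
ASSUMED_ORIGINAL_BITS = 16


def compression_ratio(original_gb: float, quantized_gb: float) -> float:
    """How many times smaller the quantized file is."""
    return original_gb / quantized_gb


def effective_bits_per_weight(
    original_gb: float, quantized_gb: float, original_bits: float
) -> float:
    """Average bits per weight implied by the file sizes alone."""
    return original_bits * quantized_gb / original_gb


# A weight with three possible values carries log2(3) bits of information.
ternary_ideal_bits = math.log2(3)

ratio = compression_ratio(ORIGINAL_GB, TERNARY_GB)
eff_bits = effective_bits_per_weight(ORIGINAL_GB, TERNARY_GB, ASSUMED_ORIGINAL_BITS)

print(f"compression ratio:         {ratio:.2f}x")
print(f"effective bits per weight: {eff_bits:.2f}")
print(f"ideal ternary bits:        {ternary_ideal_bits:.3f}")
```

Under that 16-bit assumption, the ternary file works out to roughly a 6.4x reduction and about 2.5 bits per weight on average, somewhat above the 1.58-bit ideal. The release as reported here does not explain the difference.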
Why It Matters
Quantizing an image model at LLM-equivalent compression ratios is a first. But the real-world quality gap highlights a fundamental difference in maturity between LLM and image model quantization: Prism ML's benchmarks show up to 95% retention, yet real-world text-rendering and structural prompts still fail. The "iPhone-deployable image model" milestone has been claimed but not yet delivered in practice.