xAI Grok 4.3 and Mistral Medium 3.5 Underwhelm Independent Benchmarks

Independent evaluation on Artificial Analysis shows xAI Grok 4.3 beta trailing Kimi K 2.6, MiMu (Xiaomi), and the closed-model leaders (GPT 5.5 and others) on leaderboard performance, despite marketing that emphasized agentic tool-use improvements. Mistral Medium 3.5 — a 128B dense model with a 256K context window — scores significantly below DeepSeek V4 on the same benchmarks while being priced higher than comparable open models. Reviewers recommend Mistral Medium 3.5 only for EU regulatory-compliance use cases where European-origin models are required.

Why It Matters

Two high-profile releases falling short of their self-reported benchmarks reinforce the need for independent evaluation before procurement decisions — a pattern becoming routine in 2026 model releases.