Google DeepMind AI Co-Mathematician Hits 48% on FrontierMath Tier 4
Google DeepMind's "AI Co-Mathematician" (a stateful, asynchronous, multi-workstream agentic research workbench) scored 48% on Tier 4 of FrontierMath, the hardest tier of that mathematical reasoning benchmark, setting a new high-water mark. Active sessions solved open problems and recovered overlooked citations, suggesting that the system generalizes to expert research workflows in which sessions span days rather than minutes.
Why It Matters
A 48% score on the hardest FrontierMath tier (versus 39–54% for turn-based competitors) puts AI research assistance in expert-mathematician territory on formal tasks. The multi-workstream, session-persistent design is the architecture pattern to watch for long-horizon agentic work.
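
To make the "multi-workstream, session-persistent" pattern concrete, here is a minimal Python sketch. It is not DeepMind's implementation, whose internals are not described here. The names `Workstream`, `Session`, `advance`, and `run_session` are hypothetical, and the "work" is a placeholder that only appends notes. The sketch shows just two ideas: several workstreams advancing concurrently, and session state being saved and reloaded so that work can resume in a later sitting.

```python
import asyncio
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Workstream:
    """One line of inquiry inside a session (hypothetical structure)."""
    name: str
    notes: list = field(default_factory=list)
    done: bool = False


@dataclass
class Session:
    """Session state that outlives any single run of the event loop."""
    workstreams: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # asdict() recurses into the nested Workstream dataclasses.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Session":
        data = json.loads(raw)
        return cls({k: Workstream(**v) for k, v in data["workstreams"].items()})


async def advance(ws: Workstream, steps: int) -> None:
    """Placeholder for real work (model calls, proof search, literature lookup)."""
    for _ in range(steps):
        await asyncio.sleep(0)  # yield so other workstreams can interleave
        ws.notes.append(f"{ws.name}: step {len(ws.notes) + 1}")


async def run_session(session: Session, steps: int = 2) -> None:
    """Advance every unfinished workstream concurrently."""
    pending = [ws for ws in session.workstreams.values() if not ws.done]
    await asyncio.gather(*(advance(ws, steps) for ws in pending))


def demo() -> Session:
    session = Session({n: Workstream(n) for n in ("lemma-search", "citation-check")})
    asyncio.run(run_session(session))
    saved = session.to_json()            # persist between sittings
    resumed = Session.from_json(saved)   # later: reload and continue
    asyncio.run(run_session(resumed))
    return resumed
```

Calling `demo()` runs two sittings over the same two workstreams, with a save and reload in between. In a turn-based design, by contrast, nothing would carry over once a single exchange ends.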