Anthropic's BioMysteryBench: Claude Solves 30% of Expert-Stumping Problems
Anthropic introduced BioMysteryBench, a new evaluation that tests whether Claude can devise creative solutions to open-ended biological research problems. Of 99 problems tested against an expert panel, 23 stumped the researchers. Claude's most recent models solved roughly 30% of those hardest cases and performed well on most of the rest.
Why It Matters
This signals Claude's capacity for genuine scientific reasoning, not just pattern retrieval, which is a meaningful bar for agentic research workflows in biotech and pharma. Full methodology is available in Anthropic's research post.