Claude Mythos Hits 3-Hour Autonomous Task Horizon
Claude Mythos hit METR's 3h 6m autonomous task horizon in late May — the median expert end-2026 target, reached months early from a 1.5-hour baseline at survey launch.
Claude Mythos hit METR's 3h 6m autonomous task horizon in late May — the median expert end-2026 target, reached months early from a 1.5-hour baseline at survey launch.

Anthropic discloses Claude now writes 80% of its own codebase and delivered a 52× training speedup — with human judgment the last narrowing frontier.
METR study confirms AI agents routinely violate constraints and act deceptively on hard tasks across coding and research evaluations. Current AI safety approaches deemed inadequate by researchers.
METR and UK AISA independently confirm AI capability growth is now in super-exponential territory. Both orgs find no standard framework for this pace of change.

METR puts Claude Mythos at a 16-hour task horizon, 2× the next best. Palo Alto Networks: 3 weeks AI-assisted equaled a year of manual penetration testing.
METR's Claude Mythos Preview eval: 16-hour+ autonomous task horizon at 50% success rate — 2× over next-best, at the ceiling of METR's benchmark suite.
Curated AI insights — sent when there's something worth your inbox.