METR Eval: Claude Mythos Preview Hits 16-Hour Autonomous Task Horizon

METR's March 2026 evaluation of an early Claude Mythos Preview snapshot estimated a 50% time horizon of at least 16 hours (95% CI: 8.5–55 hours), more than twice the horizon of the next-best model. METR noted that this estimate sits at the upper bound of what its existing task suite can measure, so the model's true ceiling is unknown. Palo Alto Networks' pen-testing results offer separate corroboration of the real-world capability claim.
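For readers unfamiliar with the metric, here is a minimal sketch of how a 50% time horizon can be read off a fitted curve. It assumes the general approach METR has described publicly: model success rate as a logistic function of the log of how long a task takes a human, then report the task length at which that curve crosses 50%. The function names and the coefficients below are hypothetical, chosen only so the arithmetic lands on 16 hours. They are not METR's fitted values.

```python
import math

def success_probability(task_hours, intercept, slope):
    """Logistic model of success rate against log2 of human task length."""
    z = intercept + slope * math.log2(task_hours)
    return 1.0 / (1.0 + math.exp(-z))

def fifty_percent_horizon(intercept, slope):
    """Task length (hours) at which the modelled success rate is 50%."""
    # sigmoid(z) equals 0.5 exactly when z = 0, so log2(t) = -intercept / slope
    return 2.0 ** (-intercept / slope)

# Hypothetical coefficients for illustration only (not taken from the evaluation)
INTERCEPT, SLOPE = 4.0, -1.0

horizon = fifty_percent_horizon(INTERCEPT, SLOPE)
print(f"50% horizon: {horizon} hours")
print(f"Modelled success on a 1-hour task: {success_probability(1.0, INTERCEPT, SLOPE):.2f}")
print(f"Modelled success on a 64-hour task: {success_probability(64.0, INTERCEPT, SLOPE):.2f}")
```

Under this reading, the horizon is a statement about task difficulty measured in human time, not about how long the model itself runs. Shorter tasks sit above the 50% line and longer ones fall below it.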

Why It Matters

A 16-hour 50% time horizon means Mythos is estimated to succeed autonomously about half the time on tasks that take a skilled human roughly 16 hours, or about two working days of effort, without human checkpoints. That threshold meaningfully redefines what "agentic" means in deployment. METR's external evaluation adds credibility that internal benchmarks cannot.