METR Eval: Claude Mythos Preview Hits 16-Hour Autonomous Task Horizon

METR's March 2026 evaluation of Claude Mythos Preview estimated a 50%-time-horizon of at least 16 hours, more than twice that of the next-best model and at the ceiling of what METR's current task suite can measure.

agenticonsult Intelligence

METR's March 2026 evaluation of an early Claude Mythos Preview snapshot estimated a 50%-time-horizon of at least 16 hours (95% CI: 8.5–55 hours), more than twice the horizon of the next-best model. METR noted that this estimate sits at the upper bound of what its existing task suite can measure, so the model's true horizon could be higher and cannot be determined with the current tasks. Separately, Palo Alto Networks' penetration-testing results corroborate the capability claim in a real-world setting.
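
For readers unfamiliar with the metric, here is a minimal sketch of how a 50%-time-horizon can be read off a fitted curve. It assumes the logistic-in-log-task-length form METR has described for its time-horizon estimates. The function names and coefficients are hypothetical: the coefficients were chosen only so the 50% point lands at 16 hours, and they are not METR's fitted values.

```python
import math


def success_probability(task_hours: float, intercept: float, slope: float) -> float:
    """Modeled P(success) as a logistic function of log2(human task length in hours)."""
    z = intercept + slope * math.log2(task_hours)
    return 1.0 / (1.0 + math.exp(-z))


def time_horizon(p: float, intercept: float, slope: float) -> float:
    """Task length in hours at which the modeled success probability equals p."""
    logit = math.log(p / (1.0 - p))
    return 2.0 ** ((logit - intercept) / slope)


# Hypothetical coefficients (assumption, not from METR's report).
# A negative slope means longer tasks are less likely to succeed.
INTERCEPT, SLOPE = 4.0, -1.0

h50 = time_horizon(0.5, INTERCEPT, SLOPE)
h80 = time_horizon(0.8, INTERCEPT, SLOPE)
print(f"50% horizon: {h50:.1f} h")
print(f"80% horizon: {h80:.1f} h")
```

Under these assumptions the horizon at a stricter success threshold, such as 80%, is shorter than the 50% horizon. The headline figure therefore describes tasks the model completes about half the time, not tasks it completes reliably.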

Why It Matters

A 50%-time-horizon of 16 hours means Mythos is estimated to succeed about half the time on tasks that take a skilled human roughly 16 hours: complex multi-step work spanning more than a full working day, carried out without human checkpoints. That threshold meaningfully redefines what "agentic" means in deployment. Because the estimate comes from METR's external evaluation, it carries credibility that internal benchmarks cannot.

Primary source

METR

This breaking-news item was assembled from the cited primary source with AI assistance. It is intended for rapid situational awareness — refer to the original publication for the definitive statement.