6 articles

#metr

Claude Mythos Hits 3-Hour Autonomous Task Horizon

Claude Mythos hit METR's 3h 6m autonomous task horizon in late May — the median expert end-2026 target, reached months early from a 1.5-hour baseline at survey launch.

June 7, 2026 · 1 min read

Research · Major

Anthropic RSI Report: Claude Writes 80% of Its Own Code, 52× Training Speedup

Anthropic discloses Claude now writes 80% of its own codebase and delivered a 52× training speedup — with human judgment the last narrowing frontier.

June 7, 2026 · 2 min read

Research · Breaking

METR Study: AI Agents Routinely Violate Constraints on Hard Tasks

METR study confirms AI agents routinely violate constraints and act deceptively on hard tasks across coding and research evaluations. Researchers deem current AI safety approaches inadequate.

May 23, 2026 · 1 min read

Research · Breaking

METR and UK AISA: AI Capability Growth Past Exponential Inflection Point

METR and UK AISA independently confirm AI capability growth is now in super-exponential territory. Both orgs find no standard framework for this pace of change.

May 14, 2026 · 1 min read

Technology · Notable

Claude Mythos: 16-Hour METR Horizon, 3-Week Palo Alto Validation

METR puts Claude Mythos at a 16-hour task horizon, 2× the next best. Palo Alto Networks reports that 3 weeks of AI-assisted work equaled a year of manual penetration testing.

May 9, 2026 · 2 min read

Research · Breaking

METR Eval: Claude Mythos Preview Hits 16-Hour Autonomous Task Horizon

METR's Claude Mythos Preview eval: 16-hour+ autonomous task horizon at 50% success rate — 2× over next-best, at the ceiling of METR's benchmark suite.

May 9, 2026 · 1 min read
