NanoGPT-Bench: Coding Agents Recover Only 9.3% of Human AI R&D Progress

IntologyAI tests Codex, Claude Code, and Autoresearch on the NanoGPT Speedrun benchmark, covering five months of world records and approximately two years of human submissions. It finds that coding agents recover only 9.3% of human AI R&D progress, primarily tuning hyperparameters while ignoring the algorithmic research that drives most human gains.

1 min read | agenticonsult Intelligence

IntologyAI's NanoGPT-Bench tests Codex, Claude Code, and Autoresearch on the NanoGPT Speedrun. The benchmark covers a five-month window of world-record submissions spanning approximately two years of human contributions. Agents recover 9.3% of human AI research progress overall, and their effort concentrates on hyperparameter tuning. They largely ignore algorithmic research, which is the source of most human gains. The evaluation ran fully autonomously, with no internet access and no human intervention.

Why It Matters

The result establishes a concrete capability gap: current frontier coding agents can replicate and optimize existing approaches, but they are not yet generating the algorithmic innovations that drive research progress. The gap is not an artifact of benchmark gaming, because the full NanoGPT Speedrun is a real, competitive research trajectory.

This breaking-news item was assembled from the cited primary source with AI assistance. It is intended for rapid situational awareness; refer to the original publication for the definitive statement.
