MILKYWAY Shows Agent Scaffolding Can Outperform Fine-Tuning
A new paper freezes GPT-5.4's weights and puts all learning in an editable text harness, hitting 61% on prediction benchmarks where the base model scores 44%.
April 23, 2026