Harness Engineering Beats Model Upgrades: AHE Framework and 20% Terminal-Bench Gains
Two independent data points confirm harness engineering as the primary performance lever. The Agentic Harness Engineering (AHE) framework — NLP Newsletter's top paper of the week — lifts Pass@1 from 69.7% to 77.0% on Terminal-Bench 2, beating Codex-CLI by 5.1 points with 12% fewer tokens. Separately, LangChain reports 13–20% Terminal-Bench gains from prompt and middleware changes alone, without any model upgrade.
Why It Matters
Harness quality — not model capability — is now the dominant competitive variable for production agent systems. Following frontier lab prompting guides and investing in middleware optimization yields significant, measurable returns without waiting for the next model release.