GPT-5.5 Reframes AI Progress as Intelligence Per Token

The dominant story of today's AI cycle is not a benchmark score but a ratio. On Terminal Bench, GPT-5.5 reaches 39.1 at 2,165 output tokens; GPT-5.4 reached 34.2 at 4,950. The model scores higher while using 56% fewer tokens to do it. OpenAI's framing is deliberate: per-token latency is unchanged from 5.4 and pricing has doubled ($5/M input, $30/M output), but fewer tokens per task means comparable or lower effective cost per useful output.
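The arithmetic behind that framing can be checked directly. A minimal sketch, using the Terminal Bench token counts above and assuming (from the "pricing doubled" claim, not stated outright) that 5.4's output rate was $15/M:

```python
def cost_per_task(output_tokens: int, price_per_million: float) -> float:
    """Dollar cost of one task's output tokens."""
    return output_tokens / 1_000_000 * price_per_million

# GPT-5.4: 4,950 output tokens at an inferred $15/M output rate.
gpt_5_4 = cost_per_task(4_950, 15.0)   # ~$0.0743 per task

# GPT-5.5: 2,165 output tokens at the stated $30/M output rate.
gpt_5_5 = cost_per_task(2_165, 30.0)   # ~$0.0650 per task

# Despite the doubled per-token price, the 5.5 task costs less.
print(f"5.4: ${gpt_5_4:.4f}  5.5: ${gpt_5_5:.4f}")
```

On these numbers the doubled sticker price still yields a cheaper task, which is exactly the "effective cost per useful output" claim.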

What the Source Actually Says

Matthew Berman's two-week pre-access review calls the personality shift a deliberate product fix: 5.4's verbose explanations were a real UX problem, and 5.5 defaults to an "exactly what you need" mode. He rates the model's visual iteration loop inside Codex, where it inspects rendered output and self-corrects without user prompts, as strictly better than Claude Opus 4.7 for autonomous front-end work.

The other defining number is OSWorld: GPT-5.5 scores 78.7% on the computer-use benchmark, above the 72.4% human baseline. OpenAI led the announcement with Codex rather than the model for a reason: the April 16 Codex release had already added computer use, an in-app browser, image generation, memory, and 90+ plugins. Background agents run without hijacking your cursor, making parallel multi-agent workflows practical rather than a demo. Nate B Jones frames the strategic split clearly: Codex drives any GUI a human would drive, bypassing ecosystem dependencies entirely.

Enterprise validation is concrete. Box AI's complex-work eval jumped from 67% to 77% overall, with gains in financial services (+20 pts), healthcare (61→78), and public sector (59→72). OpenAI and NVIDIA piloted a whole-company Codex enterprise rollout, publicly inviting other companies to replicate the approach.

Strategic Take

Intelligence-per-token matters because it decouples the value story from sticker pricing. If a model produces equivalent output at half the token count, the per-task cost calculation flips even when the per-million-token price rises. Businesses evaluating GPT-5.5 should be modelling cost-per-outcome, not cost-per-million-tokens.
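That flip has a simple break-even condition: if the per-token price rises by a factor p, cost per task still falls as long as tokens per task shrink by more than 1 − 1/p. A hedged sketch of that rule (the function name is illustrative, not from the source):

```python
def breakeven_token_reduction(price_multiplier: float) -> float:
    """Fraction by which tokens-per-task must shrink for
    cost-per-outcome to stay flat when the per-million-token
    price rises by the given multiplier."""
    return 1 - 1 / price_multiplier

# With pricing doubled, any reduction beyond 50% lowers cost per task;
# the reported ~56% reduction clears that bar.
print(breakeven_token_reduction(2.0))  # → 0.5
```

This is the cost-per-outcome lens in one line: compare the token reduction against 1 − 1/p, not the sticker prices against each other.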