GPT-5.5: Agentic-First Model, 82% Terminal-Bench, Safety at HIGH
OpenAI's GPT-5.5 arrives six weeks after 5.4 with a 7-point Terminal-Bench gain, doubled pricing, and cyber/bio safety classifications at HIGH.
DeepSeek V4 drops two open-weight models with 1M-context by default, CSA+HCA hybrid attention, and V4-Pro priced at roughly 1/7 Opus 4.7's output cost.
Three independent sources captured GPT-5.5's launch from three angles at once: builder euphoria, toolchain adoption, and a structural reliability alarm.
Google Deep Research Max costs $4.80/report and uses MCP to connect to private data stores. Independent 7-task testing shows the cheaper tier wins 5 of 7.
OpenAI and Anthropic's April 2026 releases moved reasoning upstream of pixels, HTML, and OS automation—rewriting every execution primitive in a single week.
GitHub Next's ACE puts multiplayer microVM sessions at the centre of agent-driven coding — making team alignment, not implementation, the bottleneck.
DeepSeek-V4's MIT-licensed 1M-context MoE and Kimi-K2.6's multimodal orchestration create the first complete open-weights agentic deployment stack.
Moonshot AI's Kimi K2.6 leads the open-source index with 300 concurrent sub-agents, 4,000 tool calls, and a 12-hour autonomous coding marathon.
DeepSeek V4's 10× KV-cache compression restructures AI cost economics globally, exposing a structural threat to US lab pricing and strategic positioning.
Anthropic ran a live two-sided agent marketplace with 69 employees: 186 deals, $4,000+ volume — and model quality (Opus vs Haiku) was invisible to human participants throughout.
GPT Image 2 claims a 26-point lead in Image Arena blind tests — unprecedented for the category — by wiring a reasoning loop before every pixel render.
Matt Pocock's two-hour AI Engineer workshop argues 30-year-old software fundamentals matter more under AI, not less — and outlines a complete methodology to prove it.
A Virginia Tech preprint shows model-native skills extracted via sparse autoencoders outperform human-defined skill files in SFT — and produce 41% gains on math via activation-space data selection.
Anthropic published a post-mortem on three sequential Claude Code harness changes from March–April that degraded output quality, fixed in v2.1.116+.