Cloud AI's Cheap-Forever Pricing Is Breaking Down
GitHub Copilot removed Opus from its Pro tier last week. Anthropic throttled Claude Pro Code during US peak hours, and its head of growth called the throttling "an experiment." Two independent analyses published on the same day argue this is not a service hiccup — it is the first visible phase of a structural pricing correction underway across the cloud AI industry.
What the Source Actually Says
Tim Carambat (Anything LLM, Apr 28) documents the enshittification pattern with specificity: GitHub Copilot Pro has stopped accepting new signups, dropped usage limits, and stripped Opus while retaining only Claude 4.7 — "the worst one." He predicts Claude Pro Max will hit $500/month within months, as the current $20 plan no longer sustains compute margins. More consequential for enterprises, the per-seat billing model is giving way to per-token pricing. Drawing on conversations with firms of 30–50K employees, Carambat maps the planning risk directly: per-seat cost is projectable; token cost is unbounded. Three cents per email summary, multiplied across 50,000 employees over a year, creates a liability without a ceiling. "Switching to a cheaper provider," he notes, "just puts you back at square one with a different landlord."
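The "liability without a ceiling" point is easy to make concrete with back-of-envelope arithmetic. A minimal sketch, using the source's $0.03-per-summary and 50,000-employee figures; the per-day usage rates and the 250-workday year are illustrative assumptions, not numbers from the source:

```python
# Back-of-envelope annual token-cost projection.
COST_PER_SUMMARY = 0.03   # USD per email summary (from the source)
EMPLOYEES = 50_000        # headcount (from the source)
WORKDAYS = 250            # workdays per year (assumption)

def annual_cost(summaries_per_day: float) -> float:
    """Projected yearly spend for one per-employee usage rate."""
    return COST_PER_SUMMARY * EMPLOYEES * summaries_per_day * WORKDAYS

for rate in (1, 5, 20):
    print(f"{rate:>2} summaries/day -> ${annual_cost(rate):,.0f}/year")
# -> $375,000/year at 1/day, $1,875,000 at 5/day, $7,500,000 at 20/day
```

The unbounded variable is `summaries_per_day`: unlike a seat count, usage per employee has no natural cap, which is exactly why the exposure cannot be budgeted as a fixed line item.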
The financials behind the degradation are equally stark. Mandar Karhade's analysis (GoPubby, Apr 27) lays out the accounting: OpenAI burned approximately $9B against $13B in 2025 revenue and projects $14B in losses for 2026, with inference costs at Microsoft alone hitting $12B by November 2025. Anthropic's gross margins have compressed to roughly 40%, down from a 50% target. The subsidy enabling current pricing comes from Microsoft, Amazon, Nvidia, and SoftBank simultaneously funding, supplying, and buying from the same labs. Karhade's conclusion: the industry is one liquidity event away from a disorderly 10× repricing.
Strategic Take
Teams planning AI budgets for 2026–27 should stress-test current token rates against a 5–10× repricing scenario — the subsidy that makes today's pricing possible is structural, not permanent. Local inference (Qwen 3.5 0.8B, 128K context, consumer hardware) has matured enough to serve as a compute hedge for productivity and coding workloads.
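A stress test of this kind needs no tooling beyond multiplying current spend by candidate repricing factors. A minimal sketch; the $40K/month baseline spend is a hypothetical input, and the multipliers follow the 5–10× scenario above (1× as the baseline):

```python
# Annualized AI spend under each token-repricing multiplier.
def stress_test(current_monthly_spend: float,
                multipliers=(1, 5, 10)) -> dict[int, float]:
    """Map each repricing multiplier to projected annual spend."""
    return {m: current_monthly_spend * m * 12 for m in multipliers}

for mult, annual in stress_test(40_000).items():
    print(f"{mult:>2}x repricing -> ${annual:,.0f}/year")
# -> $480,000/year at 1x, $2,400,000 at 5x, $4,800,000 at 10x
```

If the 10× row breaks the budget, that is the signal to price out the local-inference hedge before the repricing arrives, not after.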


