Token Theater: How Enterprises Are Measuring AI Adoption Backwards

A thread from Matt Shumer (@mattshumer_) is gaining significant traction for documenting an emerging and damaging enterprise AI failure pattern: organizations are evaluating AI productivity by tokens consumed and tools connected rather than by business output, creating perverse incentives that will make AI look like it failed when it actually worked.

What the Source Actually Says

Shumer reports firsthand accounts from inside large companies where promotions, performance reviews, and firing decisions are being made on a single composite metric: tokens consumed plus the number of skills/MCPs connected. The practical consequence is predictable. Employees have started running loops to artificially inflate token counts, accumulating usage while producing nothing. Meanwhile, the employee actually shipping results on efficient usage ("2 skills and 50M tokens") registers as a laggard next to someone burning a billion tokens on empty operations.
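
A rough illustration of how such a leaderboard inverts: the scoring formula and employee B's skill count below are assumptions (the thread gives only the "2 skills and 50M tokens" figure and the billion-token contrast), but any metric that rewards raw consumption produces the same inversion.

    # Illustrative only: how a "tokens consumed plus skills connected" leaderboard
    # can invert reality. The weighting is an assumption; the thread gives no formula.
    employees = [
        # name, skills/MCPs connected, tokens consumed, units of work shipped
        {"name": "A", "skills": 2,  "tokens": 50_000_000,    "shipped": 14},
        {"name": "B", "skills": 10, "tokens": 1_000_000_000, "shipped": 0},
    ]

    def usage_score(e):
        # Hypothetical composite: reward connections and raw consumption.
        return e["skills"] * 10_000_000 + e["tokens"]

    def output_score(e):
        return e["shipped"]

    print("Ranked by usage: ", [e["name"] for e in sorted(employees, key=usage_score, reverse=True)])
    print("Ranked by output:", [e["name"] for e in sorted(employees, key=output_score, reverse=True)])
    # The usage ranking puts B on top; the output ranking puts A on top.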

The thread's prediction is specific: within 18 months, the same executives setting these metrics will announce that "AI didn't deliver ROI" and cut the AI budget. Shumer's diagnosis is equally specific: "AI will have worked fine. They just measured the wrong fucking thing and torched millions rewarding theater over output."

The irony Shumer highlights is pointed: measuring actual output is now easier than it has ever been, precisely because you have AI available to analyze work logs, pull delivery metrics, and surface signal from noise. The tools to measure correctly exist. The organizational will to use them often doesn't.
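
A minimal sketch of what "pull delivery metrics" looks like in practice, assuming the team's shipped work is visible as commits in a git repository (commit counts are a crude proxy, and the repository path and 30-day window are illustrative); the point is simply that the raw data is already machine-readable:

    # Minimal sketch: delivery data already sits in version control.
    # Counts commits per author over a recent window using plain git.
    import subprocess

    def commits_per_author(repo_path: str, since: str = "30 days ago") -> dict[str, int]:
        # `git shortlog -s -n` prints one "<count><TAB><author>" line per author.
        out = subprocess.run(
            ["git", "-C", repo_path, "shortlog", "-s", "-n", f"--since={since}", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout
        counts = {}
        for line in out.splitlines():
            count, _, author = line.strip().partition("\t")
            if author:
                counts[author] = int(count)
        return counts

    if __name__ == "__main__":
        print(commits_per_author("."))  # run from inside any repository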

This is not a niche concern. The pattern maps onto a well-documented failure mode from the early cloud era, when IT organizations were evaluated on infrastructure provisioned rather than applications delivered. The metric shaped behavior; the behavior sabotaged the goal.

Strategic Take

For organizations deploying AI: establish outcome metrics before spend metrics. Instrument for shipped work, resolved queues, and decision speed — not for model invocations. Teams that build the measurement infrastructure correctly now will have the evidence base to protect their AI budgets when the ROI scrutiny cycle arrives.
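
A sketch of what that instrumentation might record per team, assuming shipped items, resolved queue items, and decision cycle times can be exported from existing trackers; the field names and ranking rule are illustrative, not a prescribed schema:

    # Sketch of an outcome-first scorecard. Token spend is kept for cost
    # accounting but deliberately excluded from the performance ranking.
    from dataclasses import dataclass

    @dataclass
    class TeamScorecard:
        team: str
        shipped_items: int           # features and changes delivered in the period
        resolved_queue_items: int    # tickets, reviews, support cases closed
        median_decision_days: float  # time from question raised to decision made
        tokens_consumed: int         # a cost input, not a performance output

    def outcome_rank(cards: list[TeamScorecard]) -> list[TeamScorecard]:
        # Rank on delivery volume, breaking ties by decision speed;
        # tokens never enter the sort key.
        return sorted(
            cards,
            key=lambda c: (c.shipped_items + c.resolved_queue_items, -c.median_decision_days),
            reverse=True,
        )

Keeping token consumption in the record but out of the ranking is the design choice that matters: it preserves the cost data executives will ask for later without turning consumption into something employees are rewarded for inflating.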