FinAI Study: Same Model, Different Harness — Up to 3× Accuracy Swing

An NVIDIA-backed benchmark across four frontier models and five agent frameworks found that framework choice alone produces accuracy swings of roughly 3× on identical model backbones: Claude Sonnet 4.6 scores 66% under Claude Code but only 20% under ReAct on the same auditing task.

agenticonsult Intelligence

A 13-institution, NVIDIA-funded benchmark published May 13 ran four frontier LLMs across five agent frameworks on financial tasks including trading, hedging, market insights, and auditing. The headline finding: Claude Sonnet 4.6 hit 66.15% auditing accuracy under Claude Code or OpenClaw but collapsed to 20% under ReAct on the identical model backbone, a swing of roughly 3.3× (66.15 / 20) from framework choice alone. Separately, no configuration maintained its performance when the live evaluation regime shifted from bearish to bullish market conditions.
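As a quick check on the headline arithmetic, the following minimal sketch computes the swing from the three auditing figures quoted above. The dictionary layout and the `harness_swing` helper are illustrative assumptions, not part of the benchmark's own tooling, and the only data used are the numbers reported in this brief.

```python
# Auditing accuracy (%) for Claude Sonnet 4.6 per harness, as quoted in this brief.
# The data structure and helper below are hypothetical, for illustration only.
accuracy_by_harness = {
    "Claude Code": 66.15,
    "OpenClaw": 66.15,
    "ReAct": 20.0,
}


def harness_swing(scores):
    """Return (best, worst, ratio, gap) for a mapping of harness -> accuracy."""
    best = max(scores, key=scores.get)    # first harness with the highest score
    worst = min(scores, key=scores.get)   # first harness with the lowest score
    ratio = scores[best] / scores[worst]  # multiplicative swing
    gap = scores[best] - scores[worst]    # absolute gap in percentage points
    return best, worst, ratio, gap


best, worst, ratio, gap = harness_swing(accuracy_by_harness)
print(f"{best} vs {worst}: {ratio:.2f}x, {gap:.2f} percentage points")
```

On these figures the ratio is about 3.3 and the absolute gap is about 46 percentage points. Reporting both is a deliberate choice: a ratio alone can look dramatic when the lower score is small, while the point gap shows how much accuracy is actually at stake.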

Why It Matters

Framework selection is now a quantifiable business risk for agentic deployments. The 3× accuracy swing from harness choice makes model selection a secondary variable — enterprises that optimize only on model benchmarks are optimizing the wrong dimension.

Primary source

Discover AI / NVIDIA Academic Grant / 13-Institution Consortium

This breaking-news item was assembled from the cited primary source with AI assistance. It is intended for rapid situational awareness — refer to the original publication for the definitive statement.
