Johns Hopkins Study: LLMs Fail 60% on Multi-Tier Agent Instructions

A Johns Hopkins April 2026 study across 850+ agentic tasks with up to 12 conflicting privilege levels found frontier LLMs fail approximately 60% of the time on multi-tier instruction hierarchies. Gemini 3.1 Pro achieved the highest accuracy at 42% on coding tasks; GPT 5.4 came in below 40%; Claude Opus 4.6 at 33%. Root cause: models perform semantic pattern-matching on privilege values rather than arithmetic comparison, flipping up to 17% of answers when numeric privilege values shift by ±1 while preserving order.

Why It Matters

Static trust hierarchies break above approximately six tiers in production multi-agent deployments. This study quantifies the structural failure mode — accuracy collapses monotonically as privilege levels increase — and directly challenges the assumption that current LLMs can reliably arbitrate conflicting instructions in agentic pipelines.