VT Preprint: AI's Own Skills Beat Human-Defined Skill Files in SFT

A Virginia Tech preprint dated April 19, 2026 extracts "model-native skills" (latent axes of behavioral variation) directly from LLM residual-stream activations using sparse autoencoders, then uses those directions to select supervised fine-tuning (SFT) data. On Llama-3 8B, model-native SFT scores 39.6 versus 38.4 for the best human-defined skill file; on Qwen 2.5 3B, 57.7 versus 56. The bigger finding is a 20% gain on MATH-1 and a 41% gain on AMC from selecting SFT data via a "representation error curriculum," which prioritizes the activation directions where the model is currently weakest, rather than selecting for textual diversity.
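
The mechanics are easiest to see in miniature. The Python sketch below fits a toy sparse autoencoder on stand-in residual-stream activations, treats the decoder rows as candidate model-native skill directions, and then ranks candidate SFT examples by reconstruction error as one plausible proxy for the paper's "representation error." The shapes, names, training loop, and synthetic data are all illustrative assumptions on our part; the paper's released code is the reference implementation.

```python
# Hypothetical sketch of SAE-based skill extraction plus a representation-error
# curriculum. Synthetic activations stand in for a real forward pass over the
# base model; nothing here is taken from the paper's actual code.
import numpy as np

rng = np.random.default_rng(0)

# Per-example pooled residual-stream activations (stand-in random data).
n_examples, d_model, n_latents = 512, 256, 64
acts = rng.normal(size=(n_examples, d_model)).astype(np.float32)

# --- 1. Fit a minimal sparse autoencoder (ReLU encoder, L1 penalty) ---
W_enc = rng.normal(scale=0.02, size=(d_model, n_latents)).astype(np.float32)
W_dec = W_enc.T.copy()
b_enc = np.zeros(n_latents, dtype=np.float32)
lr, l1 = 1e-3, 1e-3

for _ in range(200):
    z = np.maximum(acts @ W_enc + b_enc, 0.0)   # sparse latent codes
    recon = z @ W_dec                           # reconstructed activations
    err = recon - acts
    # Gradients of 0.5*||err||^2 + l1*||z||_1 w.r.t. the parameters.
    dz = err @ W_dec.T
    dz[z <= 0] = 0.0                            # ReLU mask
    dz += l1 * (z > 0)                          # L1 subgradient on active units
    W_dec -= lr * (z.T @ err) / n_examples
    W_enc -= lr * (acts.T @ dz) / n_examples
    b_enc -= lr * dz.mean(axis=0)

# Each decoder row is treated as one candidate "model-native skill" direction.
skill_dirs = W_dec / np.linalg.norm(W_dec, axis=1, keepdims=True)

# --- 2. Representation-error curriculum ---
# Score each candidate SFT example by how poorly the SAE reconstructs its
# activation, i.e. how far it falls outside the skills the model already
# encodes, and spend the data budget on the worst-represented examples.
z = np.maximum(acts @ W_enc + b_enc, 0.0)
rep_error = np.linalg.norm(z @ W_dec - acts, axis=1)

budget = 128
selected = np.argsort(-rep_error)[:budget]
print("selected example indices:", selected[:10])
```

Ranking whole examples by reconstruction error is only one reading of "weakest directions"; a per-direction variant would aggregate error along each decoder row and upweight examples that load on the under-reconstructed latents.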

Why It Matters

According to this paper, the Anthropic/Google skill-MD paradigm, which this publication and many of our peers build on, is a local maximum. The successor approach works in activation space, not prompt space. The code is publicly available, so this is a reproducible challenge to current best practice, not speculative theory.