TALOS-V2 Runs a Full Transformer on an FPGA at 53,000 tok/sec Without a GPU

TALOS-V2 compiles Karpathy's 4,192-parameter microGPT entirely into a Cyclone V FPGA, achieving 53,000 tokens per second on a battery-powered, credit-card-sized board with no GPU, no Python interpreter, and no runtime software layer. Every transformer component (embeddings, attention, normalization, MLP, and token sampling) is implemented directly as FPGA logic. The repository ships with JTAG build tooling for replication.
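To give a feel for what "attention as FPGA logic" entails, here is a minimal Python sketch of an integer-only attention datapath of the kind typically flattened into hardware. It is not taken from the TALOS-V2 sources: the Q1.7 fixed-point format, the lookup-table softmax, and every name in it are illustrative assumptions, and the 1/sqrt(d) score scaling is omitted for brevity.

```python
import numpy as np

FRAC = 7                                # Q1.7 fixed point: 7 fractional bits
ONE = 1 << FRAC                         # 1.0 in fixed point

def quantize(x: np.ndarray) -> np.ndarray:
    """Round a float array in roughly [-1, 1) to int8 Q1.7."""
    return np.clip(np.round(x * ONE), -128, 127).astype(np.int8)

def qmatmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Integer multiply-accumulate with a rescale, as a DSP block would do."""
    acc = a.astype(np.int32) @ b.astype(np.int32)    # wide accumulator
    return np.clip(acc >> FRAC, -128, 127).astype(np.int8)

# exp() lookup table: 256 entries from exp(0) down to exp(-255/128), the kind
# of small ROM an FPGA softmax uses instead of floating-point math.
EXP_LUT = np.round(np.exp(-np.arange(256) / ONE) * ONE).astype(np.int32)

def qsoftmax(scores: np.ndarray) -> np.ndarray:
    """Fixed-point softmax over one row of attention scores."""
    shifted = scores.astype(np.int32) - int(scores.max())   # all <= 0
    w = EXP_LUT[np.clip(-shifted, 0, 255)]                  # unnormalized weights
    return (w * ONE) // max(int(w.sum()), 1)                # Q1.7, sums to ~1.0

def attention_head(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-head attention over a tiny context (no causal mask, for brevity)."""
    scores = qmatmul(q, k.T)                 # QK^T in Q1.7
    out = np.empty_like(v, dtype=np.int32)
    for i, row in enumerate(scores):
        w = qsoftmax(row)                    # attention weights for token i
        out[i] = (w @ v.astype(np.int32)) >> FRAC
    return np.clip(out, -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
q, k, v = (quantize(rng.uniform(-1, 1, (4, 8))) for _ in range(3))
print(attention_head(q, k, v))
```

The key property is that nothing here needs floating point: the multiplies, shifts, and a small exp() ROM all map directly onto an FPGA's DSP blocks and block RAM.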

Why It Matters

Demonstrates that the software runtime layer is not load-bearing for small-model inference. The design's throughput-per-watt challenges the default assumption that GPU-plus-software stacks are the only viable inference path at the edge.
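To make the throughput-per-watt point concrete, a back-of-envelope calculation: the 3 W figure below is an assumed power draw in the range typical of small Cyclone V boards, not a measured TALOS-V2 number.

```python
# Energy per token at the reported throughput. The 3 W board power is an
# ASSUMPTION (small Cyclone V boards typically draw a few watts).
tokens_per_sec = 53_000
board_watts = 3.0
print(f"{tokens_per_sec / board_watts:,.0f} tokens per joule")            # ~17,667
print(f"{board_watts / tokens_per_sec * 1e6:.1f} microjoules per token")  # ~56.6
```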