TALOS-V2 Runs a Full Transformer on an FPGA at 53,000 tok/sec Without a GPU

TALOS-V2 compiles Karpathy's 4,192-parameter microGPT entirely into a Cyclone V FPGA, achieving 53,000 tokens per second on a battery-powered, credit-card-sized board with no GPU, no Python interpreter, and no runtime software layer. Every transformer component (embeddings, attention, normalization, MLP, and token sampling) is implemented directly as FPGA logic. The repository ships with JTAG build tooling for replication.
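To give a feel for what "attention as FPGA logic" entails, here is a minimal Python sketch of an integer-only attention datapath of the kind typically flattened into hardware. It is not taken from the TALOS-V2 sources: the Q1.7 fixed-point format, the lookup-table softmax, and every name in it are illustrative assumptions, and the 1/sqrt(d) score scaling is omitted for brevity.

```python
import numpy as np

FRAC = 7                                # Q1.7 fixed point: 7 fractional bits
ONE = 1 << FRAC                         # 1.0 in fixed point

def quantize(x: np.ndarray) -> np.ndarray:
    """Round a float array in roughly [-1, 1) to int8 Q1.7."""
    return np.clip(np.round(x * ONE), -128, 127).astype(np.int8)

def qmatmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Integer multiply-accumulate with a rescale, as a DSP block would do."""
    acc = a.astype(np.int32) @ b.astype(np.int32)    # wide accumulator
    return np.clip(acc >> FRAC, -128, 127).astype(np.int8)

# exp() lookup table: 256 entries from exp(0) down to exp(-255/128), the kind
# of small ROM an FPGA softmax uses instead of floating-point math.
EXP_LUT = np.round(np.exp(-np.arange(256) / ONE) * ONE).astype(np.int32)

def qsoftmax(scores: np.ndarray) -> np.ndarray:
    """Fixed-point softmax over one row of attention scores."""
    shifted = scores.astype(np.int32) - int(scores.max())   # all <= 0
    w = EXP_LUT[np.clip(-shifted, 0, 255)]                  # unnormalized weights
    return (w * ONE) // max(int(w.sum()), 1)                # Q1.7, sums to ~1.0

def attention_head(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-head attention over a tiny context (no causal mask, for brevity)."""
    scores = qmatmul(q, k.T)                 # QK^T in Q1.7
    out = np.empty_like(v, dtype=np.int32)
    for i, row in enumerate(scores):
        w = qsoftmax(row)                    # attention weights for token i
        out[i] = (w @ v.astype(np.int32)) >> FRAC
    return np.clip(out, -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
q, k, v = (quantize(rng.uniform(-1, 1, (4, 8))) for _ in range(3))
print(attention_head(q, k, v))
```

The key property is that nothing here needs floating point: the multiplies, shifts, and a small exp() ROM all map directly onto an FPGA's DSP blocks and block RAM.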

Why It Matters

Demonstrates that the software runtime layer is not load-bearing for small-model inference. The design's throughput-per-watt challenges the default assumption that GPU-plus-software stacks are the only viable inference path at the edge.
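To make the throughput-per-watt point concrete, a back-of-envelope calculation: the 3 W figure below is an assumed power draw in the range typical of small Cyclone V boards, not a measured TALOS-V2 number.

```python
# Energy per token at the reported throughput. The 3 W board power is an
# ASSUMPTION (small Cyclone V boards typically draw a few watts).
tokens_per_sec = 53_000
board_watts = 3.0
print(f"{tokens_per_sec / board_watts:,.0f} tokens per joule")            # ~17,667
print(f"{board_watts / tokens_per_sec * 1e6:.1f} microjoules per token")  # ~56.6
```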