Research · Breaking
Kimi Prefill-as-a-Service Splits LLM Inference for 1.54× Throughput
Kimi's Prefill-as-a-Service splits LLM inference into separate compute-heavy prefill and latency-sensitive decode services, achieving a 1.54× throughput gain and 64% lower time-to-first-token (TTFT) in its tests.
April 26, 2026 · 1 min read
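To make the split concrete, here is a minimal Python sketch of the prefill/decode disaggregation pattern described above, under stated assumptions. The names (`PrefillWorker`, `DecodeWorker`, `KVCache`) are hypothetical and for illustration only, not Kimi's actual API: the prefill service runs the compute-heavy forward pass over the full prompt once, then hands its KV cache to a separate decode service, which streams tokens one step at a time.

```python
# Hypothetical sketch of prefill/decode disaggregation.
# Not Kimi's implementation; the model is faked with a placeholder rule.

from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Stand-in for the per-layer key/value tensors produced by prefill."""
    tokens: list[int] = field(default_factory=list)


class PrefillWorker:
    """Compute-heavy service: processes the entire prompt in one pass."""

    def prefill(self, prompt_tokens: list[int]) -> KVCache:
        # A real prefill would run a batched forward pass over the prompt
        # and materialize the KV cache to ship to a decode node.
        return KVCache(tokens=list(prompt_tokens))


class DecodeWorker:
    """Latency-sensitive service: generates one token per step."""

    def decode(self, cache: KVCache, max_new_tokens: int) -> list[int]:
        out = []
        for _ in range(max_new_tokens):
            # Each step attends over the cache and appends one token;
            # here a placeholder rule stands in for the model.
            next_token = (cache.tokens[-1] + 1) % 50_000
            cache.tokens.append(next_token)
            out.append(next_token)
        return out


if __name__ == "__main__":
    prompt = [101, 2023, 2003, 1037, 3231]      # toy token ids
    cache = PrefillWorker().prefill(prompt)     # service 1: prefill
    print(DecodeWorker().decode(cache, 5))      # service 2: decode
```

Because the two phases have opposite resource profiles (prefill is compute-bound, decode is memory-bandwidth-bound and latency-sensitive), running them as separate services lets each be batched and scaled independently, which is the mechanism behind the reported throughput and TTFT gains.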