AssemblyAI Universal-3 Pro Streaming: LLM-as-Decoder ASR Under 300ms Latency
AssemblyAI has launched Universal-3 Pro Streaming, a speech recognition system built around a novel architecture: rather than using an LLM as a post-processing cleanup pass, it uses the LLM as the decoder itself, generating transcripts with grammar, context, and world knowledge in a single pass. The system achieves P50 latency under 300ms, supports mid-sentence bilingual switching across English, Spanish, French, German, Italian, and Portuguese, and allows domain-specific vocabulary to be injected before or during a conversation. The primary target verticals are ambient healthcare documentation and high-noise industrial environments. Getting started is free: $50 in credit, no card required.
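A P50 latency figure like the one claimed here is typically measured client-side as the median gap between when an audio chunk is sent and when its transcript event arrives. The sketch below shows that measurement in isolation; the timestamp pairs are simulated, and nothing here reflects AssemblyAI's actual SDK or wire protocol.

```python
import statistics

def p50_latency_ms(events):
    """Median (P50) latency over (audio_sent_ts, transcript_received_ts)
    pairs, where timestamps are in seconds since some shared clock."""
    latencies = [(received - sent) * 1000.0 for sent, received in events]
    return statistics.median(latencies)

# Simulated send/receive timestamp pairs for five audio chunks.
events = [
    (0.00, 0.21),  # 210 ms
    (0.50, 0.78),  # 280 ms
    (1.00, 1.25),  # 250 ms
    (1.50, 1.90),  # 400 ms (occasional slow chunk)
    (2.00, 2.27),  # 270 ms
]

print(p50_latency_ms(events))  # median of the five latencies, ~270 ms
```

Note that the median is robust to the occasional slow chunk (the 400ms outlier barely moves it), which is exactly why vendors quote P50 rather than mean latency; tail behavior would need P95/P99 figures instead.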
Why It Matters
The LLM-as-decoder architecture dissolves the historical speed-versus-accuracy tradeoff in streaming ASR; this is the technical claim that matters, not just the latency number. If it holds under real production conditions, it unlocks voice agents in high-stakes verticals that streaming ASR previously couldn't serve reliably.