AssemblyAI Universal-3 Pro Streaming: LLM-as-Decoder ASR Under 300ms Latency
AssemblyAI has launched Universal-3 Pro Streaming, a speech recognition system built around a novel architecture: rather than using an LLM as a post-processing cleanup pass, it uses the LLM as the decoder itself, generating transcripts with grammar, context, and world knowledge in a single pass. The system achieves P50 latency under 300ms, supports mid-sentence bilingual switching across English, Spanish, French, German, Italian, and Portuguese, and allows domain-specific vocabulary to be injected before or during a conversation. The primary target verticals are ambient healthcare documentation and high-noise industrial environments. Getting started is free: $50 in credit, no card required.
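A P50 latency figure like the one claimed here is typically measured client-side as the median gap between when an audio chunk is sent and when its transcript event arrives. The sketch below shows that measurement in isolation; the timestamp pairs are simulated, and nothing here reflects AssemblyAI's actual SDK or wire protocol.

```python
import statistics

def p50_latency_ms(events):
    """Median (P50) latency over (audio_sent_ts, transcript_received_ts)
    pairs, where timestamps are in seconds since some shared clock."""
    latencies = [(received - sent) * 1000.0 for sent, received in events]
    return statistics.median(latencies)

# Simulated send/receive timestamp pairs for five audio chunks.
events = [
    (0.00, 0.21),  # 210 ms
    (0.50, 0.78),  # 280 ms
    (1.00, 1.25),  # 250 ms
    (1.50, 1.90),  # 400 ms (occasional slow chunk)
    (2.00, 2.27),  # 270 ms
]

print(p50_latency_ms(events))  # median of the five latencies, ~270 ms
```

Note that the median is robust to the occasional slow chunk (the 400ms outlier barely moves it), which is exactly why vendors quote P50 rather than mean latency; tail behavior would need P95/P99 figures instead.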
Why It Matters
The LLM-as-decoder architecture dissolves the historical speed-versus-accuracy tradeoff in streaming ASR; this is the technical claim that matters, not just the latency number. If it holds under real production conditions, it unlocks voice agents in high-stakes verticals that streaming ASR previously couldn't serve reliably.