smol-audio Launches: Local Audio Model Fine-Tuning Notebook Collection

Hugging Face has released smol-audio, a collection of notebooks and scripts covering the full local audio model fine-tuning stack. The collection includes fine-tuning pipelines for Whisper, Parakeet, Voxtral, and Granite Speech; full and LoRA fine-tuning for Audio Flamingo 3; dialogue TTS with Dia-1.6B; and zero-shot video-to-text and audio-to-text retrieval via Meta's PE-AV. The resource is positioned as a practitioner-grade audio cookbook that requires no cloud inference: all workflows run locally.

Why It Matters

smol-audio extends the Hugging Face "smol" resource ecosystem into audio, making production-quality audio model fine-tuning accessible to the same practitioner audience that has used smol-llm and smol-vision. It arrives as local AI approaches qualitative parity with cloud inference across several modalities at once.