Speech & Audio
Voice AI, speech recognition, and audio processing
7 episodes
#2183: Making Voice Agents Feel Natural
Turn-taking, interruptions, and latency are destroying voice AI UX—and the fixes are deeply technical. Here's what's actually happening underneath.
#1809: The TTS Developer's Dilemma: Size vs. Speed
Stop guessing. We break down the critical trade-offs between model size, latency, and sample rate for production-ready voice apps.
#1808: The 82M Parameter Voice That Beat Billion-Dollar AI
How a tiny 82M-parameter model outperforms billion-dollar giants in the race for natural-sounding AI speech.
#1800: The Engineering of Urgent Sound
Why some sounds make your skin crawl: the science of emergency alerts.
#1778: Audio Is the New "Read Later" Graveyard
Why listening to AI conversations beats reading dense PDFs, and how serverless GPUs make it cheap.
#1752: Whisper Small Beats Whisper Large in Speed & Accuracy
A 4-GPU benchmark on Ubuntu shows the 1.5B-parameter Whisper Large is slower and less accurate than the much smaller Whisper Small (244M parameters).
#1724: YouTube's Invisible AI Dubbing Machine
How does YouTube translate a video with one click? We explore the tech behind auto-dubbing, from sandwich models to voice cloning.