Speech-to-Text
Whisper, ASR, transcription, voice typing
8 episodes
#2754: Why Your Dictation Setup Might Be Wrong
Modern ASR is shockingly robust. The biggest predictor of accuracy? How well your audio matches its training data.
#2707: Foot Pedals vs USB Buttons: The Ergonomics of Dictation
Foot pedals, USB buttons, and under-desk macro pads for voice dictation — a deep dive into the hardware that makes AI dictation work.
#2512: How Speech-to-Speech Models Eliminate the Robot Voice
Why AI voice agents sound robotic, and how natively integrated speech-to-speech models fix it.
#2510: The Design That Makes Voice Agents Tolerable
Drive-thru accuracy, healthcare triage, and the design secret that makes people *want* to talk to a machine.
#2479: The Screaming Baby Stress Test
Choosing the right headset and control method for dictation when you're holding a baby who won't stop screaming.
#2311: Danish AI: Bridging the Localization Gap
How does AI handle Danish? Explore the challenges and progress in making AI tools work for small-language populations.
#2272: The AI Transcription Sweet Spot
Does higher-quality audio make AI transcription worse? New research reveals a surprising "sweet spot" for bitrate, challenging a core assumption of...
#1752: Whisper Small Beats Whisper Large in Speed & Accuracy
A 4-GPU benchmark on Ubuntu shows the 1.5B-parameter Whisper Large is slower and less accurate than the much smaller Whisper Small.