Speech & Audio
Voice AI, speech recognition, and audio processing
26 episodes
Who’s Talking? The Tech of Speaker Identification
Herman and Corn break down the difference between speaker diarization and identification to help automate meeting transcripts.
Sonic Sorcery: Mapping Spatial Audio in Small Spaces
Discover how spatial audio and room mapping can turn a tiny rental bedroom into a cinematic powerhouse without drilling a single hole.
The Sound Spotlight: How Beamforming Redefines Audio
Discover how math and physics turn simple microphones into "sound spotlights" that can isolate a single voice in even the noisiest environments.
Beyond the Robot: The Science of Modern Voice Cloning
Herman and Corn dive into the mechanics of neural text-to-speech, exploring how AI masters human prosody and the "average voice" accent problem.
Designing the Voice-First Workspace: IKEA for AI Pros
Learn how to transform your home office into a high-performance voice-first workspace using acoustic hygiene and ergonomic IKEA furniture hacks.
Breaking the Voice Wall: The Future of Native Speech AI
Explore why native speech-to-speech AI is 20x more expensive than text pipelines and how "semantic VAD" is solving the awkward silence problem.
The Ghost in the Machine: Why AI Voices Hallucinate
Why does your AI suddenly start shouting or whispering like Darth Vader? Herman and Corn dive into the glitchy world of TTS hallucinations.
Silencing the Siren: Real-Time AI Noise Reduction
How do phones remove sirens and crying babies in real time? Explore the neural networks and hardware making crystal-clear audio possible.
Teaching AI to Hear: Solving the Custom Dictionary Dilemma
Tired of AI mishearing brand names? Learn how to build efficient custom dictionaries for Gemini 1.5 without breaking the bank.
Beyond the Headset: Pro Audio for AI Voice Control
Tired of headsets? Herman and Corn explore professional microphone setups for seamless, high-accuracy AI voice dictation from a distance.
Unsung Hero: The Gooseneck Mic's AI Power
The gooseneck mic: a humble hero with surprising AI power. Discover its secret to crystal-clear speech-to-text accuracy!
Clean Audio, Messy Reality: Noise Removal for Voice-to-Text
Fussy baby, clean audio? We dive into noise removal for voice-to-text. Discover why cleaner audio can transcribe worse.
From Lawyers in Limousines to Developers in Their PJs: The Voice Tech Revolution
From limo-riding lawyers to pajama-clad coders, voice tech is booming. Discover how AI is making it a force for good.
The Unseen Magic of AI's Ears: Decoding VAD
Ever wonder how your AI knows you're talking? We're diving deep into VAD, the unseen magic behind AI's ears.
The Multimodal Audio Revolution: A Screen-Free Future?
Is multimodal audio the future? We explore if AI can truly displace traditional speech-to-text for a screen-free world.
Personalizing Whisper: The Voice Typing Revolution
Voice typing is changing everything. Join us as we explore the revolution of personalizing Whisper!
Mic Check: Mastering AI Dictation Hardware
Uncover the secrets to perfect AI dictation! Corn and Herman explore the ultimate speech-to-text hardware.
AI Gets Personal: The Power of Voice Fine-Tuning
AI that understands *your* voice? Dive into the fascinating world of fine-tuning and discover how AI gets personal.
Building Custom ASR Tools
Ever wondered how to build your own ASR tools from scratch? Discover the why and how in this episode!
How To Fine-Tune Whisper
Build your own AI transcription tool! We'll walk you through fine-tuning Whisper, from data to notebook.
Benchmarking Custom ASR Tools - Beyond The WER
Benchmarking custom ASR fine-tunes: We're diving deep beyond the WER to truly measure performance.
Fine-Tuning ASR For Maximal Usability
Fine-tuned ASR is just the start. Discover the next steps for deployment and maximizing usability.
How ASR Went From Frustration To ... Whisper Magic
Speech to text: from frustrating to fantastic. Uncover the magic behind its rapid rise and connection to the AI boom!
Safetensors or something else: STT inference formats explained
Unpacking ASR weight formats: Safetensors and beyond. Tune in to understand the distinctions.
If Your Voice Ages, Does Your Fine-Tune Become Useless?
Your voice changes, but your fine-tuned model shouldn't become useless. We explore the biology of the larynx and what an aging voice means for ASR.
Building Your Own Whisper
Ever wondered if you could build your own speech recognition tool? We dive deep into crafting custom ASR.