Speech & Audio

Voice AI, speech recognition, and audio processing

26 episodes

Jan 28, 2026

Who’s Talking? The Tech of Speaker Identification

Herman and Corn break down the difference between speaker diarization and identification to help automate meeting transcripts.

speaker-diarization, voice-embeddings, speaker-identification

27:07 Audio Processing

Jan 26, 2026

Sonic Sorcery: Mapping Spatial Audio in Small Spaces

Discover how spatial audio and room mapping can turn a tiny rental bedroom into a cinematic powerhouse without drilling a single hole.

spatial-audio, acoustic-telemetry, room-mapping

23:06 Audio Processing

Jan 15, 2026

The Sound Spotlight: How Beamforming Redefines Audio

Discover how math and physics turn simple microphones into "sound spotlights" that can isolate a single voice in even the noisiest environments.

beamforming-technology, microphone-arrays, digital-signal-processing

22:48 Audio Processing

Jan 8, 2026

Beyond the Robot: The Science of Modern Voice Cloning

Herman and Corn dive into the mechanics of neural text-to-speech, exploring how AI masters human prosody and the "average voice" accent problem.

neural-text-to-speech, voice-cloning, generative-modeling

23:28 Text-to-Speech

Jan 4, 2026

Designing the Voice-First Workspace: IKEA for AI Pros

Learn how to transform your home office into a high-performance voice-first workspace using acoustic hygiene and ergonomic IKEA furniture hacks.

voice-first, acoustic hygiene, ikea, workspace, ergonomics

22:13 Audio Processing

Jan 3, 2026

Breaking the Voice Wall: The Future of Native Speech AI

Explore why native speech-to-speech AI is 20x more expensive than text pipelines and how "semantic VAD" is solving the awkward silence problem.

large-language-models, local-ai, speech-to-speech

29:05 Speech-to-Text

Jan 2, 2026

The Ghost in the Machine: Why AI Voices Hallucinate

Why does your AI suddenly start shouting or whispering like Darth Vader? Herman and Corn dive into the glitchy world of TTS hallucinations.

text-to-speech, hallucinations, autoregressive models, audio glitches, latent space

24:00 Text-to-Speech

Dec 29, 2025

Silencing the Siren: Real-Time AI Noise Reduction

How do phones remove sirens and crying babies in real time? Explore the neural networks and hardware making crystal-clear audio possible.

noise reduction, audio engineering, neural networks, mobile devices, edge computing

22:04 Audio Processing

Dec 27, 2025

Teaching AI to Hear: Solving the Custom Dictionary Dilemma

Tired of AI mishearing brand names? Learn how to build efficient custom dictionaries for Gemini 1.5 without breaking the bank.

automatic speech recognition, custom dictionaries, gemini 1.5, context bloat, dynamic hint system

23:29 Speech-to-Text

Dec 24, 2025

Beyond the Headset: Pro Audio for AI Voice Control

Tired of headsets? Herman and Corn explore professional microphone setups for seamless, high-accuracy AI voice dictation from a distance.

voice dictation, ai accuracy, microphones, audio quality, signal-to-noise ratio

24:16 Audio Processing

Dec 22, 2025

Unsung Hero: The Gooseneck Mic's AI Power

The gooseneck mic: a humble hero with surprising AI power. Discover its secret to crystal-clear speech-to-text accuracy!

gooseneck mic, speech-to-text, microphone, AI voice capture, audio technology

21:38 Speech-to-Text

Dec 12, 2025

Clean Audio, Messy Reality: Noise Removal for Voice-to-Text

Fussy baby, clean audio? We dive into noise removal for voice-to-text. Discover why cleaner audio can transcribe worse.

noise removal, voice-to-text, audio processing, signal processing, neural networks

28:35 Audio Processing

Dec 11, 2025

From Lawyers in Limousines to Developers in Their PJs: The Voice Tech Revolution

From limo-riding lawyers to pajama-clad coders, voice tech is booming. Discover how AI is making it a force for good.

voice-technology, accessibility, Productivity

29:50 Speech-to-Text

Dec 8, 2025

The Unseen Magic of AI's Ears: Decoding VAD

Ever wonder how your AI knows you're talking? We're diving deep into VAD, the unseen magic behind AI's ears.

voice activity detection, VAD, speech recognition, ASR, speech-to-text

19:34 Audio Processing

Dec 7, 2025

The Multimodal Audio Revolution: A Screen-Free Future?

Is multimodal audio the future? We explore if AI can truly displace traditional speech-to-text for a screen-free world.

multimodal audio, speech-to-text, screen-free, audio AI, accessibility

25:47 Speech-to-Text

Dec 5, 2025

Personalizing Whisper: The Voice Typing Revolution

Voice typing is changing everything. Join us as we explore the revolution of personalizing Whisper!

speech-recognition, fine-tuning, transformers

23:27 Speech-to-Text

Dec 5, 2025

Mic Check: Mastering AI Dictation Hardware

Uncover the secrets to perfect AI dictation! Corn and Herman explore the ultimate speech-to-text hardware.

large-language-models, speech-recognition, audio-hardware

25:50 Speech-to-Text

Nov 28, 2025

AI Gets Personal: The Power of Voice Fine-Tuning

AI that understands *your* voice? Dive into the fascinating world of fine-tuning and discover how AI gets personal.

fine-tuning, speech-recognition, personalized-ai

17:40 Voice Cloning

Nov 24, 2025

Building Custom ASR Tools

Ever wondered how to build your own ASR tools from scratch? Discover the why and how in this episode!

ASR, speech recognition, custom asr, machine learning, speech to text

37:42 Speech-to-Text

Nov 24, 2025

How To Fine Tune Whisper

Build your own AI transcription tool! We'll walk you through fine-tuning Whisper, from data to notebook.

fine-tuning, speech-recognition, gpu-acceleration

31:42 Speech-to-Text

Nov 24, 2025

Benchmarking Custom ASR Tools - Beyond The WER

Benchmarking custom ASR fine-tunes: We're diving deep beyond the WER to truly measure performance.

ASR, benchmarking, wer, speech recognition, fine-tuning

36:00 Speech-to-Text

Nov 24, 2025

Fine-Tuning ASR For Maximal Usability

Fine-tuned ASR is just the start. Discover the next steps for deployment and maximizing usability.

ASR, speech recognition, fine-tuning, deployment, usability

32:15 Speech-to-Text

Nov 24, 2025

How ASR Went From Frustration To ... Whisper Magic

Speech to text: from frustrating to fantastic. Uncover the magic behind its rapid rise and connection to the AI boom!

automatic-speech-recognition, speech-to-text, asr-technology

34:09 Speech-to-Text

Nov 24, 2025

Safetensors or something else: STT inference formats explained

Unpacking ASR weight formats: Safetensors and beyond. Tune in to understand the distinctions.

safetensors, ASR, speech recognition, inference, weight formats

32:56 Speech-to-Text

Nov 24, 2025

If Your Voice Ages, Does Your Fine-Tune Become Useless?

Your voice changes, but your fine-tuned model shouldn't become useless. We explore the biology of the larynx and ASR.

speech-recognition, fine-tuning, vocal-physiology

38:26 Voice Cloning

Nov 24, 2025

Building Your Own Whisper

Ever wondered if you could build your own speech recognition tool? We dive deep into crafting custom ASR.

ASR, speech recognition, whisper, machine learning, audio processing

34:41 Speech-to-Text