Speech & Audio

Voice AI, speech recognition, and audio processing

26 episodes

Who’s Talking? The Tech of Speaker Identification

Herman and Corn break down the difference between speaker diarization and identification to help automate meeting transcripts.

speaker-diarization, voice-embeddings, speaker-identification

Sonic Sorcery: Mapping Spatial Audio in Small Spaces

Discover how spatial audio and room mapping can turn a tiny rental bedroom into a cinematic powerhouse without drilling a single hole.

spatial-audio, acoustic-telemetry, room-mapping

The Sound Spotlight: How Beamforming Redefines Audio

Discover how math and physics turn simple microphones into "sound spotlights" that can isolate a single voice in even the noisiest environments.

beamforming-technology, microphone-arrays, digital-signal-processing

Beyond the Robot: The Science of Modern Voice Cloning

Herman and Corn dive into the mechanics of neural text-to-speech, exploring how AI masters human prosody and the "average voice" accent problem.

neural-text-to-speech, voice-cloning, generative-modeling

Designing the Voice-First Workspace: IKEA for AI Pros

Learn how to transform your home office into a high-performance voice-first workspace using acoustic hygiene and ergonomic IKEA furniture hacks.

voice-first, acoustic hygiene, ikea, workspace, ergonomics

Breaking the Voice Wall: The Future of Native Speech AI

Explore why native speech-to-speech AI is 20x more expensive than text pipelines and how "semantic VAD" is solving the awkward silence problem.

large-language-models, local-ai, speech-to-speech

The Ghost in the Machine: Why AI Voices Hallucinate

Why does your AI suddenly start shouting or whispering like Darth Vader? Herman and Corn dive into the glitchy world of TTS hallucinations.

text-to-speech, hallucinations, autoregressive models, audio glitches, latent space

Silencing the Siren: Real-Time AI Noise Reduction

How do phones remove sirens and crying babies in real time? Explore the neural networks and hardware making crystal-clear audio possible.

noise reduction, audio engineering, neural networks, mobile devices, edge computing

Teaching AI to Hear: Solving the Custom Dictionary Dilemma

Tired of AI mishearing brand names? Learn how to build efficient custom dictionaries for Gemini 1.5 without breaking the bank.

automatic speech recognition, custom dictionaries, gemini 1.5, context bloat, dynamic hint system

Beyond the Headset: Pro Audio for AI Voice Control

Tired of headsets? Herman and Corn explore professional microphone setups for seamless, high-accuracy AI voice dictation from a distance.

voice dictation, ai accuracy, microphones, audio quality, signal-to-noise ratio

Unsung Hero: The Gooseneck Mic's AI Power

The gooseneck mic: a humble hero with surprising AI power. Discover its secret to crystal-clear speech-to-text accuracy!

gooseneck mic, speech-to-text, microphone, AI voice capture, audio technology

Clean Audio, Messy Reality: Noise Removal for Voice-to-Text

Fussy baby, clean audio? We dive into noise removal for voice-to-text. Discover why cleaner audio can transcribe worse.

noise removal, voice-to-text, audio processing, signal processing, neural networks

From Lawyers in Limousines to Developers in Their PJs: The Voice Tech Revolution

From limo-riding lawyers to pajama-clad coders, voice tech is booming. Discover how AI is making it a force for good.

voice-technology, accessibility, Productivity

The Unseen Magic of AI's Ears: Decoding VAD

Ever wonder how your AI knows you're talking? We're diving deep into VAD, the unseen magic behind AI's ears.

voice activity detection, VAD, speech recognition, ASR, speech-to-text

The Multimodal Audio Revolution: A Screen-Free Future?

Is multimodal audio the future? We explore if AI can truly displace traditional speech-to-text for a screen-free world.

multimodal audio, speech-to-text, screen-free, audio AI, accessibility

Personalizing Whisper: The Voice Typing Revolution

Voice typing is changing everything. Join us as we explore the revolution of personalizing Whisper!

speech-recognition, fine-tuning, transformers

Mic Check: Mastering AI Dictation Hardware

Uncover the secrets to perfect AI dictation! Corn and Herman explore the ultimate speech-to-text hardware.

large-language-models, speech-recognition, audio-hardware

AI Gets Personal: The Power of Voice Fine-Tuning

AI that understands *your* voice? Dive into the fascinating world of fine-tuning and discover how AI gets personal.

fine-tuning, speech-recognition, personalized-ai

Building Custom ASR Tools

Ever wondered how to build your own ASR tools from scratch? Discover the why and how in this episode!

ASR, speech recognition, custom asr, machine learning, speech to text

How To Fine Tune Whisper

Build your own AI transcription tool! We'll walk you through fine-tuning Whisper, from data to notebook.

fine-tuning, speech-recognition, gpu-acceleration

Benchmarking Custom ASR Tools - Beyond The WER

Benchmarking custom ASR fine-tunes: We're diving deep beyond the WER to truly measure performance.

ASR, benchmarking, wer, speech recognition, fine-tuning

Fine-Tuning ASR For Maximal Usability

Fine-tuned ASR is just the start. Discover the next steps for deployment and maximizing usability.

ASR, speech recognition, fine-tuning, deployment, usability

How ASR Went From Frustration To ... Whisper Magic

Speech to text: from frustrating to fantastic. Uncover the magic behind its rapid rise and connection to the AI boom!

automatic-speech-recognition, speech-to-text, asr-technology

Safetensors or something else: STT inference formats explained

Unpacking ASR weight formats: Safetensors and beyond. Tune in to understand the distinctions.

safetensors, ASR, speech recognition, inference, weight formats

If Your Voice Ages, Does Your Fine-Tune Become Useless?

Your voice changes, but your fine-tuned model shouldn't become useless. We explore the biology of the larynx and ASR.

speech-recognition, fine-tuning, vocal-physiology

Building Your Own Whisper

Ever wondered if you could build your own speech recognition tool? We dive deep into crafting custom ASR.

ASR, speech recognition, whisper, machine learning, audio processing