#speech-recognition

42 episodes

Jun 23

#3854: From Coos to Conversation: Baby's Hidden On-Ramp

How do babies go from babbling to real back-and-forth dialogue? The hidden architecture of early conversation.

child-development, speech-recognition, neurodivergence

Jun 12

#3493: Murmuring Scriptures and Wandering Wilds: Ancient Meditation

How "hagah" (murmuring scripture) and "hitbodedut" (wilderness solitude) reveal meditation hidden in the Bible.

neurodivergence, child-development, speech-recognition

Jun 10

#3443: What Makes a Pediatrician's Diagnostic Skill Unique

How pediatricians diagnose without a self-reported patient history, by reading cries, body language, and parent-child dynamics.

child-development, sensory-processing, speech-recognition

Jun 8

#3363: Why the Teletubbies Sun-Baby Makes Infants Cry

The Teletubbies was engineered for pre-verbal brains. Here's why adult discomfort is a feature, not a bug.

child-development, sensory-processing, speech-recognition

May 13

#2801: Why Baby Babble Sounds Like Foreign Languages

Your baby isn't speaking Korean — but here's why the overlap isn't a coincidence.

child-development, linguistics, speech-recognition

May 11

#2754: Why Your Dictation Setup Might Be Wrong

Modern ASR is shockingly robust. The biggest predictor of accuracy? How well your audio matches its training data.

automatic-speech-recognition, speech-recognition, audio-processing

May 5

#2643: How Stenographers Type 300 Words Per Minute

Court reporters don’t type letters—they chord syllables at 300 words per minute. Here’s how it works and why AI can’t replace them yet.

speech-recognition, audio-processing, accessibility

May 3

#2618: Text Normalization's Hidden Complexity

How to handle acronyms in text-to-speech pipelines using BERT models, lexicons, and layered preprocessing.

text-to-speech, speech-recognition, audio-processing

May 2

#2590: The Uncanny Valley of Clean Speech

How transformer models distinguish "um" from meaningful speech — and why removing too much makes you sound like a robot.

speech-recognition, audio-processing, automatic-speech-recognition

May 1

#2582: What Your Browser Does to Mic Audio Before It Reaches Your Server

getUserMedia returns audio, but not raw audio. Here's what browsers actually do to your mic feed before it hits your server.

audio-processing, speech-recognition, browser-audio-pipeline

May 1

#2563: How Audio Fingerprinting Actually Works

Spectrogram peaks, constellation maps, and hash matching — the elegant mechanics behind identifying any song in seconds.

audio-processing, signal-processing, speech-recognition

Apr 30

#2543: Why Base64 Adds 33% Overhead (And Why You Still Need It)

Base64 isn’t compression — it’s a safe transport encoding. Here’s how it works with audio APIs and where its limits are.

audio-engineering, speech-recognition, api-integration

Apr 29

#2510: The Design That Makes Voice Agents Tolerable

Drive-thru accuracy, healthcare triage, and the design secret that makes people *want* to talk to a machine.

voice-first, accessibility, speech-recognition

Apr 27

#2486: Why Noise Reduction Can Ruin Transcription Accuracy

Cleaning audio before transcription can increase errors by up to 46%. Here's the right approach for your voice app.

speech-recognition, audio-processing, automatic-speech-recognition

Apr 27

#2479: The Screaming Baby Stress Test

Choosing the right headset and control method for dictation when you're holding a baby who won't stop screaming.

speech-recognition, voice-first, diy

Apr 26

#2443: How Podcast RSS Feeds Can Speak Every Language

One RSS feed, a transcript tag, and TTS voice cloning — the emerging standard for letting any podcast speak any language.

speech-recognition, voice-cloning, audio-processing

Apr 19

#2337: When Diarization Fails Silently

Discover how pyannote and other tools tackle the critical task of identifying "who spoke when" in audio, and why it's harder than it sounds.

audio-processing, speech-recognition, automatic-speech-recognition

Apr 19

#2311: Danish AI: Bridging the Localization Gap

How does AI handle Danish? Explore the challenges and progress in making AI tools work for speakers of smaller languages.

speech-recognition, text-to-speech, large-language-models

Apr 17

#2288: The Invisible Gatekeeper of Voice Tech

How voice activity detection shapes every step of the voice tech pipeline, and why it’s harder than it seems.

speech-recognition, audio-processing, edge-computing

Apr 17

#2272: The AI Transcription Sweet Spot

Does higher-quality audio make AI transcription worse? New research reveals a surprising "sweet spot" for bitrate, challenging a core assumption of...

speech-recognition, audio-processing, ai-training

Apr 12

#2192: How We Built a Podcast Pipeline

Hilbert reveals the complete technical architecture behind 2,000+ episodes—from voice memos to GPU-powered TTS, with Claude models, LangGraph workf...

prompt-engineering, speech-recognition, text-to-speech