#speech-recognition

42 episodes

#3854: From Coos to Conversation: Baby's Hidden On-Ramp

How do babies go from babbling to real back-and-forth dialogue? The hidden architecture of early conversation.

child-development speech-recognition neurodivergence

#3493: Murmuring Scriptures and Wandering Wilds: Ancient Meditation

How "hagah" (murmuring scripture) and "hitbodedut" (wilderness solitude) reveal meditation hidden in the Bible.

neurodivergence child-development speech-recognition

#3443: What Makes a Pediatrician's Diagnostic Skill Unique

How pediatricians diagnose without patient history, reading cries, body language, and parent-child dynamics.

child-development sensory-processing speech-recognition

#3363: Why the Teletubbies Sun-Baby Makes Infants Cry

The Teletubbies was engineered for pre-verbal brains. Here's why adult discomfort is a feature, not a bug.

child-development sensory-processing speech-recognition

#2801: Why Baby Babble Sounds Like Foreign Languages

Your baby isn't speaking Korean — but here's why the overlap isn't a coincidence.

child-development linguistics speech-recognition

#2754: Why Your Dictation Setup Might Be Wrong

Modern ASR is shockingly robust. The biggest predictor of accuracy? How well your audio matches its training data.

automatic-speech-recognition speech-recognition audio-processing

#2643: How Stenographers Type 300 Words Per Minute

Court reporters don’t type letters—they chord syllables at 300 words per minute. Here’s how it works and why AI can’t replace them yet.

speech-recognition audio-processing accessibility

#2618: Text Normalization's Hidden Complexity

How to handle acronyms in text-to-speech pipelines using BERT models, lexicons, and layered preprocessing.

text-to-speech speech-recognition audio-processing

#2590: The Uncanny Valley of Clean Speech

How transformer models distinguish "um" from meaningful speech — and why removing too much makes you sound like a robot.

speech-recognition audio-processing automatic-speech-recognition

#2582: What Your Browser Does to Mic Audio Before It Reaches Your Server

getUserMedia returns audio, but not raw audio. Here's what browsers actually do to your mic feed before it hits your server.

audio-processing speech-recognition browser-audio-pipeline

#2563: How Audio Fingerprinting Actually Works

Spectrogram peaks, constellation maps, and hash matching — the elegant mechanics behind identifying any song in seconds.

audio-processing signal-processing speech-recognition

#2543: Why Base64 Adds 33% Overhead (And Why You Still Need It)

Base64 isn’t compression — it’s a safe transport encoding. Here’s how it works with audio APIs and where its limits are.

audio-engineering speech-recognition api-integration

#2510: The Design That Makes Voice Agents Tolerable

Drive-thru accuracy, healthcare triage, and the design secret that makes people *want* to talk to a machine.

voice-first accessibility speech-recognition

#2486: Why Noise Reduction Can Ruin Transcription Accuracy

Cleaning audio before transcription can increase errors by up to 46%. Here's the right approach for your voice app.

speech-recognition audio-processing automatic-speech-recognition

#2479: The Screaming Baby Stress Test

Choosing the right headset and control method for dictation when you're holding a baby who won't stop screaming.

speech-recognition voice-first diy

#2443: How Podcast RSS Feeds Can Speak Every Language

One RSS feed, a transcript tag, and TTS voice cloning — the emerging standard for letting any podcast speak any language.

speech-recognition voice-cloning audio-processing

#2337: When Diarization Fails Silently

Discover how PyAnnote and other tools tackle the critical task of identifying "who spoke when" in audio—and why it’s harder than it sounds.

audio-processing speech-recognition automatic-speech-recognition

#2311: Danish AI: Bridging the Localization Gap

How does AI handle Danish? Explore the challenges and progress in making AI tools work for small-language populations.

speech-recognition text-to-speech large-language-models

#2288: The Invisible Gatekeeper of Voice Tech

How voice activity detection shapes every step of the voice tech pipeline, and why it’s harder than it seems.

speech-recognition audio-processing edge-computing

#2272: The AI Transcription Sweet Spot

Does higher-quality audio make AI transcription worse? New research reveals a surprising "sweet spot" for bitrate, challenging a core assumption of...

speech-recognition audio-processing ai-training

#2192: How We Built a Podcast Pipeline

Hilbert reveals the complete technical architecture behind 2,000+ episodes—from voice memos to GPU-powered TTS, with Claude models, LangGraph workf...

prompt-engineering speech-recognition text-to-speech

#2183: Making Voice Agents Feel Natural

Turn-taking, interruptions, and latency are destroying voice AI UX—and the fixes are deeply technical. Here's what's actually happening underneath.

speech-recognition conversational-ai latency

#2027: The Missing Photoshop for Words

Why is editing text with AI so clunky? We explore the "TITO" paradigm—using small, local models for fast, private text transformation.

local-ai text-to-speech speech-recognition

#1752: Whisper Small Beats Whisper Large in Speed & Accuracy

A 4-GPU benchmark on Ubuntu shows the 1.5B-parameter Whisper Large is slower and less accurate than the much smaller Whisper Small.

speech-recognition gpu-acceleration latency