#text-to-speech
15 episodes
#2982: Why Your TTS Model Nails "Shabbat" but Not "Keren Hishtalmut"
Why multilingual TTS models handle loanwords but fail at niche vocabulary — and what you can do about it.
#2914: Can AI Read the Room? TTS Prosody Explained
Can TTS models truly infer emotion from text, or just mimic patterns? We break down the science of prosody.
#2618: Text Normalization's Hidden Complexity
How to handle acronyms in text-to-speech pipelines using BERT models, lexicons, and layered preprocessing.
#2591: Decoupling Script from Voice
How dynamic voice replacement could let listeners choose who narrates each host's lines.
#2534: Can AI Generate Diagrams Without Typo Disasters?
Why AI diagram tools still mangle text labels — and what to do about it today.
#2311: Danish AI: Bridging the Localization Gap
How does AI handle Danish? Explore the challenges and progress in making AI tools work for small-language populations.
#2303: The Serverless Paradox: Why TTS Eats Your Budget
How batch processing and smart queue management can slash TTS costs for episodic podcast production.
#2192: How We Built a Podcast Pipeline
Hilbert reveals the complete technical architecture behind 2,000+ episodes—from voice memos to GPU-powered TTS, with Claude models, LangGraph workf...
#2027: The Missing Photoshop for Words
Why is editing text with AI so clunky? We explore the "TITO" paradigm—using small, local models for fast, private text transformation.
#1810: Why Your TTS Sounds Great in English, Terrible Everywhere Else
English AI voices are polished, but global languages hit a wall. Here's why text-to-speech breaks down for Hebrew, Hindi, and beyond.
#1809: The TTS Developer's Dilemma: Size vs. Speed
Stop guessing. We break down the critical trade-offs between model size, latency, and sample rate for production-ready voice apps.
#1808: The Architecture That Made AI Voices Run on a Raspberry Pi
How a model the size of a tweet outperforms billion-dollar giants in the race for perfect AI speech.
#1740: Why Open Source Is a Power Tool Strategy
We dissect Resemble AI's Chatterbox to see how its open-source TTS compares to commercial giants like ElevenLabs.
#1715: Why Voice Agents Need Frameworks (Not Just APIs)
Raw APIs handle models, but who manages the audio plumbing? We break down Vapi, LiveKit, and Pipecat.
#136: The Ghost in the Machine: Why AI Voices Hallucinate
Why does your AI suddenly start shouting or whispering like Darth Vader? Herman and Corn dive into the glitchy world of TTS hallucinations.