#multimodal-ai
19 episodes
#2688: Intelligent Frame Extraction for Multimodal AI
Use multimodal AI and smart frame extraction to turn a walk-through video into an actionable decluttering plan.
#2409: When AI Cheats on Cultural Knowledge
Five benchmarks that reveal how AI systems fail at cultural knowledge — and what their methodologies tell us.
#1964: The Three Layers That Make AR Finally Work
See a 3D arrow pointing to the exact bolt you need, or read a street sign in real-time translation.
#1792: Google's Native Multimodal Embedding Kills the Fusion Layer
Google’s new embedding model maps text, images, audio, and video into a single vector space—cutting latency by 70%.
#1724: When AI Dubbing Swaps Your Gender
How does YouTube translate a video with one click? We explore the tech behind auto-dubbing, from sandwich models to voice cloning.
#1592: The Vector Debt Trap: Choosing Embeddings That Last
Stop treating embedding models like plumbing. Learn how to navigate vector debt, multimodal retrieval, and database configuration for RAG.
#1586: The Rocketbook Sunset and the Search for a Clean Erase
Bridge the gap between handwritten notes and AI. Discover the best whiteboard notebooks and markers for seamless digital transcription.
#1568: The Signal Versus Symbol Gap
Is Gemini a brilliant audio engineer or just a talented lip-reader? Explore the "signal vs. symbol" gap in AI audio processing.
#1564: The Death of the Cascaded Pipeline
Forget basic transcription. Explore how native omni-modal models are capturing the "soul" of speech with near-zero latency.
#1482: The Hidden Cost of Choosing an Embedding Model
From Matryoshka models to multimodal search, discover how the fundamental units of AI memory are being optimized for efficiency and scale.
#1085: The Tokenization Lie: How AI Actually Processes Media
Think 1,000 tokens equals 750 words? For audio and video, that rule is a lie. Discover the hidden math behind multimodal AI.
#786: The Cost of a Touch: When Your Hoard Becomes a Liability
Learn how to manage thousands of parts without losing your mind using AI, QR codes, and professional logistics strategies.
#769: When Manuals Learn to See in 3D
Discover how AI and spatial computing are turning complex hardware repairs into real-time, interactive experiences.
#749: The Live vs. Scripted Trade-Off in AI Podcasting
Can AI podcasts move from polished scripts to raw, real-time conversation? Explore the technical and financial shift to live multimodal models.
#132: How AI Learns to See Time as a Dimension
Discover how spatial-temporal tokenization and 3D world modeling are revolutionizing real-time video-to-video AI interaction.
#64: How AI Learns to See, Hear, and Think Together
AI is evolving beyond text, learning to see, hear, and understand our world. Discover the future of human-AI interaction!
#54: How AI Unifies Images, Audio, and Text
Omnimodal AI: How do models process images, audio, video, and text all at once? Discover the engineering behind AI that accepts anything.
#53: Instructional vs. Conversational AI: The Distinction Nobody Talks About
Instructional vs. conversational AI: a crucial distinction reshaping how AI is built. Discover why it matters for the future of AI development.
#46: Pixels, Prompts & Pseudo-Text: AI's Word Problem
AI paints stunning images but can't spell "cat." Why do advanced models struggle with simple text? Dive into AI's weird word problem!