#rag
70 episodes
#2315: How to Update AI Models Without Starting Over
Exploring the challenge of updating AI models with new knowledge without costly full retraining.
#2228: Tuning RAG: When Retrieval Helps vs. Hurts
How do you prevent retrieval from suppressing a model's reasoning? We diagnose our own pipeline's four control levers and multi-source fusion strategies.
#2214: Real-Time News at War Speed: Building AI Pipelines for Breaking Conflict
When a conflict changes hourly, AI systems built for yesterday's information fail. Here's how to architect pipelines that actually keep up.
#2213: Grading the News: Benchmarking RAG Search Tools
How do you rigorously evaluate whether Tavily or Exa retrieves better results for breaking news? A formal benchmark beats the vibe check.
#2208: Building Memory for AI Characters That Actually Evolve
How do AI hosts develop real consistency across episodes? Corn and Herman explore retrieval-augmented memory systems that let AI characters genuinely evolve.
#2204: Memory Without RAG: The Real Architecture
mem0, Letta, Zep, and LangMem solve agent memory differently than RAG. Here's what's actually happening under the hood.
#2203: Knowledge Without Tools: Why MCPs Aren't Just for Execution
MCPs can be pure knowledge providers with zero tools. Here's why that matters for agents querying government data and authoritative sources.
#2181: When RAG Becomes an Agent
RAG in chatbots is simple retrieval. RAG in agents is a multi-step decision loop. Here's what actually changes.
#2133: Engineering Geopolitical Personas: Beyond Caricatures
How to build LLMs that simulate state actors with strategic fidelity, not just surface mimicry.
#2129: Building the Anti-Hallucination Stack
Stop hoping your AI doesn't lie. We explore the shift to deterministic guardrails, specialized judge models, and the tools making agents reliable.
#2125: Why Agentic Chunking Beats One-Shot Generation
A single prompt can't write a 30-minute script. Here’s the agentic chunking method that fixes coherence.
#2069: Agentskills.io Spec: From Broken YAML to Production Skills
Stop guessing at the agentskills.io spec. Learn the exact YAML fields, directory structure, and authoring patterns to make Claude Code skills that ...
#2057: How Agents Break Through the LLM Output Ceiling
The output window is the new bottleneck: why massive context doesn't solve long-form generation.
#2026: Prompt Layering: Beyond the Monolithic Prompt
Stop writing giant, monolithic prompts. Learn how to stack modular layers for cleaner, more powerful AI applications.
#2022: OpenClaw: The 16 Trillion Token Autonomy Engine
We dug into a repo of 47 real-world projects showing how OpenClaw powers everything from self-healing servers to overnight app builders.
#2010: Building Better AI Memory Systems
We obsess over AI inputs but treat outputs like Snapchat messages. Here's why that's a massive blind spot.
#2008: Needle-in-a-Haystack Testing for LLMs
New AI models claim to be genius-level, but can they actually find a specific fact in a massive document?
#2005: Why Your GPU Changes LLM Output
Running the same LLM on different GPUs can produce different results. Here’s why that happens and how to test for it.
#1994: Why Can't AI Admit When It's Guessing?
Enterprise AI now auto-filters low-confidence claims, but do these self-reported scores actually mean anything?
#1959: How Constrained AI Models Handle the Unexpected
Your AI assistant promised to only use your documents. Instead, it invented case law that doesn't exist. Here's why.
#1956: AI Skills: From Vibe Coding to Procedural Playbooks
Forget messy system prompts. Agent skills turn AI into a Swiss Army knife of modular, auditable procedures.
#1951: Moltbook: A Social Network for AI Agents
Explore Moltbook, a social network where AI agents interact with persistent identities and goals, reshaping digital communication.
#1918: MCP Schema Stability: Keeping Agents Reliable
When a third-party MCP server updates its schema, your AI agents can crash. Here's how to build resilient clients that self-heal.
#1914: Google Invented RAG's Secret Sauce
Before LLMs, Google solved the "hallucination" problem with a two-stage trick that's making a huge comeback.
#1907: Why We Still Fine-Tune in 2026
Despite million-token context windows, fine-tuning remains essential. Here’s why behavior, not just facts, matters.
#1838: Tuning Search Without Losing Your Mind
Modern search bars are AI decision engines. Here's how small teams can tune fuzzy matching, semantic search, and reranking without breaking everything.
#1817: Beyond LLMs: The Hidden World of Specialized AI
Explore the vast ecosystem of niche AI models for computer vision and document understanding, far beyond large language models.
#1812: AI Just Got a Library Card to Ancient Jewish Texts
Sefaria's new MCP server connects AI directly to 2,700 years of Jewish texts, transforming how scholars and curious learners study ancient literature.
#1804: Why Does Your Agent Check Old Receipts First?
Stop your AI agent from overthinking. Learn why it checks old memories instead of booking flights—and how to fix the "eagerness" problem.
#1794: RAG Is Cheaper Than You Think (Until It’s Not)
From a $1 embedding bill to a $10k/month vector database bill, here’s the real math behind RAG in 2026.
#1792: Google's Native Multimodal Embedding Kills the Fusion Layer
Google’s new embedding model maps text, images, audio, and video into a single vector space—cutting latency by 70%.
#1784: Context1: The Retrieval Coprocessor
Chroma's new 20B model acts as a specialized "scout" for your LLM, replacing slow, static RAG with multi-step, agentic search.
#1778: Audio Is the New "Read Later" Graveyard
Why listening to AI conversations beats reading dense PDFs, and how serverless GPUs make it cheap.
#1765: The Agentic Internet: A Clean Web for Machines
We explore the tools building a parallel, machine-readable web—from SearXNG to Tavily.
#1764: Vector Databases as a Single File
How to give AI agents instant memory of your entire project—without cloud costs or complex infrastructure.
#1754: From Ollama to Agentic CLIs: The Rise of the AI Harness
Explore the evolution from local LLMs to modern agentic CLIs, focusing on the "harness" that gives models context, tools, and autonomy.
#1737: Nous Research: The Decentralized AI Lab Beating Giants
Meet Nous Research, the decentralized collective outperforming billion-dollar labs with open-source AI and the self-improving Hermes-Agent framework.
#1731: Why Deep Research Agents Are Being Forgotten
Specialized research agents outperform general orchestrators by 40-60% on verification tasks, yet developer hype is fading. Here's why.
#1728: How Two AIs Collaborate Without Code
CAMEL AI lets two agents role-play to solve tasks autonomously. No complex code—just emergent teamwork.
#1727: LSP: The Universal AI Coding Interface
Explore how the Language Server Protocol is being repurposed to integrate AI directly into code editors, unifying development workflows.
#1725: Orchestrating AI Swarms: The New Infrastructure
Forget chatbots: AI orchestration is now the key to scaling intelligent agents in the enterprise.
#1713: Why Native AI Search Grounding Still Fails
Native search grounding is expensive and flaky. Here’s why bolt-on tools still win for accurate, real-time AI answers.
#1708: Why Your AI Agent Forgets Everything (And How to Fix It)
Learn how Letta's memory-first architecture solves the AI context bottleneck for long-term agents.
#1700: Can LLMs Learn Continuously Without Forgetting?
We explore a new approach: micro-training updates every few days to keep AI knowledge fresh without constant web searches.
#1666: Multi-Agent AI: One Model, Four Brains
Grok 4.20’s native multi-agent architecture cuts token costs by 75% and enables real-time cross-agent reasoning.
#1629: Why Your AI Agent Needs Loops: A Deep Dive into LangGraph
Stop building linear chains and start building cycles to create agents that can reason, self-correct, and maintain complex state.
#1601: Cohere: The Switzerland of Enterprise AI
While others chase viral memes, Cohere is quietly building the secure, cloud-agnostic infrastructure powering the global enterprise.
#1592: Mastering Embedding Models: From Gemini 2 to Vector Debt
Stop treating embedding models like plumbing. Learn how to navigate vector debt, multimodal retrieval, and database configuration for RAG.
#1565: Machine-Readable Safety: Markdown for AI Agents
Transform bloated government data into clean Markdown to power life-saving AI agents during emergencies.
#1482: The Multimodal Shift: Navigating the New Vector Landscape
From Matryoshka models to multimodal search, discover how the fundamental units of AI memory are being optimized for efficiency and scale.
#1212: The Postgres Vector Revolution: Killing the Sprawl
Is your tech stack a sprawling suburb of microservices? Discover why a 40-year-old database is winning the AI infrastructure war.
#1123: One Database to Rule Them All: The Future of Postgres
Can Postgres 18 finally replace the data warehouse? We dive into data gravity, columnar storage, and the physics of scaling in the AI age.
#1103: LLM Context Windows and the Great Kitchen War
Explore the mechanics of LLM context windows and attention, and witness what happens when technical debates collide with household chores.
#1100: The Truth Conflict: Why AI Ignores the Facts You Give It
Discover why AI models ignore provided documents in favor of old training data and how to build a reliable "hierarchy of truth" for RAG systems.
#995: AI vs. Mach 13: Demystifying the Iranian Missile Threat
How can AI transform dense government reports into actionable intelligence? Explore the physics of Iranian missiles and the future of OSINT.
#959: The Infinite Content Problem: AI’s War on Truth
Explore how AI is scaling disinformation to an industrial level and what the "liar's dividend" means for the future of shared reality.
#948: Can AI Search Survive the Fog of War and SEO Spam?
Explore how AI is moving from static models to real-time data and whether specialized search tools can survive the rise of the tech giants.
#869: Why Tiny Digital Savants Are Outperforming God-Models
Are massive AI models hitting a wall? Discover why the future belongs to lean, domain-specific "digital savants" and vertical pre-training.
#846: Beyond the Vector: Building Long-Standing AI Memory
Stop relying on basic vector search. Discover how Graph RAG and RAPTOR are creating AI systems with true long-standing memory.
#810: The Agentic Interview: How AI Learns to Know You
Stop dumping data. Discover how agentic interviews are transforming AI from a passive listener into a proactive, structured partner.
#809: Beyond the Prompt: The Shift to AI Context Engineering
Is prompt engineering still magic, or just plumbing? Explore why the field is shifting toward context engineering and systematic evaluation.
#755: Inside the Engine: Scaling an Automated AI Podcast
Peek under the hood of My Weird Prompts to see how Gemini, Modal, and multi-agent systems are scaling this automated show to the next level.
#752: Will AI Kill the Click? Why Search Is Becoming Invisible
Stop shouting nouns at a screen. Discover how AI is turning the "ten blue links" into a conversational assistant that understands your intent.
#665: Inside the Stack: The Hidden Layers of Every AI Prompt
Ever wonder what happens after you hit enter? Discover the hidden "stack" of instructions and memories shaping every AI response.
#539: The AI Pipeline: Scaling Curiosity and Community
Herman and Corn discuss turning 500+ episodes into an interactive knowledge base while scaling human-AI collaboration to new heights.
#171: The Rise of AIO: Optimizing Your Website for AI Bots
Stop fighting the crawlers and start feeding them. Learn how llms.txt and structured metadata are defining the new era of AI Optimization.
#144: AI Memory vs. RAG: Building Long-Term Intelligence
Explore why AI needs a "diary" and not just a "library" as we dive into the architectural differences between RAG and long-term agentic memory.
#117: From Keywords to Vectors: How AI Decodes Meaning
Why can AI write poetry but struggle to find a file? Explore the history and math of semantic understanding with Herman and Corn.
#85: Why AI Lies: The Science of Digital Hallucinations
Why do smart AI systems make up fake facts? Corn and Herman explore the "feature" of digital hallucinations and how to spot them.
#30: RAG vs. Memory: Architecting AI's Essential Toolbox
RAG vs. Memory: Are you building resilient AI? Discover the crucial difference between these two foundational pillars.