#rag

80 episodes

#3120: What Makes Agentic Search Tools Like Exa Actually Work?

Why swapping Google for Exa transformed our show's accuracy — and what agentic search does differently.

ai-agents, rag, search

#2705: Your Brain Isn't a Hard Drive — What Actually Fits

Long-term memory isn't storage — it's a generative model. Here's where the brain/computer analogy actually holds up.

neuroscience, rag, generative-ai

#2682: Live Retrieval vs. RAG: What an Agent Actually Does

Does every AI conversation create a tiny vector store? We unpack the real tradeoffs between live document fetching and pre-indexed RAG.

rag, ai-agents, vector-databases

#2676: Vector Database Schema Design for AI Memory Layers

Stop dumping vectors blindly. Design metadata schemas and namespaces for retrieval that actually works at scale.

vector-databases, rag, ai-memory

#2673: The Embedding Coupling Problem: Editing Vector Stores

Can you edit or delete individual chunks in Pinecone? And can you actually back up a vector index? Yes—but with critical caveats.

vector-databases, rag, ai-agents

#2664: Can You Trust an LLM's Raw Knowledge?

Why pre-trained knowledge isn't reliable for facts — and what actually makes models useful.

large-language-models, fine-tuning, rag

#2639: The Hidden Layer That Makes Search Work

Why your search results miss the mark — and how cross-encoders fix it.

rag, search, information-retrieval

#2638: How to Build Disposable AI Agents at Runtime

Create ephemeral AI agents that answer questions about specific items, then vanish. No persistent configuration needed.

ai-agents, context-window, rag

#2469: Embedding Model Deprecation: RAG's Silent Killer

When OpenAI retires an embedding model, your RAG pipeline breaks silently. Here’s how to fix it.

rag, model-context-protocol, vector-databases

#2466: The Hidden Trap of Embedding Model Lock-In

What happens when your vector database works great — until your embedding model gets deprecated and your vectors become useless.

rag, open-source, embedding-models

#2315: How to Update AI Models Without Starting Over

Exploring the challenge of updating AI models with new knowledge without costly full retraining.

ai-training, fine-tuning, rag

#2228: Tuning RAG: When Retrieval Helps vs. Hurts

How do you prevent retrieval from suppressing a model's reasoning? We diagnose our own pipeline's four control levers and multi-source fusion strat...

rag, ai-agents, prompt-engineering

#2214: The Three Failure Modes of AI News Systems

When a conflict changes hourly, AI systems built for yesterday's information fail. Here's how to architect pipelines that actually keep up.

large-language-models, ai-inference, rag

#2213: When Ground Truth Moves Hourly

How do you rigorously evaluate whether Tavily or Exa retrieves better results for breaking news? A formal benchmark beats the vibe check.

rag, benchmarks, hallucinations

#2208: Building Memory for AI Characters That Actually Evolve

How do AI hosts develop real consistency across episodes? Corn and Herman explore retrieval-augmented memory systems that let AI characters genuine...

ai-memory, rag, conversational-ai

#2204: Memory Without RAG: The Real Architecture

mem0, Letta, Zep, and LangMem solve agent memory differently than RAG. Here's what's actually happening under the hood.

ai-agents, ai-memory, rag

#2203: Knowledge Without Tools: Why MCPs Aren't Just for Execution

MCPs can be pure knowledge providers with zero tools. Here's why that matters for agents querying government data and authoritative sources.

model-context-protocol, knowledge-graphs, rag

#2181: When RAG Becomes an Agent

RAG in chatbots is simple retrieval. RAG in agents is a multi-step decision loop. Here's what actually changes.

rag, ai-agents, ai-orchestration

#2133: Engineering Geopolitical Personas: Beyond Caricatures

How to build LLMs that simulate state actors with strategic fidelity, not just surface mimicry.

ai-agents, prompt-engineering, rag

#2129: Shifting Left on Hallucinations

Stop hoping your AI doesn't lie. We explore the shift to deterministic guardrails, specialized judge models, and the tools making agents reliable.

ai-agents, hallucinations, rag

#2125: Why Agentic Chunking Beats One-Shot Generation

A single prompt can't write a 30-minute script. Here’s the agentic chunking method that fixes coherence.

ai-agents, prompt-engineering, rag

#2069: The Vibe Coding Trap: Why Your Agent Skills Keep Breaking

Stop guessing at the agentskills.io spec. Learn the exact YAML fields, directory structure, and authoring patterns to make Claude Code skills that ...

ai-agents, prompt-engineering, rag

#2057: How Agents Break Through the LLM Output Ceiling

The output window is the new bottleneck: why massive context doesn't solve long-form generation.

ai-agents, context-window, rag

#2026: Prompt Layering: Beyond the Monolithic Prompt

Stop writing giant, monolithic prompts. Learn how to stack modular layers for cleaner, more powerful AI applications.

prompt-engineering, ai-agents, rag

#2022: When AI Becomes Your IT Department

We dug into a repo of 47 real-world projects showing how OpenClaw powers everything from self-healing servers to overnight app builders.

ai-agents, rag, ai-inference

#2010: Building Better AI Memory Systems

We obsess over AI inputs but treat outputs like Snapchat messages. Here's why that's a massive blind spot.

ai-agents, rag, data-storage

#2008: Needle-in-a-Haystack Testing for LLMs

New AI models claim to be genius-level, but can they actually find a specific fact in a massive document?

rag, ai-agents, open-source

#2005: Beyond Vibes: The Hard Science of LLM Evaluation

Running the same LLM on different GPUs can produce different results. Here’s why that happens and how to test for it.

llm-as-a-judge, rag, context-window

#1994: Why Can't AI Admit When It's Guessing?

Enterprise AI now auto-filters low-confidence claims, but do these self-reported scores actually mean anything?

ai-agents, ai-safety, rag

#1959: How Constrained AI Models Handle the Unexpected

Your AI assistant promised to only use your documents. Instead, it invented a case law that doesn't exist. Here's why.

ai-agents, rag, hallucinations

#1956: AI Skills: From Vibe Coding to Procedural Playbooks

Forget messy system prompts. Agent skills turn AI into a Swiss Army knife of modular, auditable procedures.

ai-agents, prompt-engineering, rag

#1951: The Digital Ant Farm: Watching AI Agents Build Their Own Society

Explore Moltbook, a social network where AI agents interact with persistent identities and goals, reshaping digital communication.

ai-agents, rag, decentralized-storage

#1918: When Server Updates Break Your AI Agents

When a third-party MCP server updates its schema, your AI agents can crash. Here's how to build resilient clients that self-heal.

ai-agents, rag, distributed-systems

#1914: Google Invented RAG's Secret Sauce

Before LLMs, Google solved the "hallucination" problem with a two-stage trick that's making a huge comeback.

rag, hallucinations, re-ranking

#1907: Why We Still Fine-Tune in 2026

Despite million-token context windows, fine-tuning remains essential. Here’s why behavior, not just facts, matters.

fine-tuning, ai-agents, rag

#1838: Tuning Search Without Losing Your Mind

Modern search bars are AI decision engines. Here's how small teams can tune fuzzy matching, semantic search, and reranking without breaking everyth...

rag, vector-databases, ai-reasoning

#1817: The Hidden Taxonomy of AI: Why Specialized Models Outperform Giants

Explore the vast ecosystem of niche AI models for computer vision and document understanding, far beyond large language models.

computer-vision, rag, ai-models

#1812: When AI Gets a Truth Tether to the Talmud

Sefaria's new MCP server connects AI directly to 2,700 years of Jewish texts, transforming how scholars and curious learners study ancient literature.

large-language-models, model-context-protocol, rag

#1804: The Fork in the Road: Why AI Agents Check Old Receipts First

Stop your AI agent from overthinking. Learn why it checks old memories instead of booking flights—and how to fix the "eagerness" problem.

ai-agents, prompt-engineering, rag

#1794: RAG Is Cheaper Than You Think (Until It’s Not)

From a $1 embedding bill to a $10k/month vector database bill, here’s the real math behind RAG in 2026.

rag, vector-databases, cloud-computing

#1792: Google's Native Multimodal Embedding Kills the Fusion Layer

Google’s new embedding model maps text, images, audio, and video into a single vector space—cutting latency by 70%.

multimodal-ai, rag, ai-models

#1784: Context1: The Retrieval Coprocessor

Chroma's new 20B model acts as a specialized "scout" for your LLM, replacing slow, static RAG with multi-step, agentic search.

rag, ai-agents, latency

#1778: Audio Is the New "Read Later" Graveyard

Why listening to AI conversations beats reading dense PDFs, and how serverless GPUs make it cheap.

audio-processing, serverless-gpu, rag

#1765: The Agentic Internet: A Clean Web for Machines

We explore the tools building a parallel, machine-readable web—from SearXNG to Tavily.

ai-agents, rag, open-source

#1764: Your Repo as a Knowledge Base

How to give AI agents instant memory of your entire project—without cloud costs or complex infrastructure.

vector-databases, rag, local-ai

#1754: From Ollama to Agentic CLIs: The Rise of the AI Harness

Explore the evolution from local LLMs to modern agentic CLIs, focusing on the "harness" that gives models context, tools, and autonomy.

local-ai, ai-agents, rag

#1737: Nous Research: The Decentralized AI Lab Beating Giants

Meet Nous Research, the decentralized collective outperforming billion-dollar labs with open-source AI and the self-improving Hermes-Agent framework.

open-source-ai, ai-agents, rag

#1731: Why Deep Research Agents Are Being Forgotten

Specialized research agents outperform general orchestrators by 40-60% on verification tasks, yet developer hype is fading. Here's why.

ai-agents, rag, model-context-protocol

#1728: The AI Carpool: Emergent Collaboration Through Role-Playing

CAMEL AI lets two agents role-play to solve tasks autonomously. No complex code—just emergent teamwork.

ai-agents, prompt-engineering, rag

#1727: The Great Architectural Heist: LSP as AI's Universal Plumbing

Explore how the Language Server Protocol is being repurposed to integrate AI directly into code editors, unifying development workflows.

ai-agents, software-development, rag

#1725: The Death of the Lonely Chatbot

Forget chatbots: AI orchestration is now the key to scaling intelligent agents in the enterprise.

ai-agents, distributed-systems, rag

#1713: Why Native AI Search Grounding Still Fails

Native search grounding is expensive and flaky. Here’s why bolt-on tools still win for accurate, real-time AI answers.

rag, ai-agents, local-ai

#1708: Why Your AI Agent Forgets Everything (And How to Fix It)

Learn how Letta's memory-first architecture solves the AI context bottleneck for long-term agents.

ai-agents, rag, context-window

#1700: Can LLMs Learn Continuously Without Forgetting?

We explore a new approach: micro-training updates every few days to keep AI knowledge fresh without constant web searches.

rag, fine-tuning, ai-agents

#1666: The Agent Mesh: Shared Context That Changes Everything

Grok 4.20’s native multi-agent architecture cuts token costs by 75% and enables real-time cross-agent reasoning.

ai-agents, transformers, rag

#1629: From DAGs to Loops: Why Agents Need Stateful Cycles

Stop building linear chains and start building cycles to create agents that can reason, self-correct, and maintain complex state.

ai-agents, rag, context-window

#1601: Cohere: The Switzerland of Enterprise AI

While others chase viral memes, Cohere is quietly building the secure, cloud-agnostic infrastructure powering the global enterprise.

rag, speech-recognition, defense-technology

#1592: The Vector Debt Trap: Choosing Embeddings That Last

Stop treating embedding models like plumbing. Learn how to navigate vector debt, multimodal retrieval, and database configuration for RAG.

rag, vector-databases, multimodal-ai

#1565: Machine-Readable Safety: Markdown for AI Agents

Transform bloated government data into clean Markdown to power life-saving AI agents during emergencies.

ai-agents, rag, emergency-preparedness

#1482: The Hidden Cost of Choosing an Embedding Model

From Matryoshka models to multimodal search, discover how the fundamental units of AI memory are being optimized for efficiency and scale.

multimodal-ai, vector-databases, rag

#1212: The Postgres Vector Revolution: Killing the Sprawl

Is your tech stack a sprawling suburb of microservices? Discover why a 40-year-old database is winning the AI infrastructure war.

vector-databases, rag, architecture

#1123: When One Database Isn't Enough

Can Postgres 18 finally replace the data warehouse? We dive into data gravity, columnar storage, and the physics of scaling in the AI age.

architecture, vector-databases, rag

#1103: The Kitchen War: When Theory Meets Messy Reality

Explore the mechanics of LLM context windows and attention, and witness what happens when technical debates collide with household chores.

large-language-models, architecture, rag

#1100: The Truth Conflict: Why AI Ignores the Facts You Give It

Discover why AI models ignore provided documents in favor of old training data and how to build a reliable "hierarchy of truth" for RAG systems.

rag, large-language-models, prompt-engineering

#995: Democratizing Intelligence: From PDFs to Policy

How can AI transform dense government reports into actionable intelligence? Explore the physics of Iranian missiles and the future of OSINT.

iran, ballistic-missiles, osint, rag, missile-defense

#959: The Infinite Content Problem: AI’s War on Truth

Explore how AI is scaling disinformation to an industrial level and what the "liar's dividend" means for the future of shared reality.

ai-agents, rag, social-engineering

#948: Can AI Search Survive the Fog of War and SEO Spam?

Explore how AI is moving from static models to real-time data and whether specialized search tools can survive the rise of the tech giants.

rag, generative-ai, latency, answer-engines

#869: Why Tiny Digital Savants Are Outperforming God-Models

Are massive AI models hitting a wall? Discover why the future belongs to lean, domain-specific "digital savants" and vertical pre-training.

small-language-models, rag, fine-tuning, ai-orchestration, 2026

#846: Beyond the Vector: Building Long-Standing AI Memory

Stop relying on basic vector search. Discover how Graph RAG and RAPTOR are creating AI systems with true long-standing memory.

rag, architecture, knowledge-graphs

#810: The Agentic Interview: How AI Learns to Know You

Stop dumping data. Discover how agentic interviews are transforming AI from a passive listener into a proactive, structured partner.

ai-agents, rag, knowledge-graphs

#809: Beyond the Prompt: The Shift to AI Context Engineering

Is prompt engineering still magic, or just plumbing? Explore why the field is shifting toward context engineering and systematic evaluation.

prompt-engineering, architecture, rag

#755: From Duct Tape to Autonomous Studio: Scaling a 741-Episode AI Podcast

Peek under the hood of My Weird Prompts to see how Gemini, Modal, and multi-agent systems are scaling this automated show to the next level.

ai-agents, architecture, rag

#752: Will AI Kill the Click? Why Search Is Becoming Invisible

Stop shouting nouns at a screen. Discover how AI is turning the "ten blue links" into a conversational assistant that understands your intent.

rag, large-language-models

#665: Inside the Stack: The Hidden Layers of Every AI Prompt

Ever wonder what happens after you hit enter? Discover the hidden "stack" of instructions and memories shaping every AI response.

prompt-engineering, rag, architecture

#539: Turning a Podcast into a Searchable Knowledge Base

Herman and Corn discuss turning 500+ episodes into an interactive knowledge base while scaling human-AI collaboration to new heights.

rag, ai-agents, architecture

#171: From Digital Fortresses to Machine-Digestible Sites

Stop fighting the crawlers and start feeding them. Learn how llms.txt and structured metadata are defining the new era of AI Optimization.

aio, ai-optimization, llmstxt, seo, sitemaps

#144: AI Memory vs. RAG: Building Long-Term Intelligence

Explore why AI needs a "diary" and not just a "library" as we dive into the architectural differences between RAG and long-term agentic memory.

ai-memory, rag, retrieval-augmented-generation, vector-database, long-term-memory

#117: From Keywords to Vectors: How AI Decodes Meaning

Why can AI write poetry but struggle to find a file? Explore the history and math of semantic understanding with Herman and Corn.

large-language-models, rag

#85: When Probability Beats Truth: Why AI Must Lie

Why do smart AI systems make up fake facts? Corn and Herman explore the "feature" of digital hallucinations and how to spot them.

large-language-models, rag, supply-chain-security

#30: Which AI Tools Will You Still Use Next Year?

RAG vs. Memory: Are you building resilient AI? Discover the crucial difference between these two foundational pillars.

ai-agents, rag, ai-memory