Vectors & Embeddings
Vector databases, RAG, semantic search
35 episodes
#3751: Source-Restricted vs. Open Retrieval: How to Lock Down Your LLM
When should an LLM be locked to specific documents, and when should it search the web? A practical framework for grounding decisions.
#3673: Knowledge Graphs vs SQL: How Custom Relationships Change Retrieval
Why naming relationships (not just connecting data) transforms how you retrieve information.
#2883: Correlation Beyond Pearson: 5 Techniques You Need
Pearson, Spearman, Kendall, partial, distance correlation — when to use each one and why most people stop too soon.
#2810: Every Catalog Is an Argument
From clay spine labels at Ebla to the Pinakes of Alexandria — how organizing knowledge shaped civilization.
#2682: Live Retrieval vs. RAG: What an Agent Actually Does
Does every AI conversation create a tiny vector store? We unpack the real tradeoffs between live document fetching and pre-indexed RAG.
#2676: Vector Database Schema Design for AI Memory Layers
Stop dumping vectors blindly. Design metadata schemas and namespaces for retrieval that actually works at scale.
#2673: The Embedding Coupling Problem: Editing Vector Stores
Can you edit or delete individual chunks in Pinecone? And can you actually back up a vector index? Yes—but with critical caveats.
#2639: The Hidden Layer That Makes Search Work
Why your search results miss the mark — and how cross-encoders fix it.
#2469: Embedding Model Deprecation: RAG's Silent Killer
When OpenAI retires an embedding model, your RAG pipeline breaks silently. Here’s how to fix it.
#2466: The Hidden Trap of Embedding Model Lock-In
What happens when your vector database works great — until your embedding model gets deprecated and your vectors become useless.
#2465: JSON-L vs Parquet: When Each Format Wins
How far can JSON-L scale before it breaks? And why does Parquet dominate for millions of rows?
#2458: Can Graph Databases Go Mainstream?
Graph databases are powerful but niche. Will they ever power mainstream CRMs and ERPs?
#2368: The Multi-Stage Pipeline Behind Netflix's Recommendations
Unpacking the multi-stage AI pipeline behind Netflix, Spotify, and Amazon’s "you might also like" suggestions—from candidate generation to real-tim...
#2271: Vector Search in a Single File
What if you could do vector search with just SQLite? We explore sqlite-vec, the extension that adds embeddings to the world's simplest database, an...
#2228: Tuning RAG: When Retrieval Helps vs. Hurts
How do you prevent retrieval from suppressing a model's reasoning? We diagnose our own pipeline's four control levers and multi-source fusion strat...
#2213: When Ground Truth Moves Hourly
How do you rigorously evaluate whether Tavily or Exa retrieves better results for breaking news? A formal benchmark beats the vibe check.
#2206: What Actually Works in AI Memory
Most AI memory systems are just vector databases with similarity search. We break down what mem0, Zep, and Letta are actually doing—and why benchma...
#2181: When RAG Becomes an Agent
RAG in chatbots is simple retrieval. RAG in agents is a multi-step decision loop. Here's what actually changes.
#2139: AI Wargame Memory: Beyond the Context Window
Why simply extending context windows fails in multi-agent simulations, and how layered memory architectures preserve strategic fidelity.
#2010: Building Better AI Memory Systems
We obsess over AI inputs but treat outputs like Snapchat messages. Here's why that's a massive blind spot.
#2008: Needle-in-a-Haystack Testing for LLMs
New AI models claim to be genius-level, but can they actually find a specific fact in a massive document?
#1959: How Constrained AI Models Handle the Unexpected
Your AI assistant promised to only use your documents. Instead, it invented case law that doesn't exist. Here's why.
#1925: The Plumbing That Keeps Science From Collapsing
Half of all links in academic papers are dead. Here’s the plumbing that keeps knowledge from vanishing.
#1914: Google Invented RAG's Secret Sauce
Before LLMs, Google solved the "hallucination" problem with a two-stage trick that's making a huge comeback.
#1910: Our Podcast Is Now a Permanent Research Artifact
Why we're uploading every episode to CERN's Zenodo archive, giving our AI experiments a permanent DOI and a life beyond streaming platforms.
#1849: When Forum Etiquette Becomes Prompt Engineering
Forget simple chatbots—this is how roleplayers taught AI to remember entire worlds, from 90s MUDs to just-in-time lore delivery.
#1838: Tuning Search Without Losing Your Mind
Modern search bars are AI decision engines. Here's how small teams can tune fuzzy matching, semantic search, and reranking without breaking everyth...
#1834: Owning Your AI Memory: The Data Exit Strategy
Why your AI remembers your coffee order but forgets your son’s name—and how to build a portable, federated memory layer you actually own.
#1794: RAG Is Cheaper Than You Think (Until It’s Not)
From a $1 embedding bill to a $10k/month vector database bill, here’s the real math behind RAG in 2026.
#1792: Google's Native Multimodal Embedding Kills the Fusion Layer
Google’s new embedding model maps text, images, audio, and video into a single vector space—cutting latency by 70%.
#1784: Context1: The Retrieval Coprocessor
Chroma's new 20B model acts as a specialized "scout" for your LLM, replacing slow, static RAG with multi-step, agentic search.
#1779: AI Memory Is a Mess: Files, Vectors, or Cloud?
Why your AI forgets your instructions and what the battle over portable memory means for the future of agents.
#1765: The Agentic Internet: A Clean Web for Machines
We explore the tools building a parallel, machine-readable web—from SearXNG to Tavily.
#1764: Your Repo as a Knowledge Base
How to give AI agents instant memory of your entire project—without cloud costs or complex infrastructure.
#1713: Why Native AI Search Grounding Still Fails
Native search grounding is expensive and flaky. Here’s why bolt-on tools still win for accurate, real-time AI answers.