Inference & Training
Computational aspects, fine-tuning, RLHF
42 episodes
#2196: The Annotation Economy: Who Labels AI's Training Data
Annotation is the invisible foundation of AI—and a $17B industry by 2030. Here's what dataset curators actually need to know about the tools and platforms.
#2187: Why Claude Writes Like a Person (and Gemini Doesn't)
Claude produces prose that sounds human. Gemini reads like Wikipedia. The difference isn't capability—it's how they were trained to think about writing.
#2177: Skip Fine-Tuning: Shape LLMs With Alignment Alone
Can you build a personalized LLM by skipping traditional fine-tuning and using only post-training alignment methods like DPO and GRPO? We break it down.
#2160: Claude's Latency Profile and SLA Guarantees
Claude is measurably slower than competitors—and Anthropic's SLA promises are even thinner than the latency numbers suggest. What enterprises actually need to know.
#2136: The Brutal Problem of AI Wargame Evaluation
Most AI wargame simulations skip evaluation entirely or rely on token expert reviews. This is the field's biggest credibility problem.
#2135: Is Your AI Wargame Signal or Noise?
Monte Carlo methods promise statistical rigor for AI wargaming, but the line between genuine insight and sampling noise is thinner than you think.
#2129: Building the Anti-Hallucination Stack
Stop hoping your AI doesn't lie. We explore the shift to deterministic guardrails, specialized judge models, and the tools making agents reliable.
#2123: Human Reaction Time vs. AI Latency
We obsess over shaving milliseconds off AI response times, but human biology has a hard limit. Here’s why your brain can’t keep up.
#2115: Why AI Answers Differ Even When You Ask Twice
You ask an AI the same question twice and get two different answers. It’s not a bug—it’s physics.
#2110: Tuning AI Personality: Beyond Sycophancy
AI models swing between obsequious flattery and cold dismissal. Here’s why that happens and how to fix it.
#2089: Why AI Drones Need Millions of Images
A public GitHub model spotted by a listener reveals the massive gap between hobbyist AI and lethal military drone detection systems.
#2065: Why Run One AI When You Can Run Two?
Speculative decoding makes LLMs 2-3x faster with zero quality loss by using a small draft model to guess tokens that a large model verifies in parallel.
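The draft-and-verify loop behind speculative decoding can be sketched in a few lines. This is a minimal greedy-decoding toy, not any real library's implementation: the "models" are invented stand-in functions, and real systems verify all proposed tokens against the target model in a single batched forward pass (and handle sampling, not just greedy picks).

```python
def speculative_decode(target, draft, prefix, k=4, steps=3):
    """Toy greedy speculative decoding: a cheap draft model proposes k
    tokens; the expensive target model checks them and keeps the longest
    agreeing run, so the output is identical to running the target alone."""
    out = list(prefix)
    for _ in range(steps):
        # Draft proposes k tokens autoregressively (cheap calls).
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies each proposed token; accept until first mismatch.
        for t in proposal:
            expected = target(out)  # expensive call (batched in practice)
            if t == expected:
                out.append(t)
            else:
                out.append(expected)  # fall back to the target's own token
                break
    return out

# Hypothetical deterministic "models": next token depends on context length.
target_model = lambda ctx: len(ctx) % 10
draft_model = lambda ctx: (len(ctx) + (len(ctx) % 5 == 0)) % 10  # sometimes wrong

print(speculative_decode(target_model, draft_model, [0]))
```

Because every accepted token matches what the target would have produced, the result is bit-identical to plain target-only decoding; the speedup comes purely from verifying several draft tokens per expensive call.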
#2063: That $500M Chatbot Is Just a Base Model
That polite chatbot? It started as a raw, chaotic autocomplete engine costing half a billion dollars to build.
#2059: npm Cache and Stale Dependencies in Agentic Pipelines
npx is silently running old versions of your AI tools. Here's why your updates vanish into a cache black hole.
#2026: Prompt Layering: Beyond the Monolithic Prompt
Stop writing giant, monolithic prompts. Learn how to stack modular layers for cleaner, more powerful AI applications.
#2025: How Do You Reward a Thought?
Rewarding an AI agent is harder than just saying "good job"—here's how we turn messy human values into math.
#2021: Your Frozen AI Is Getting Smarter (Here's How)
Your AI model might be static, but the system around it can make it learn in real-time.
#2007: AI Grading AI: The Snake Eating Its Tail
We asked an AI to write this script. Then we asked another AI to grade it. Here’s what happens when the judges have biases.
#2006: How Do You Measure an LLM's "Soul"?
Traditional benchmarks can't measure tone or empathy. Here's how to evaluate if an AI model truly "gets it right."
#2005: Why Your GPU Changes LLM Output
Running the same LLM on different GPUs can produce different results. Here’s why that happens and how to test for it.
#1992: Israel's 4,000-GPU National Supercomputer
Israel is building a sovereign AI supercomputer with 4,000 Nvidia B200 GPUs to keep startups local.
#1985: AI Tutors vs. Human Error: Who Do You Trust?
AI gets flak for hallucinations, but humans misremember 40% of facts. Why the double standard?
#1932: How Do You QA a Probabilistic System?
LLMs break traditional testing. Here’s the 3-pillar toolkit teams use to catch hallucinations and garbage outputs at scale.
#1931: AI Pipelines: In-Memory vs. Durable State
Why do AI pipelines crash? It’s not the models—it’s the plumbing. We break down how to manage data between stages.
#1927: Workers vs. Servers: The 2026 Compute Showdown
Is the persistent server dead? We compare Cloudflare Workers, GitHub Actions, and VPS options for modern app architecture.
#1909: The Unbakeable Cake: AI's Copyright Problem
Why can't we just delete stolen data from AI models? It's not a database—it's a baked cake.
#1907: Why We Still Fine-Tune in 2026
Despite million-token context windows, fine-tuning remains essential. Here’s why behavior, not just facts, matters.
#1894: Engineering Serendipity: Tuning AI for Better Brainstorming
Stop asking chatbots for generic ideas. Learn how to configure AI as a structured, critical partner for business innovation and career pivots.
#1882: The $8B Human Cost of AI Data
AI isn't free—it costs billions for humans to label data. See why annotation is the real engine behind models like Gemini.
#1839: AI's Data Kitchen: From Hoovering to Fine-Tuning
We go behind the curtain of the AI data pipeline, revealing the messy, multi-billion-dollar war over data curation.
#1828: Mastering 2M Token Context in Agentic Pipelines
A massive context window sounds like a dream, but it can quickly become a nightmare for complex AI workflows.
#1824: Why Governments Are Building Bunkers for AI
Public clouds can’t handle the security or scale of classified AI. Governments are retreating to fortified bunkers.
#1822: Quantum in the Cloud: Hype vs. Hardware
Is QCaaS a billion-dollar breakthrough or an expensive science experiment? We explore the gap between hype and hardware.
#1811: Stop Hardcoding User Names in AI Prompts
Three methods for storing user identity in AI agents—and why the "Fat System Prompt" breaks production apps.
#1810: Why Your TTS Sounds Great in English, Terrible Everywhere Else
English AI voices are polished, but global languages hit a wall. Here's why text-to-speech breaks down for Hebrew, Hindi, and beyond.
#1777: Claude Called My Prompt "Rambling" and I'm Not Okay
When an AI coding tool critiques your prompt's literary quality, it raises a massive technical question about engineered personality.
#1762: Testing AI Truthfulness: Beyond Vibes
Stop trusting confident AI. We explore the formal science of testing LLMs for hallucinations and knowledge cutoffs.
#1740: Chatterbox TTS: Open Source vs. ElevenLabs
We dissect Resemble AI's Chatterbox to see how its open-source TTS compares to commercial giants like ElevenLabs.
#1736: Why OpenClaw Eats 16 Trillion Tokens
OpenClaw is processing 16.5 trillion tokens daily, dwarfing Wikipedia. Here’s why it’s #1.
#1709: Standard Deviation: The Map Without a Scale
Why the average number alone is misleading—and how standard deviation reveals the true story behind the spread.
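The point about the mean hiding the spread is easy to show with Python's standard library. The latency figures below are made-up illustration data: two services with the identical average but very different variability.

```python
import statistics

# Two hypothetical services, both averaging 100 ms response time.
steady = [98, 99, 100, 101, 102]
spiky = [60, 80, 100, 120, 140]

print(statistics.mean(steady), statistics.mean(spiky))  # 100 100
print(round(statistics.stdev(steady), 2))               # 1.58
print(round(statistics.stdev(spiky), 2))                # 31.62
```

Same mean, but one standard deviation near 1.6 ms versus one near 32 ms: the average alone is the map without a scale.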
#1702: Roleplay Models Aren't Just for NSFW—They're Creative Co-Processors
Forget GPT-4 for scripts—specialized roleplay models like Aion-2.0 are better at character consistency and dialogue.
#1700: Can LLMs Learn Continuously Without Forgetting?
We explore a new approach: micro-training updates every few days to keep AI knowledge fresh without constant web searches.