AI Core

Inference & Training

Computational aspects, fine-tuning, RLHF

42 episodes

#2196: The Annotation Economy: Who Labels AI's Training Data

Annotation is the invisible foundation of AI—and a $17B industry by 2030. Here's what dataset curators actually need to know about the tools, platf...

training-data, ai-training, fine-tuning

#2187: Why Claude Writes Like a Person (and Gemini Doesn't)

Claude produces prose that sounds human. Gemini reads like Wikipedia. The difference isn't capability—it's how they were trained to think about wri...

large-language-models, fine-tuning, ai-training

#2177: Skip Fine-Tuning: Shape LLMs With Alignment Alone

Can you build a personalized LLM by skipping traditional fine-tuning and using only post-training alignment methods like DPO and GRPO? We break dow...

fine-tuning, ai-alignment, gpu-acceleration

#2160: Claude's Latency Profile and SLA Guarantees

Claude is measurably slower than competitors—and Anthropic's SLA promises are even thinner than the latency numbers suggest. What enterprises actua...

latency, ai-inference, anthropic

#2136: The Brutal Problem of AI Wargame Evaluation

Most AI wargame simulations skip evaluation entirely or rely on token expert reviews. This is the field's biggest credibility problem.

ai-safety, military-strategy, ai-agents

#2135: Is Your AI Wargame Signal or Noise?

Monte Carlo methods promise statistical rigor for AI wargaming, but the line between genuine insight and sampling noise is thinner than you think.

ai-agents, military-strategy, ai-safety

#2129: Building the Anti-Hallucination Stack

Stop hoping your AI doesn't lie. We explore the shift to deterministic guardrails, specialized judge models, and the tools making agents reliable.

ai-agents, hallucinations, rag

#2123: Human Reaction Time vs. AI Latency

We obsess over shaving milliseconds off AI response times, but human biology has a hard limit. Here’s why your brain can’t keep up.

human-computer-interaction, ai-inference, latency

#2115: Why AI Answers Differ Even When You Ask Twice

You ask an AI the same question twice and get two different answers. It’s not a bug—it’s physics.

ai-inference, gpu-acceleration, ai-non-determinism

#2110: Tuning AI Personality: Beyond Sycophancy

AI models swing between obsequious flattery and cold dismissal. Here’s why that happens and how to fix it.

ai-agents, prompt-engineering, ai-ethics

#2089: Why AI Drones Need Millions of Images

A public GitHub model spotted by a listener reveals the massive gap between hobbyist AI and lethal military drone detection systems.

computer-vision, military-strategy, ai-agents

#2065: Why Run One AI When You Can Run Two?

Speculative decoding makes LLMs 2-3x faster with zero quality loss by using a small draft model to guess tokens that a large model verifies in para...

latency, gpu-acceleration, ai-inference

#2063: That $500M Chatbot Is Just a Base Model

That polite chatbot? It started as a raw, chaotic autocomplete engine costing half a billion dollars to build.

large-language-models, gpu-acceleration, ai-training

#2059: npm Cache and Stale Dependencies in Agentic Pipelines

npx is silently running old versions of your AI tools. Here's why your updates vanish into a cache black hole.

ai-agents, cybersecurity, software-development

#2026: Prompt Layering: Beyond the Monolithic Prompt

Stop writing giant, monolithic prompts. Learn how to stack modular layers for cleaner, more powerful AI applications.

prompt-engineering, ai-agents, rag

#2025: How Do You Reward a Thought?

Rewarding an AI agent is harder than just saying "good job"—here's how we turn messy human values into math.

ai-agents, ai-ethics, ai-safety

#2021: Your Frozen AI Is Getting Smarter (Here's How)

Your AI model might be static, but the system around it can make it learn in real-time.

ai-agents, model-context-protocol, ai-safety

#2007: AI Grading AI: The Snake Eating Its Tail

We asked an AI to write this script. Then we asked another AI to grade it. Here’s what happens when the judges have biases.

llm-as-a-judge, hallucinations, ai-ethics

#2006: How Do You Measure an LLM's "Soul"?

Traditional benchmarks can't measure tone or empathy. Here's how to evaluate if an AI model truly "gets it right."

llm-as-a-judge, ai-ethics, ai-safety

#2005: Why Your GPU Changes LLM Output

Running the same LLM on different GPUs can produce different results. Here’s why that happens and how to test for it.

llm-as-a-judge, rag, context-window
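The root cause this episode digs into can be shown without a GPU at all: floating-point addition is not associative, so when different GPU kernels reduce the same numbers in a different order, they can get different sums, and those tiny differences can flip a token choice downstream. A minimal demonstration with ordinary Python doubles:

```python
# Floating-point addition is not associative. Reordering a reduction
# (as different GPU kernels legitimately do) can change the result;
# this is the arithmetic root of cross-hardware LLM output drift.
vals = [1e16, 1.0, -1e16, 1.0]

# Left-to-right: 1e16 + 1.0 rounds back to 1e16, so one 1.0 is lost.
left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]

# Reordered: the two huge terms cancel first, so both 1.0s survive.
reordered = (vals[0] + vals[2]) + (vals[1] + vals[3])

print(left_to_right)  # 1.0
print(reordered)      # 2.0
```

Same inputs, same operation, different grouping, different answer; scaled up to the billions of additions in a transformer forward pass, this is why bitwise-identical outputs across GPU architectures are not guaranteed.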

#1992: Israel's 4,000-GPU National Supercomputer

Israel is building a sovereign AI supercomputer with 4,000 Nvidia B200 GPUs to keep startups local.

gpu-acceleration, national-security, infrastructure

#1985: AI Tutors vs. Human Error: Who Do You Trust?

AI gets flak for hallucinations, but humans misremember 40% of facts. Why the double standard?

ai-agents, ai-safety, reliability

#1932: How Do You QA a Probabilistic System?

LLMs break traditional testing. Here’s the 3-pillar toolkit teams use to catch hallucinations and garbage outputs at scale.

ai-agents, ai-safety, hallucinations

#1931: AI Pipelines: In-Memory vs. Durable State

Why do AI pipelines crash? It’s not the models—it’s the plumbing. We break down how to manage data between stages.

distributed-systems, data-redundancy, high-availability

#1927: Workers vs. Servers: The 2026 Compute Showdown

Is the persistent server dead? We compare Cloudflare Workers, GitHub Actions, and VPS options for modern app architecture.

edge-computing, serverless-gpu, latency

#1909: The Unbakeable Cake: AI's Copyright Problem

Why can't we just delete stolen data from AI models? It's not a database—it's a baked cake.

ai-ethics, privacy, generative-ai

#1907: Why We Still Fine-Tune in 2026

Despite million-token context windows, fine-tuning remains essential. Here’s why behavior, not just facts, matters.

fine-tuning, ai-agents, rag

#1894: Engineering Serendipity: Tuning AI for Better Brainstorming

Stop asking chatbots for generic ideas. Learn how to configure AI as a structured, critical partner for business innovation and career pivots.

ai-agents, prompt-engineering, ai-reasoning

#1882: The $8B Human Cost of AI Data

AI isn't free—it costs billions for humans to label data. See why annotation is the real engine behind models like Gemini.

ai-training, data-integrity, supply-chain

#1839: AI's Data Kitchen: From Hoovering to Fine-Tuning

We go behind the curtain of the AI data pipeline, revealing the messy, multi-billion-dollar war over data curation.

large-language-models, fine-tuning, data-integrity

#1828: Mastering 2M Token Context in Agentic Pipelines

A massive context window sounds like a dream, but it can quickly become a nightmare for complex AI workflows.

context-window, ai-agents, prompt-engineering

#1824: Why Governments Are Building Bunkers for AI

Public clouds can’t handle the security or scale of classified AI. Governments are retreating to fortified bunkers.

national-security, cybersecurity, data-security

#1822: Quantum in the Cloud: Hype vs. Hardware

Is QCaaS a billion-dollar breakthrough or an expensive science experiment? We explore the gap between hype and hardware.

cloud-computing, high-performance-computing, hardware-reliability

#1811: Stop Hardcoding User Names in AI Prompts

Three methods for storing user identity in AI agents—and why the "Fat System Prompt" breaks production apps.

ai-agents, context-window, latency

#1810: Why Your TTS Sounds Great in English, Terrible Everywhere Else

English AI voices are polished, but global languages hit a wall. Here's why text-to-speech breaks down for Hebrew, Hindi, and beyond.

text-to-speech, linguistics, data-integrity

#1777: Claude Called My Prompt "Rambling" and I'm Not Okay

When an AI coding tool critiques your prompt's literary quality, it raises a massive technical question about engineered personality.

prompt-engineering, ai-agents, ai-ethics

#1762: Testing AI Truthfulness: Beyond Vibes

Stop trusting confident AI. We explore the formal science of testing LLMs for hallucinations and knowledge cutoffs.

ai-safety, hallucinations, prompt-engineering

#1740: Chatterbox TTS: Open Source vs. ElevenLabs

We dissect Resemble AI's Chatterbox to see how its open-source TTS compares to commercial giants like ElevenLabs.

text-to-speech, open-source, prosody-control

#1736: Why OpenClaw Eats 16 Trillion Tokens

OpenClaw is processing 16.5 trillion tokens daily, dwarfing Wikipedia. Here’s why it’s #1.

ai-agents, tokenization, open-source-ai

#1709: Standard Deviation: The Map Without a Scale

Why the average number alone is misleading—and how standard deviation reveals the true story behind the spread.

missile-defense, logistics, standard-deviation
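The episode's "map without a scale" point fits in a few lines of plain Python: two made-up datasets share the same mean, and only the standard deviation tells them apart.

```python
# Two datasets with identical means but very different spread: the
# average alone is a map without a scale. Population standard deviation
# computed from scratch; the values are invented for illustration.
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def std_dev(xs):
    # Population standard deviation: the square root of the
    # mean squared deviation from the mean.
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

tight = [48, 49, 50, 51, 52]   # clustered around 50
wide  = [10, 30, 50, 70, 90]   # same mean of 50, widely scattered

print(mean(tight), round(std_dev(tight), 2))  # 50.0 1.41
print(mean(wide), round(std_dev(wide), 2))    # 50.0 28.28
```

Both series average exactly 50, yet one varies by about 1.4 and the other by about 28 — the spread, not the average, is what tells you how much any single observation can be trusted.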

#1702: Roleplay Models Aren't Just for NSFW—They're Creative Co-Processors

Forget GPT-4 for scripts—specialized roleplay models like Aion-2.0 are better at character consistency and dialogue.

fine-tuning, generative-ai, ai-agents

#1700: Can LLMs Learn Continuously Without Forgetting?

We explore a new approach: micro-training updates every few days to keep AI knowledge fresh without constant web searches.

rag, fine-tuning, ai-agents