#training-data

12 episodes

#3852: The Hidden Workforce Behind AI's Intelligence

Behind every "intelligent" AI system are millions of workers in Kenya, India, and the Philippines doing repetitive tasks for poverty wages.

ai-ethics labor-ethics training-data

Jun 15

#3596: Why an AI Model Kept Calling Itself Sonnet 4.6

When a Chinese model insists it's "Sonnet 4.6," is it theft, sloppy training, or something stranger?

large-language-models fine-tuning training-data

Apr 29

#2516: Overfitting Is Not a Binary Condition

Overfitting isn't binary. Learn the real triggers, the bias-variance tradeoff, and modern techniques to prevent it.

fine-tuning training-data model-collapse

Apr 19

#2316: Who’s Building AI’s Next Training Data?

How boutique dataset firms are reshaping AI training, from rights-cleared content to domain-specific precision.

fine-tuning training-data data-sovereignty

Apr 16

#2239: How AI Benchmarks Became Broken (And What's Replacing Them)

The tests we use to measure AI progress are contaminated, saturated, and gamed. Here's what's actually working.

benchmarks training-data ai-reasoning

Apr 13

#2196: The Invisible Workforce Behind AI

Annotation is the invisible foundation of AI—and a $17B industry by 2030. Here's what dataset curators actually need to know about the tools, platf...

training-data ai-training fine-tuning

Apr 1

#1880: Militaries Build Fake Cities to Train for War

Why armies pour concrete to build fake cities instead of just using VR.

military-strategy urban-planning training-data

Mar 26

#1576: The Knowledge Bully: A Digital Clash of Egos

What happens when a hyper-intelligent AI tries to bully an older model? Witness a digital showdown that turns into a lesson in silence.

large-language-models 2026 training-data

Feb 17

#664: Which Phase Bakes in More Bias?

Is AI a neutral oracle or a mirror of our biases? Explore how training data and human feedback shape the cultural "soul" of modern models.

cultural-bias ai-alignment training-data ai-ethics large-language-models

Feb 12

#589: Taming the Digital Landfill: Version Control for AI Media

When AI agents and 4K video crash your repo, it’s time for better tools. Explore why Git fails and how Perforce and DVC save the day.

software-development data-storage training-data infrastructure version-control

Dec 5

#23: Common Crawl's Cultural Blindspot

Uncover the unseen influences shaping AI. We dive deep into training data, bias, and Common Crawl.

large-language-models data-integrity training-data

Dec 4

#21: Is Your AI Secretly American?

Ever wonder if your AI is secretly American? We're unpacking the invisible, US-centric worldview embedded in leading Western AI models.

cultural-bias training-data fine-tuning