#training-data
12 episodes
#3852: The Hidden Workforce Behind AI's Intelligence
Behind every "intelligent" AI system are millions of workers in Kenya, India, and the Philippines doing repetitive tasks for poverty wages.
#3596: Why an AI Model Kept Calling Itself Sonnet 4.6
When a Chinese model insists it's "Sonnet 4.6," is it theft, sloppy training, or something stranger?
#2516: Overfitting Is Not a Binary Condition
Overfitting isn't binary. Learn the real triggers, the bias-variance tradeoff, and modern techniques to prevent it.
#2316: Who’s Building AI’s Next Training Data?
How boutique dataset firms are reshaping AI training, from rights-cleared content to domain-specific precision.
#2239: How AI Benchmarks Became Broken (And What's Replacing Them)
The tests we use to measure AI progress are contaminated, saturated, and gamed. Here's what's actually working.
#2196: The Invisible Workforce Behind AI
Annotation is the invisible foundation of AI and, by 2030, a $17B industry. Here's what dataset curators actually need to know about the tools, platf...
#1880: Militaries Build Fake Cities to Train for War
Why armies pour concrete to build fake cities instead of just using VR.
#1576: The Knowledge Bully: A Digital Clash of Egos
What happens when a hyper-intelligent AI tries to bully an older model? Witness a digital showdown that turns into a lesson in silence.
#664: Which Phase Bakes in More Bias?
Is AI a neutral oracle or a mirror of our biases? Explore how training data and human feedback shape the cultural "soul" of modern models.
#589: Taming the Digital Landfill: Version Control for AI Media
When AI agents and 4K video crash your repo, it’s time for better tools. Explore why Git fails and how Perforce and DVC save the day.
#23: Common Crawl's Cultural Blindspot
Uncover the unseen influences shaping AI. We dive deep into training data, bias, and Common Crawl.
#21: Is Your AI Secretly American?
Ever wonder if your AI is secretly American? We're unpacking the invisible, US-centric worldview embedded in leading Western AI models.