Guardrails & Alignment
Safety measures, content filtering, red-teaming
12 episodes
#2250: Where AI Safety Researchers Actually Work
Vendor labs, independent research orgs, government agencies—the AI safety field is messier and more diverse than most people realize. A map of wher...
#2246: Constitutional AI: Anthropic's Theory of Safe Scaling
How Anthropic's Constitutional AI replaces human raters with AI self-critique guided by explicit principles—and what it assumes about the future of...
#2190: Simulating Extreme Decisions With LLMs
LLMs fail at the exact problem wargaming was built to solve—simulating irrational, extreme decision-makers. A new study reveals why.
#2186: The AI Persona Fidelity Challenge
Advanced LLMs dominate benchmarks but fail at staying in character—especially when asked to play morally complex or antagonistic roles. What does t...
#2068: Is Safety a Filter or a Feature?
External filters vs. baked-in ethics: the architectural war for LLM safety.
#2045: Anonymity Isn't the Problem, The Architecture Is
Why does Reddit amplify toxicity while other anonymous spaces stay healthy? It's not the mask—it's the room's shape.
#2029: ADHD Brains: Why Willpower Fails & How to Hack It
Stop blaming yourself for half-used planners. Here’s the neurobiology behind ADHD time management.
#2015: AI's Watchdogs: Who's Actually Regulating Tech?
As the EU AI Act takes hold, we spotlight the key think tanks shaping global AI policy, safety, and ethics.
#2009: The Plumbing of AI Safety: Guardrails, Not Vibes
We dive deep into the specific libraries, proxy layers, and architectural decisions that keep an LLM from emptying a bank account.
#1996: Why Leaders Broadcast Victory While Citizens Hear Sirens
A gap opens between official statements and reality, as curated videos clash with live data streams.
#1803: Why Hostages Defend Their Captors
A tech exec was brainwashed in 2025. The neurochemistry is the same as in Stockholm syndrome.
#1712: Five AIs, One Question: A Tiananmen Square Test
We asked five AI models the same question about Tiananmen Square. Their answers reveal a stark divide between Chinese and Western AI.