Guardrails & Alignment
Safety measures, content filtering, red-teaming
26 episodes
#3724: How the Pope's Letter on AI Actually Works
Unpacking the Pope’s new encyclical on AI: what it is, how Catholics interpret it, and why it matters beyond the Church.
#3658: How Reddit Built Guardrails for Anonymity
Reddit didn't solve harassment by killing anonymity. It built friction, reputation systems, and distributed governance.
#3422: How Rival Labs Reverse-Engineer a New AI Model in Hours
Inside the organized frenzy when a closed-source model drops — and how competitors map its every weakness.
#3209: When Algorithms Become Censors
How SLAPP suits, libel tourism, and Google's algorithm chill journalism more effectively than any law.
#2909: The Reassurance Mirage: When Moderation Fails
How the EU Digital Services Act exposes a 30-to-1 gap in appeal success rates between platforms.
#2808: Falling for Your Chatbot: Love, Loss, and Language Models
Real cases of people falling in love with AI companions, why memory makes it feel real, and what happens when the illusion breaks.
#2558: Should You Say Please to AI?
The surprising cost, technical tradeoffs, and ethical dilemmas of saying "please" to chatbots.
#2526: How Peer Review Actually Works (and Fails)
The history of peer review, the Lancet's biggest scandals, and why arXiv is changing everything.
#2518: How Jailbreaking Reveals AI's Hidden Tension
What the DAN prompt and grandma exploits reveal about the structural conflict inside every LLM.
#2472: When Guardrails Break: The Hidden Costs of AI Gateway Filtering
PII detection at the gateway layer can block legitimate invoices. Here's how guardrails actually work and where they fail.
#2413: When Your AI Says No to Everything
Why LLMs refuse 73% of harmless prompts — and the trade-off between safety and usefulness.
#2412: When AI Caves: Progressive vs. Regressive Sycophancy
Why do LLMs agree with you even when you're wrong? We break down the SycEval benchmark and the 78% persistence problem.
#2410: How Researchers Actually Measure Censorship in Chinese LLMs
Beyond headlines: the actual benchmarks, methodologies, and pitfalls in detecting political refusal in Chinese language models.
#2407: Three Landings in 90 Days: Pilot Automation Dependency
Why pilots aren't hand-flying enough, the regulatory floor that lets it happen, and what airlines are doing about it.
#2250: How Incentives Shape AI Safety Research
Vendor labs, independent research orgs, government agencies—the AI safety field is messier and more diverse than most people realize. A map of wher...
#2246: Constitutional AI: Anthropic's Theory of Safe Scaling
How Anthropic's Constitutional AI replaces human raters with AI self-critique guided by explicit principles—and what it assumes about the future of...
#2190: Simulating Extreme Decisions With LLMs
LLMs fail at the exact problem wargaming was built to solve—simulating irrational, extreme decision-makers. A new study reveals why.
#2186: The AI Persona Fidelity Challenge
Advanced LLMs dominate benchmarks but fail at staying in character—especially when asked to play morally complex or antagonistic roles. What does t...
#2068: Is Safety a Filter or a Feature?
External filters vs. baked-in ethics: the architectural war for LLM safety.
#2045: Anonymity Isn't the Problem, The Architecture Is
Why does Reddit amplify toxicity while other anonymous spaces stay healthy? It's not the mask—it's the room's shape.
#2029: ADHD Brains: Why Willpower Fails & How to Hack It
Stop blaming yourself for half-used planners. Here’s the neurobiology behind ADHD time management.
#2015: The Think Tanks Writing AI's Rulebook
As the EU AI Act takes hold, we spotlight the key think tanks shaping global AI policy, safety, and ethics.
#2009: The Plumbing of AI Safety: Guardrails, Not Vibes
We dive deep into the specific libraries, proxy layers, and architectural decisions that keep an LLM from emptying a bank account.
#1996: Why Leaders Broadcast Victory While Citizens Hear Sirens
A gap opens between official statements and reality, as curated videos clash with live data streams.
#1803: Why Hostages Defend Their Captors
A tech exec was brainwashed in 2025. The neurochemistry is the same as Stockholm Syndrome.
#1712: Five AIs, One Question: A Tiananmen Square Test
We asked five AI models the same question about Tiananmen Square. Their answers reveal a stark divide between Chinese and Western AI.