AI Safety & Ethics

Guardrails & Alignment

Safety measures, content filtering, red-teaming

12 episodes

#2250: Where AI Safety Researchers Actually Work

Vendor labs, independent research orgs, government agencies—the AI safety field is messier and more diverse than most people realize. A map of where...

ai-safety · ai-alignment · anthropic

#2246: Constitutional AI: Anthropic's Theory of Safe Scaling

How Anthropic's Constitutional AI replaces human raters with AI self-critique guided by explicit principles—and what it assumes about the future of...

anthropic · ai-safety · ai-alignment

#2190: Simulating Extreme Decisions With LLMs

LLMs fail at the exact problem wargaming was built to solve—simulating irrational, extreme decision-makers. A new study reveals why.

large-language-models · ai-safety · hallucinations

#2186: The AI Persona Fidelity Challenge

Advanced LLMs dominate benchmarks but fail at staying in character—especially when asked to play morally complex or antagonistic roles. What does t...

ai-safety · ai-alignment · hallucinations

#2068: Is Safety a Filter or a Feature?

External filters vs. baked-in ethics: the architectural war for LLM safety.

ai-safety · ai-ethics · ai-alignment

#2045: Anonymity Isn't the Problem, The Architecture Is

Why does Reddit amplify toxicity while other anonymous spaces stay healthy? It's not the mask—it's the room's shape.

digital-privacy · social-engineering · human-computer-interaction

#2029: ADHD Brains: Why Willpower Fails & How to Hack It

Stop blaming yourself for half-used planners. Here’s the neurobiology behind ADHD time management.

adhd · neuroscience · executive-function

#2015: AI's Watchdogs: Who's Actually Regulating Tech?

As the EU AI Act takes hold, we spotlight the key think tanks shaping global AI policy, safety, and ethics.

ai-ethics · ai-agents · ai-safety

#2009: The Plumbing of AI Safety: Guardrails, Not Vibes

We dive deep into the specific libraries, proxy layers, and architectural decisions that keep an LLM from emptying a bank account.

ai-safety · latency · open-source-ai

#1996: Why Leaders Broadcast Victory While Citizens Hear Sirens

A gap opens between official statements and reality, as curated videos clash with live data streams.

geopolitical-strategy · narrative-dissonance · public-trust

#1803: Why Hostages Defend Their Captors

A tech exec was brainwashed in 2025. The neurochemistry is the same as Stockholm Syndrome.

neuroscience · psychopharmacology · social-engineering

#1712: Five AIs, One Question: A Tiananmen Square Test

We asked five AI models the same question about Tiananmen Square. Their answers reveal a stark divide between Chinese and Western AI.

ai-ethics · geopolitics · ai-censorship