AI Safety & Ethics
Security, alignment, and responsible AI
64 episodes · Page 2 of 3
#2482: When AI Chatbots Leak Your PDFs via Public S3 Buckets
A user uploaded a sensitive PDF to an AI chatbot. The chatbot stored it in a public S3 bucket with zero authentication.
#2472: When Guardrails Break: The Hidden Costs of AI Gateway Filtering
PII detection at the gateway layer can block legitimate invoices. Here's how guardrails actually work and where they fail.
#2430: Where Men's Advocacy Crosses Into Misogyny
How to acknowledge real male grievances without falling into the manosphere's woman-hating fringe.
#2424: What Feminists Actually Mean by "The Patriarchy"
Unpacking the structural concept, the popular shorthand, and where the line gets blurry between critiquing systems and demonizing individuals.
#2413: When Your AI Says No to Everything
Why LLMs refuse 73% of harmless prompts — and the trade-off between safety and usefulness.
#2412: When AI Caves: Progressive vs. Regressive Sycophancy
Why do LLMs agree with you even when you're wrong? We break down the SycEval benchmark and the 78% persistence problem.
#2411: Are Political Bias Benchmarks Actually Measuring Anything?
Why the Political Compass Test fails, and what researchers are building instead to actually measure model bias.
#2410: How Researchers Actually Measure Censorship in Chinese LLMs
Beyond headlines: the actual benchmarks, methodologies, and pitfalls in detecting political refusal in Chinese language models.
#2409: When AI Cheats on Cultural Knowledge
Five benchmarks that reveal how AI systems fail at cultural knowledge — and what their methodologies tell us.
#2407: Three Landings in 90 Days: Pilot Automation Dependency
Why pilots aren't hand-flying enough, the regulatory floor that lets it happen, and what airlines are doing about it.
#2383: The Blame Gap: Public Anger vs. Breach Reality
How much blame do companies deserve for data breaches? The answer isn't as simple as you think.
#2372: Choosing the Right Sandbox for Your Threat Model
Explore the tools and methods for creating secure, isolated environments to test malware, browse privately, and protect sensitive systems.
#2250: How Incentives Shape AI Safety Research
Vendor labs, independent research orgs, government agencies—the AI safety field is messier and more diverse than most people realize. A map of wher...
#2246: Constitutional AI: Anthropic's Theory of Safe Scaling
How Anthropic's Constitutional AI replaces human raters with AI self-critique guided by explicit principles—and what it assumes about the future of...
#2190: Simulating Extreme Decisions With LLMs
LLMs fail at the exact problem wargaming was built to solve—simulating irrational, extreme decision-makers. A new study reveals why.
#2186: The AI Persona Fidelity Challenge
Advanced LLMs dominate benchmarks but fail at staying in character—especially when asked to play morally complex or antagonistic roles. What does t...
#2180: The Sandboxing Tradeoff in Agent Design
AI agents need broad permissions to be useful—but every permission expands the attack surface. We map the real threat landscape and the isolation t...
#2134: The Fog-of-War Problem in AI Wargaming
Why shared AI brains make secret-keeping a nightmare, and the four architectural patterns researchers use to fix it.
#2102: Why Don't You Notice AI Security Delays?
Multi-layer security checks add latency, but modern CLIs hide it under 100ms using parallelization and speculation.
#2068: Is Safety a Filter or a Feature?
External filters vs. baked-in ethics: the architectural war for LLM safety.
#2045: Anonymity Isn't the Problem, The Architecture Is
Why does Reddit amplify toxicity while other anonymous spaces stay healthy? It's not the mask—it's the room's shape.
#2029: ADHD Brains: Why Willpower Fails & How to Hack It
Stop blaming yourself for half-used planners. Here's the neurobiology behind ADHD time management.
#2015: The Think Tanks Writing AI's Rulebook
As the EU AI Act takes hold, we spotlight the key think tanks shaping global AI policy, safety, and ethics.
#2009: The Plumbing of AI Safety: Guardrails, Not Vibes
We dive deep into the specific libraries, proxy layers, and architectural decisions that keep an LLM from emptying a bank account.