#data-storage
28 episodes
#2571: How S3 Billing Actually Works (And Why R2 Is Different)
Storage is the decoy cost. The real surprises come from request charges, egress fees, and early deletion penalties.
#2475: Docker Volumes: Why They Can't Move and What To Do
Docker made apps portable but left your data stuck. Here's how to actually move volumes between hosts.
#2465: JSON-L vs Parquet: When Each Format Wins
How far can JSON-L scale before it breaks? And why does Parquet dominate for millions of rows?
#2438: How Object Storage Actually Works Under the Hood
Blobs, flat namespaces, and why those "folders" in cloud storage are complete illusions.
#2368: How Recommendation Engines Really Work
Unpacking the multi-stage AI pipeline behind Netflix, Spotify, and Amazon’s "you might also like" suggestions—from candidate generation to real-tim...
#2271: Vector Search in a Single File
What if you could do vector search with just SQLite? We explore sqlite-vec, the extension that adds embeddings to the world's simplest database, an...
#2064: Why GPT-5 Is Stuck: The Data Wall Explained
The "bigger is better" era of AI is over. Here's why the industry hit a data wall and shifted to a new scaling law.
#2011: Saving AI Knowledge Beyond the Chat Window
We're brilliant at prompting AI, but terrible at saving the answers. Here's why that "digital masterpiece on a chalkboard" vanishes.
#2010: Building Better AI Memory Systems
We obsess over AI inputs but treat outputs like Snapchat messages. Here's why that's a massive blind spot.
#1989: Your Cloud Photos Vanish If You Miss a $5 Bill
Is your data safe in the cloud, or is it one missed payment away from oblivion?
#1988: Will Glass Storage Save Us From the Data Deluge?
Quartz glass promises 10,000-year data storage, but can it scale fast enough to keep up with 180 zettabytes of data?
#1983: Why Your Digital Photos Are Slowly Disappearing
Physical paper from the 1700s is more durable than a Word doc from 1994. Here's why digital data is fragile and how archivists fight bit rot.
#1920: InfluxDB vs. Postgres: The Time-Series Showdown
We compare specialized time-series databases like InfluxDB against traditional SQL options like Postgres with the TimescaleDB extension.
#1910: Our Podcast Is Now a Permanent Research Artifact
Why we're uploading every episode to CERN's Zenodo archive, giving our AI experiments a permanent DOI and a life beyond streaming platforms.
#1797: Why the Cloud Runs on Cassette Tapes
The cloud isn't just hard drives. It also relies on millions of tape cartridges in robotic libraries, holding petabytes of data for Google and NASA.
#1776: The 80,000-Mile Backup Anxiety
Is your backup strategy a responsible habit or a full-blown compulsion? We explore the thin line between data safety and digital hoarding.
#1475: Why Your Cloud Folders Are a Lie: The S3 Revolution
Folders are a lie in the cloud. Explore why Amazon S3 uses flat namespaces and "keys" instead of traditional file hierarchies.
#1233: Why "Just Use Postgres" Isn't Always Enough
Can one database do it all? Explore why hardware constraints and data geometry keep specialized databases like Snowflake and ClickHouse alive.
#1211: Escaping JOIN Hell: The SQL Developer’s Guide to Neo4j
Stop struggling with 15-deep JOINs. Learn how Neo4j turns relationships into first-class citizens for faster, more intuitive data modeling.
#1124: The Database Explosion: Why One Size No Longer Fits All
From vector stores to edge computing, discover why the world now has over 1,000 databases and why Postgres isn't always the answer.
#1044: Ezra the Scribe: Architect of a Portable Identity
Discover how Ezra the Scribe transformed a nation’s identity from a physical temple to a portable text, shaping the modern world.
#742: The Dark Archive: Saving Extremism for History
When mainstream sites delete toxic content, how do researchers save it? Explore the "memory hole" of digital hate speech and dark archives.
#714: The Billion-Year Backup: Escaping the Digital Dark Age
Will our digital legacy survive for billions of years? Explore the tech fighting the "Digital Dark Age," from lunar libraries to quartz glass.
#620: ZFS Decoded: Recovering Data After Hardware Failure
Your motherboard fried, but is your data safe? Discover the secrets of ZFS portability, forced imports, and professional recovery workflows.
#591: A Petabyte in Your Pocket? The Future of Micro SD Storage
From floppy disks to 4TB cards, how much data can we squeeze onto a fingernail before physics pushes back? Explore the future of storage density.
#589: Beyond Git: Taming the Chaos of AI and Large Media Assets
When AI agents and 4K video crash your repo, it’s time for better tools. Explore why Git fails and how Perforce and DVC save the day.
#564: Beyond the Factory Reset: How to Truly Erase Your Data
Think a factory reset protects your old data? Herman and Corn reveal why your digital "ghosts" might still be lurking on your old devices.
#409: RAID Demystified: Speed, Safety, and Data Survival
Learn the math behind RAID levels, the risks of drive rebuilds, and why ZFS is the modern gold standard for data integrity.