You ever look at a folder on your desktop and realize you are staring at a comforting lie? I was thinking about this while organizing some project files yesterday. We see these little yellow icons, we nest them inside each other, and we imagine this tidy physical hierarchy like a filing cabinet in a dream. But as soon as you step into the world of large-scale cloud infrastructure, that dream evaporates. Today's prompt from Daniel is about how object storage, specifically something like Amazon S3, is fundamentally different from the local filesystems we have been using since the eighties.
It is a perfect time to talk about this because Amazon S3 actually just celebrated its twentieth anniversary on March fourteenth, twenty twenty-six. I am Herman Poppleberry, and I have been digging into the latest state of the union reports from AWS. When S3 launched back in two thousand six, it was basically fifteen racks of hardware. Now, as of this month, they are reporting over five hundred trillion objects across their global infrastructure, handling more than two hundred million requests every single second.
Five hundred trillion. That is a number so large it stops being a statistic and starts being a geological feature of the internet. But to Daniel's point, when a developer interacts with those five hundred trillion objects, they are not using the same logic they use to save a Word document to their hard drive. You mentioned the anniversary, but there was a massive technical shift this month too, right? Something about the naming war finally ending?
You are thinking of the Account Regional Namespaces update that dropped on March twelfth. For twenty years, if you wanted to name a bucket "my-data," and someone in Dublin or Tokyo had already taken that name, you were out of luck. Bucket names had to be globally unique across every single AWS customer. It was a massive pain point for automation and security. But as of two weeks ago, AWS finally implemented a system where bucket names only need to be unique within your specific account and region. It sounds like a small quality-of-life fix, but architecturally, it represents a huge shift in how they handle the global index.
It only took them two decades to let me name my own buckets whatever I want. Progress moves at a sloth's pace sometimes. But let's get into the "why" here. If I am on my laptop, I am using NTFS or APFS or maybe ext4 if I am feeling spicy. Those all follow the classic random-access model of file access, and APFS and ext4 are properly POSIX-compliant, which we should probably define because it is the root of the friction people feel when they move to the cloud.
POSIX, spelled P-O-S-I-X and usually pronounced PAH-zicks, is essentially a set of standards that define how an operating system should behave. For a filesystem, being PAH-zicks compliant means you can do very specific things. You can open a file, you can seek to the middle of it, you can change a single byte, and you can save it without touching the rest of the file. It is designed for low-latency, random-access operations. Your local NVMe drive might have a latency of maybe ten to one hundred microseconds. It is incredibly fast, but it is also very fragile and limited to that one physical machine.
Right, so if I have a giant database file on my hard drive and I want to update one user's phone number, the filesystem just goes to that specific spot on the disk, flips the bits, and calls it a day. But if I try to do that with an object in S3, the whole thing falls apart.
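That in-place edit is exactly what the POSIX file API allows. Here is a minimal Python sketch of it; the file name and sizes are made up purely for illustration:

```python
import os
import tempfile

# On a POSIX-style filesystem you can patch a single byte in place:
# seek to an offset and overwrite it without rewriting anything else.
path = os.path.join(tempfile.mkdtemp(), "records.bin")

with open(path, "wb") as f:
    f.write(b"A" * 1024)          # pretend this is a big database file

with open(path, "r+b") as f:      # open for in-place read/write
    f.seek(512)                   # jump to the middle of the file...
    f.write(b"B")                 # ...and flip one byte; the rest is untouched

with open(path, "rb") as f:
    data = f.read()

print(len(data))                  # file is still 1024 bytes long
```

Object storage has no equivalent of that `seek`-then-`write` step; the smallest unit you can replace is the whole object.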
It does because object storage is fundamentally immutable. When you put an object into S3, you are not writing to a sector on a disk in the traditional sense. You are interacting with a RESTful API over HTTP. If you have a five-gigabyte video file stored as an object and you want to change one second of footage in the middle, you cannot just "edit" it. You have to re-upload the entire five-gigabyte object. The system treats every file as a single, discrete unit of data. This is why we call it object storage rather than a file system. You are storing "blobs" of data, and once they are there, they are set in stone until you replace them entirely.
That sounds incredibly inefficient if you are thinking like a desktop user, but I assume the trade-off is what allows for that massive scale you mentioned. If I am a developer and I am used to the hierarchical tree of folders, how does S3 actually organize those five hundred trillion things?
That is the core of it. Local filesystems rely on something called an EYE-node table, or Inode table. Think of an EYE-node as a record that tells the computer where the actual data blocks for a file are located on the physical platter or flash chips. It is a map. As you add more files, that table grows, and searching it becomes a bottleneck. You cannot have a single EYE-node table that spans a hundred thousand servers. Object storage replaces that hierarchy with a flat namespace. There are no folders. There are only Keys.
I love this part because it blows people's minds. When I see a path like "bucket-name slash images slash twenty-twenty-six slash photo dot jpg," my brain sees three layers of folders. But you are saying that is just a string?
It is just a name. The "slash" character is not a directory separator in S3; it is just another character in the Key, which is the unique identifier for that object. When you use a tool that shows you a folder view of S3, it is just doing a string search for all Keys that start with that specific prefix. It is a database query, not a directory traversal. In a local filesystem, if you delete a folder, the OS just updates the EYE-node table for that directory. In S3, if you want to "delete a folder," you actually have to send a delete request for every single individual object that has that prefix in its name.
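You can mimic that flat namespace with a plain dictionary. This toy sketch (all keys invented for illustration) shows that a "folder view" is just a prefix query, and "deleting a folder" means one delete per matching key:

```python
# A toy flat namespace: S3 is conceptually a key -> blob map, no directories.
bucket = {
    "images/2026/photo.jpg": b"...",
    "images/2026/photo2.jpg": b"...",
    "images/readme.txt": b"...",
    "logs/app.log": b"...",
}

def list_prefix(store, prefix):
    """What a 'folder view' really does: a string-prefix query over keys."""
    return sorted(k for k in store if k.startswith(prefix))

def delete_prefix(store, prefix):
    """'Deleting a folder' means issuing a delete for every matching object."""
    doomed = list_prefix(store, prefix)
    for key in doomed:
        del store[key]            # one DELETE request per key in real S3
    return len(doomed)

print(list_prefix(bucket, "images/2026/"))   # the 'folder' contents
print(delete_prefix(bucket, "images/"))      # three objects, deleted one by one
print(sorted(bucket))                        # only the logs key remains
```

The slash never meant anything to the store itself; only the prefix match did.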
This is why cloud migration can be such a headache. People try to "mount" S3 like a network drive, which we talked about way back in episode seven hundred seventy when we looked at Rclone. If your application thinks it is talking to a local drive, it is going to try to do those random-access PAH-zicks operations, and S3 is going to just stare at it blankly.
Rclone tries to do the heavy lifting of pretending S3 is a local drive, but as soon as you try to run an application that expects PAH-zicks behavior—like a database that wants to do random writes—the latency kills you. You are moving from microsecond latency on a local bus to one hundred or two hundred milliseconds of latency over the internet. Plus, the overhead of the HTTP handshake for every single operation. If you try to save a tiny text file, you are doing a full TCP connection, a TLS handshake, and an HTTP PUT request. On a local disk, that is a few CPU cycles. In the cloud, that is a whole journey across the country.
And that brings us to the durability side of things. Local filesystems are prone to bit rot and physical failure. If your drive head crashes, that data is gone. S3 is designed for what they call "eleven nines" of durability. That is ninety-nine point nine followed by eight more nines. To achieve that, when you upload an object, AWS typically replicates it across at least three different physical availability zones, which are essentially separate data center clusters miles apart. If a literal meteor hits one data center, your cat photo is still safe in the other two.
And that replication is part of the reason for the latency. You are waiting for that data to be mirrored across those availability zones before the API returns a "success" message. But I saw that AWS is trying to fix this with the S3 Express One Zone class. That feels like they are admitting that sometimes, people just want the speed of a local drive even in the cloud.
S3 Express One Zone is a fascinating pivot. They launched it to handle the massive IO demands of AI training. In that model, they sacrifice the geographic redundancy—it stays in one availability zone—to get single-digit millisecond latency. It is basically the closest they have come to making object storage behave like a high-performance local block device, but it is still accessed via the same API logic. It is for those "chatty" applications that need to read and write millions of times but do not necessarily need to survive a regional disaster.
Let's talk about the metadata, because that is the other big differentiator Daniel mentioned. On my Mac, if I want to tag a file, I am limited to what the OS allows. Maybe a color label or a few keywords. But with S3, you can attach "Rich Metadata."
This is where object storage starts to look more like a database than a filesystem. In a local system, metadata is usually just the file size, the creation date, and the permissions. In S3, you can attach custom key-value pairs to the object itself. You can tag an object with "Project: Alpha," "Department: Finance," or "Retention-Policy: Seven-Years." Because these tags are part of the object, you can build entire automation pipelines around them.
You can have a policy that says "any object tagged with 'Finance' gets moved to cheaper archive storage after thirty days." You cannot do that easily with a traditional folder structure without writing complex scripts that crawl the entire directory tree. In S3, the storage itself is "intelligent" in a way. It knows what it is holding. This decoupling of the data from its location is what makes the cloud so powerful. You do not care which server the file is on; you just care about its Key and its tags.
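A tag-driven policy like the one Herman describes looks roughly like this as a boto3 lifecycle configuration. The bucket name and tag values are hypothetical, and GLACIER is just one of several archive tiers:

```python
# Sketch of a tag-driven lifecycle rule, in the shape that boto3's
# put_bucket_lifecycle_configuration expects. Names here are illustrative.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-finance-after-30-days",
            "Status": "Enabled",
            # The rule matches on the object's tag, not on any folder path.
            "Filter": {"Tag": {"Key": "Department", "Value": "Finance"}},
            # After 30 days, move matching objects to an archive tier.
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }
    ]
}

# Applying it would be a single call (needs credentials, so left commented):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-finance-bucket", LifecycleConfiguration=lifecycle
# )

rule = lifecycle["Rules"][0]
print(rule["Filter"]["Tag"]["Key"], rule["Transitions"][0]["Days"])
```

Once the rule is attached to the bucket, the storage service enforces it on its own; no crawler script ever runs.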
It makes the storage "searchable" in a way that is decoupled from where the file is located. But there is a catch here, and it is something that has become a major controversy this year. We are seeing this "API Tax" debate heating up. Herman, you have been tracking the costs on this, right?
The API Tax is the big story of twenty twenty-six. For years, the complaint about cloud storage was egress fees—the cost of getting your data out. Cloudflare really pushed the market there with their R2 service and the "Zero Egress" model, which eventually forced Amazon and Google to drop most egress fees for customers leaving their platforms back in twenty twenty-four. But now that moving the data is cheaper, the providers are leaning harder into request fees.
Right, because if you are running an AI workload that is doing millions of "LIST" or "PUT" operations a second to fetch vector data, those fractions of a cent per request add up faster than the actual cost of storing the gigabytes. I was reading a report that for some high-frequency workloads, the API request fees are now sixty or seventy percent of the total bill. If you have a million tiny files, you might pay ten dollars to store them but a thousand dollars just to check if they are still there.
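The arithmetic behind that kind of bill split is easy to sanity-check yourself. Every price below is a placeholder assumption, not a real rate-card number; the point is only that request charges scale with object count and access frequency, not with gigabytes:

```python
# Back-of-envelope "API tax" math. All prices are made-up placeholders --
# substitute your provider's actual rate card before drawing conclusions.
objects          = 1_000_000   # one million small files
avg_size_gb      = 0.0001      # ~100 KB each
storage_per_gb   = 0.023       # $/GB-month (assumed)
price_per_1k_req = 0.005       # $/1000 requests (assumed)
checks_per_day   = 1           # one existence check per object per day

storage_cost = objects * avg_size_gb * storage_per_gb
request_cost = objects * checks_per_day * 30 / 1000 * price_per_1k_req

print(f"storage:  ${storage_cost:,.2f}/month")
print(f"requests: ${request_cost:,.2f}/month")
```

Even with these modest assumed rates, the requests dwarf the storage line. That is the whole "API Tax" argument in two variables.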
It has completely changed how we have to architect systems. In a local filesystem, doing a "list files" operation is practically free. In S3, every time you ask the system "tell me what is in this bucket," you are paying for that API call. This is why we are seeing that repatriation trend Daniel referenced. The latest industry reports from early March twenty twenty-six show that forty-five percent of mid-sized enterprises have moved at least one major workload back to on-prem block storage.
Forty-five percent? That is a massive reversal of the "cloud-first" mantra we have heard for the last decade. Is that purely a cost play, or is it about performance predictability?
It is both. When you are on-prem, using a system like Min-EYE-oh, which is an open-source, S3-compatible object store, you get the benefit of the S3 API for your developers, but you are running it on your own NVMe hardware. You get the predictability of knowing exactly what your latency will be, and you are not getting hit with a bill for every "GET" request. Mid-sized companies are realizing that for certain steady-state workloads, the "infinite scale" of the cloud is an expensive insurance policy they do not actually need. If your data grows at a predictable rate, why pay the "API Tax" to a giant provider?
It is funny because it comes back to that "plumbing" we talked about in episode seven hundred twenty-eight. If you understand the underlying file system, whether it is ZFS or a flat object store, you can make better decisions about where that data should actually live. But I want to go deeper on the AI side of this. Daniel mentioned "S3 Vectors" going into General Availability this month. That feels like a direct response to this "API Tax" and performance issue.
S3 Vectors is a huge deal. Traditionally, if you wanted to do a vector similarity search for an AI model, you had to pull your data out of S3 and put it into a specialized vector database like Pinecone or Milvus. That meant paying for egress or at least paying for the compute to move it. With S3 Vectors, you can now perform those similarity searches directly on the objects where they sit. AWS is essentially moving the compute closer to the storage to minimize the data movement.
It is turning S3 from a "cloud hard drive" into a "compute-adjacent platform." It is like the storage itself is becoming a database engine. But does it break the simplicity that made S3 popular in the first place? I mean, the "S" in S3 stands for "Simple." Is it still simple when it is doing vector math and regional naming and single-zone low-latency tiers?
It is definitely getting more complex, but the core "Simple" part is still the API. If you know how to write a "PUT" or a "GET" request, you can use S3. That is the genius of it. They have kept the interface the same while completely swapping out the engine and adding a turbocharger. The real challenge for us as users is unlearning the habits of the local filesystem. We have to stop thinking about "folders" and start thinking about "data management."
I think that is the biggest takeaway for the developers listening. If you treat S3 like a nested folder on your laptop, you are going to have a bad time. You are going to have high latency, high costs, and a lot of frustration. But if you embrace the "Key-Value" nature of it—if you use the metadata to its full potential and architect for immutability—you can build things that are literally impossible on a local machine.
One thing I find wild is the durability math. We say "eleven nines," but if you translate that into real-world terms, it means if you store ten million objects in S3, you can expect to lose, on average, one of them every ten thousand years. Compare that to a local hard drive where the annualized failure rate is often between one and four percent. It is a completely different category of reliability. Your local drive is a temporary scratchpad; S3 is a digital monument.
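That back-of-the-envelope number checks out. Here is the arithmetic, taking "eleven nines" as an annual per-object durability figure, which is a simplifying assumption:

```python
# Checking the "eleven nines" arithmetic: annual durability of
# 99.999999999% means each object has about a 1e-11 chance of loss per year.
durability = 0.99999999999
annual_loss_prob = 1 - durability            # ~1e-11 per object per year

objects = 10_000_000
expected_losses_per_year = objects * annual_loss_prob   # ~1e-4

years_per_lost_object = 1 / expected_losses_per_year
print(round(years_per_lost_object))          # ~10,000 years per single loss
```

Ten million objects times one-in-a-hundred-billion odds works out to one expected loss every ten millennia, exactly the framing AWS uses.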
Unless you accidentally delete the bucket yourself. No amount of "nines" can protect you from a tired developer with admin privileges and a bad script. I have seen more data lost to a "delete-recursive" command than to hardware failure.
Which is why Versioning and Object Lock are so important, but that is a whole other rabbit hole. The point is, the "cloud hard drive" is a metaphor that has outlived its usefulness. We are moving into an era where storage is a distributed database that happens to hold files. The March twenty twenty-six updates prove that AWS is leaning into this. They are making it easier to name things, easier to search things with vectors, and faster to access things with Express One Zone.
So, for the developers listening, what is the move? If you are starting a new project in twenty twenty-six, do you go all-in on the new S3 Vector features, or do you look at that forty-five percent repatriation stat and think about staying closer to the metal?
I think the move is to audit your request patterns. This is the most important takeaway. If your application is "chatty"—meaning it makes thousands of small requests—you either need to move to S3 Express One Zone or look at an on-prem solution like Min-EYE-oh. If you are building for "write-once, read-many" scale, then the traditional S3 model is still the gold standard. And for heaven's sake, start using the custom metadata. Stop trying to bake all your information into the filename string. Use the tags. That is what they are there for.
It is about using the right tool for the job. A sloth does not try to run a marathon, and you should not try to run a high-frequency trading database on standard S3. I think we have covered the architectural shift Daniel was looking for. The "folder" is a lie, the "slash" is just a character, and the "API Tax" is the new reality. It is a world of flat namespaces and RESTful calls now.
It really is. And I am curious to see where we are in another twenty years. Will the concept of a local filesystem even exist, or will our operating systems just be thin clients for a global object store? We are already seeing the lines blur with things like S3 Vectors.
If it means I never have to see a "Disk Full" error message again, I am all for it. But I suspect we will always want that microsecond latency for our local tasks. There is something satisfying about a file that actually lives right where you put it, even if it is just a comforting lie.
That is the donkey in you, Corn. You like the sturdy, physical reality of the local disk. You want to feel the platters spinning.
Guilty as charged. Well, I think that is a wrap on the architecture of the cloud. Thanks as always to our producer Hilbert Flumingtop for keeping the gears turning behind the scenes.
And a big thanks to Modal for providing the GPU credits that power the research and generation pipelines for this show. They make it possible for us to dive this deep every week into the technical weeds.
This has been My Weird Prompts. If you are enjoying these deep dives into the plumbing of the internet, do us a favor and leave a review on your favorite podcast app. It really does help other people find the show and helps us keep the lights on.
You can also find our full archive and all the ways to subscribe at myweirdprompts dot com. We have over fourteen hundred episodes in there now, covering everything from the history of ZFS to the future of quantum networking.
Search for us on Telegram too if you want to get notified the second a new episode drops. We will be back next time with whatever weirdness Daniel or the rest of you send our way.
See you then.
Later.