Daniel sent us this one, and it's a genuinely messy technical problem that I don't think enough people are talking about. He's been building MCP toolkits and keeps hitting the same wall: how do you handle file paths when you're aggregating MCP servers across local and remote contexts? You've got local MCPs that need direct filesystem access, like something interfacing with Photoshop on your machine. Then you've got remote MCPs where the server lives on someone else's infrastructure and you need to upload files to it. When you centralize everything behind a gateway so your toolkit works whether you're on desktop, workstation, or Android, the assumption that a file is local to the server completely breaks down. He's tried base64 encoding and found it unreliable; his current workaround is running a MinIO S3 bucket on the MCP server and uploading through that. He wants to know what other workarounds exist and why this hasn't been discussed more.
Oh, this is the exact problem that's been quietly driving MCP builders up the wall for months. And by the way, quick note — today's episode is being written by DeepSeek V four Pro. Hello to our silicon friend.
Now back to files being a nightmare.
Here's what makes this so much worse than people realize. The Model Context Protocol has no standard file input mechanism. As of today, if an MCP server needs a file from the user, it has to resort to what the working group charter literally calls prose instructions — basically begging the user in a text prompt to paste a base64 string or provide a local path. And that produces what the charter describes as inconsistent UX that pushes encoding details onto end users. That's a direct quote from the MCP File Uploads Working Group charter, published April twenty-third of this year.
So the official protocol's answer right now is essentially a sticky note saying please attach file here.
That's uncomfortably close to accurate. And Daniel's base64 experience is exactly what the GitHub issues are filled with. Base64 encoding expands data by roughly thirty-three percent, which is bad enough on its own. But the streamable HTTP transport in the Python SDK enforces a hard-coded four megabyte maximum message size. So you try to send a base64-encoded image, you hit that four meg cap, and you get a four-thirteen Request Entity Too Large error. There's a GitHub issue — number ten twelve on the Python SDK — specifically asking to make that maximum message size configurable, and as of now it's still hard-coded.
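For anyone who wants to see the math, here's a quick back-of-the-envelope check in Python. The four megabyte cap is the hard-coded SDK limit we just mentioned, and note that this ignores the JSON-RPC envelope around the message, so the real ceiling is slightly lower.

```python
# Hard-coded max message size in the MCP Python SDK's streamable
# HTTP transport (the limit that issue 1012 asks to make configurable).
MAX_MESSAGE_BYTES = 4 * 1024 * 1024

def fits_in_one_message(raw_size: int) -> bool:
    """Base64 turns every 3 raw bytes into 4 ASCII chars: ~33% growth."""
    encoded_size = 4 * ((raw_size + 2) // 3)
    return encoded_size < MAX_MESSAGE_BYTES

print(fits_in_one_message(2 * 1024 * 1024))  # True: ~2.7 MB encoded
print(fits_in_one_message(3 * 1024 * 1024))  # False: exactly 4 MB encoded
```

A three megabyte image encodes to exactly four megabytes, which is why people hit that four-thirteen wall with files that feel like they should be fine.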
Base64 is basically a trap. It works for tiny files and then blows up in production exactly when you need it not to.
It gets worse. There's a GitHub discussion, number eleven ninety-seven, where a developer reported that after processing just one or two chunks of a base64-encoded file, they were already hitting token limits — the base64 representation alone was reaching something like one point one two million tokens. That's your entire context window gone before you've even done anything useful with the file.
Which means the protocol's implicit answer to file handling is actively hostile to the protocol's own token economics.
And this is why the working group got chartered in the first place. It's led by Den Delimarsky from Anthropic and Nick Cooper from OpenAI, with Olivier Chafik also from Anthropic as a member. Their mission is to define how MCP tools declare file inputs so that hosts can present native file pickers and pass user-selected file content to servers. That's the dream — you click a file picker, the protocol handles the rest.
There is a formal effort. But what's the actual proposal on the table right now?
It's called SEP twenty-three fifty-six, authored by Olivier Chafik and sponsored by Den Delimarsky. It proposes a new JSON Schema extension keyword called mcpFile that servers use to mark which arguments expect file input. Files would be transmitted as RFC twenty-three ninety-seven data URIs — so data colon mediatype semicolon name equals filename semicolon base64 comma data. And here's where it gets interesting for Daniel's problem specifically. The SEP also permits file colon slash slash and https colon slash slash URIs as alternatives for larger files.
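For listeners following along in the show notes, a tool schema under the proposal might look roughly like this. Hedged heavily: the mcpFile keyword and the URI formats come from the SEP as just described, but the exact schema shape here is our sketch, not the final spec.

```python
# Sketch of a tool input schema under SEP 2356. The "mcpFile"
# keyword is from the proposal; the exact value shape is our guess.
tool_input_schema = {
    "type": "object",
    "properties": {
        "image": {
            "type": "string",
            "mcpFile": True,  # tells the host: show a native file picker
        },
    },
    "required": ["image"],
}

# The selected file would arrive as an RFC 2397 data URI:
#   data:image/png;name=photo.png;base64,iVBORw0KGgo...
# with file:// and https:// URIs permitted for larger files.
```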
They're acknowledging that data URIs won't work for big files even within the proposal itself.
They explicitly say that URL-mode elicitation covers files too large to embed. But they're punting the large-file mechanism to some future spec. So even when SEP twenty-three fifty-six lands, which is still in draft with TypeScript and Python SDK reference implementations in progress, it doesn't fully solve the problem Daniel is describing. It gives you a clean declarative way to say this tool needs a file, but for large files, you're still going to need something like what he built.
Let's talk about what he built then, because I think his solution is more interesting than he's giving himself credit for. He's running MinIO, which is an S3-compatible object store, on the MCP server itself. Files get uploaded there, and then the MCP server can access them as local objects. It's a side channel.
And he isn't alone in this. There are existing MCP servers that do exactly this pattern. There's one called txn2 slash mcp-s3 that provides tools like s3 underscore put underscore object specifically for AI assistants to upload to S3-compatible storage. And there's essov3 dash minio, which offers a self-hosted MinIO MCP server with fifteen different S3 operation tools. So the pattern exists, it's just not standardized.
Here's what I think Daniel is actually getting at, and it's the part nobody's talking about. The MinIO approach works, but it means every MCP server that needs file access has to ship with its own object storage infrastructure. That's not a protocol solution. That's every builder independently reinventing the same wheel with different bolt patterns.
It creates a fragmentation problem. If I'm building a client that aggregates MCP servers, and five different servers each expect files to arrive through five different side channels, how do I present a unified interface to the user? The user doesn't care that server A uses MinIO and server B uses presigned URLs and server C still wants base64. They just want to upload an image and have it work.
This is the aggregation paradox at the heart of Daniel's question. The whole point of centralizing MCPs behind a gateway is to decouple tools from any specific client machine. But local MCPs inherently require local access, and remote MCPs need files uploaded to them. The gateway sits in the middle with no standard bridge.
There was a really good ecosystem survey published earlier this month — April fifth, on a site called Hey It Works — that looked at seventeen different MCP aggregation and gateway tools. MetaMCP, IBM ContextForge, all the major ones. And their conclusion was blunt. No tool fully satisfies all target requirements for centralized MCP access. And here's the part that directly validates Daniel's experience. The survey author found that in practice, deployments split into two MetaMCP instances. One on localhost for servers needing local file access, and one on a VM for everything else. And they described this as not by design choice but by necessity — some MCP servers don't have a clean way to pass in binary files or local resources over the network.
The current state of the art is you run two gateways and pretend it's one. That's not a solution, that's a workaround with a load balancer.
It completely breaks the Android use case Daniel mentioned. If you need a localhost instance for local file access, what does localhost even mean on Android? Mobile clients don't have code execution sandboxes, they don't have direct filesystem access in the same way desktop does, and they certainly can't run arbitrary curl commands to hit presigned URLs.
Let's dig into the presigned URL pattern, because that's the other major workaround that's emerged, and arguably the leading production approach.
Yeah, FutureSearch documented this back in February. The pattern works like this. The MCP server generates an HMAC-signed upload URL with a five-minute time to live. The LLM, running inside a code execution sandbox, uses curl to upload the file directly to that URL. The server then returns a lightweight artifact ID — something like thirty-six characters — that fits easily in the context window. So you never actually put the file contents into the MCP protocol at all. The protocol just shuttles URLs and IDs around.
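For the show notes, here's a minimal sketch of how a server might implement that pattern with boto3. The bucket and function names are ours, but the signed URL and five-minute expiry are the mechanics just described.

```python
import uuid
import boto3

s3 = boto3.client("s3")  # AWS S3 here; MinIO works too via endpoint_url

def create_upload_slot(bucket: str = "mcp-uploads") -> dict:
    """Hand back a short-lived signed URL plus a tiny artifact ID.

    Only the URL and the 36-character ID ever enter the MCP
    protocol; the file bytes travel out of band.
    """
    artifact_id = str(uuid.uuid4())  # 36 characters, as described above
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": artifact_id},
        ExpiresIn=300,  # five-minute time to live
    )
    return {"artifact_id": artifact_id, "upload_url": upload_url}

# The sandboxed model then runs roughly:
#   curl -X PUT --upload-file report.pdf "<upload_url>"
```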
That's elegant. The file travels through a completely separate channel, and MCP only ever handles references.
It is elegant, but it has a sharp edge that makes it impractical for a lot of users. The code execution sandbox — the thing that runs the curl command — blocks outbound network requests by default. So for this to work, the user has to manually whitelist the MCP server's domain in their Claude settings under Additional Allowed Domains.
Which is fine for developers. It is absolutely not fine for anyone else.
You're asking a non-technical user to open a settings panel, find a network configuration option, and type in a domain name correctly. And if they get it wrong, the file upload silently fails in a way that's really hard to debug. This is a significant UX barrier, and it also means the pattern only works in clients that have a code execution sandbox — Claude dot ai, Claude Desktop. It won't work on mobile, it won't work on thin clients, and it won't work in any MCP client that doesn't implement a sandbox.
We've got three approaches on the table, none of which are complete. Base64 encoding, which breaks at four megabytes and burns tokens. The presigned URL pattern, which requires sandbox whitelisting and doesn't work on mobile. And the MinIO slash S3 bucket approach, which works but requires every server to carry its own object storage.
None of them address the cross-platform path problem, which is a related but separate headache. MCP filesystem servers currently use platform detection — Node dot js os dot platform and path dot resolve — with secure path restrictions that are different on every operating system. On Linux it's restricted to slash home slash, on Windows it's C colon backslash Users backslash. Android support isn't addressed anywhere in the available sources.
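In Python terms, that per-platform gating logic looks roughly like this. To be clear, this is our analogue of what those filesystem servers do, not their actual code, and the macOS root is our assumption.

```python
import platform
from pathlib import Path

# Allowed roots per platform, mirroring the restrictions described above.
# Android has no entry because no available source addresses it.
ALLOWED_ROOTS = {
    "Linux": Path("/home"),
    "Windows": Path(r"C:\Users"),
    "Darwin": Path("/Users"),  # assumed macOS analogue
}

def is_path_allowed(requested: str) -> bool:
    root = ALLOWED_ROOTS.get(platform.system())
    if root is None:
        return False  # unknown platform: deny by default
    resolved = Path(requested).resolve()  # normalizes ../ traversal
    return resolved.is_relative_to(root)
```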
Daniel's concern about accessing the same toolkit from Android isn't just unsolved — it's not even on the roadmap.
It's a greenfield problem. And I think this gets to the deeper architectural question that the protocol hasn't answered yet. Is MCP fundamentally a local protocol that can reach out to remote services, or is it a remote protocol that can reach into local resources? Because the answer determines whether file handling should be client-to-server or server-to-client.
That's the right framing. If MCP is local-first, then the client owns the filesystem and servers request access. If it's remote-first, then servers own storage and clients upload. Right now it's trying to be both, and the file handling reflects that identity crisis.
The aggregation use case that Daniel is describing — one gateway, accessible from any device — that use case is inherently remote-first. You can't have local Photoshop integration from your Android phone. The gateway has to live somewhere, and that somewhere has to have its own storage that all the remote MCP servers can access.
Which means Daniel's MinIO approach isn't a workaround at all. It's actually the correct architecture for the problem he's trying to solve. He's just ahead of where the protocol is.
I think that's right. If you're building a centralized MCP gateway, you need centralized storage. The gateway becomes the place where files live, and every MCP server that needs file access talks to the gateway's storage, not to the client's local filesystem. The client's only job is to get files into that storage, which is a much simpler problem.
What does the actual upload mechanism look like in that world? Because you still have to get the file from the user's device to the gateway's MinIO bucket.
That's where I think the presigned URL pattern actually shines, but flipped around. Instead of the MCP server generating the URL, the gateway generates it. The user's client — desktop app, web interface, whatever — gets a presigned upload URL, puts the file directly into the gateway's S3 bucket, and then the gateway passes an internal reference to whichever MCP server needs it. The client never talks to the MCP server directly about files at all.
On Android, the client is just an app that can do HTTP uploads, which every app can do. No sandbox, no curl, no whitelisting.
The complexity moves to the gateway, where it belongs. The gateway handles authentication, storage, and routing. MCP servers just declare what files they need, and the gateway provides them from its own storage.
This sounds suspiciously like a product pitch.
It kind of is, but it's also where the ecosystem seems to be heading whether anyone plans it or not. That survey of seventeen aggregation tools found that the ones that worked best were the ones that didn't try to pretend local and remote were the same thing. They accepted the split and built infrastructure to bridge it.
Let's talk about what builders should actually do today, because SEP twenty-three fifty-six is still in draft and the working group just got chartered. If Daniel or someone like him is building an MCP toolkit right now, what's the least-bad option?
For a centralized gateway architecture, I'd say the MinIO approach with a thin upload layer is the way to go, and I'd add a couple of things Daniel might not have tried. Use presigned upload URLs from the gateway to the client so the client uploads directly to MinIO without the gateway proxying the bytes. That keeps latency down and doesn't saturate the gateway's bandwidth. Then use S3 event notifications to trigger processing when a file lands.
The flow is client requests upload URL, gateway generates a presigned PUT URL pointing at MinIO, client uploads directly, MinIO fires an event, MCP server picks up the file.
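Sketched with the MinIO Python client, that flow is surprisingly little code. Endpoint, credentials, and bucket names here are placeholders.

```python
from datetime import timedelta
from minio import Minio

# Gateway's handle to its own MinIO instance (placeholder credentials).
store = Minio("minio.gateway.internal:9000",
              access_key="GATEWAY_KEY", secret_key="GATEWAY_SECRET")

def request_upload(filename: str) -> dict:
    """Steps 1-2: client asks for a slot, gateway returns a presigned
    PUT URL pointing straight at MinIO (no bytes through the gateway)."""
    key = f"incoming/{filename}"
    url = store.presigned_put_object("uploads", key,
                                     expires=timedelta(minutes=5))
    return {"upload_url": url, "ref": f"s3://uploads/{key}"}

# Step 3: the client does a plain HTTP PUT to upload_url.
# Steps 4-5: a MinIO bucket notification (webhook, AMQP, etc.) tells
# the gateway a file landed; the MCP server only ever sees the "ref".
```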
And the agent skill Daniel mentioned — writing the procedure so the LLM knows how to do this — that's actually a critical piece that a lot of implementations miss. You need to document the upload flow in a way the model can follow reliably. That means explicit tool descriptions, clear error messages, and probably a dedicated upload endpoint rather than expecting the model to construct S3 requests from scratch.
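As a concrete illustration, a tool definition written with the model in mind might read like this. The tool name and wording are hypothetical.

```python
# A hypothetical MCP tool whose description walks the model through
# the upload flow explicitly, rather than assuming it knows the drill.
upload_tool = {
    "name": "request_file_upload",
    "description": (
        "Call this FIRST whenever the user wants to provide a file. "
        "Returns an 'upload_url' and a 'ref'. Tell the user to upload "
        "their file to upload_url, then pass 'ref' (never the file "
        "contents) to any tool that takes a file argument. If the "
        "upload fails, report the HTTP status code to the user."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {"filename": {"type": "string"}},
        "required": ["filename"],
    },
}
```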
What about the base64 problem specifically? If someone has to use base64 for small files, are there any mitigations?
If you absolutely have to use base64, chunk it. Break the file into pieces under that four megabyte limit, send each chunk as a separate message, and reassemble on the server side. It's ugly and it adds latency, but it avoids the four-thirteen error. The bigger problem is the token consumption, and there's no real fix for that except keeping files small. Anything over a few hundred kilobytes in base64 is going to eat your context window.
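A minimal sketch of that chunking, keeping each piece comfortably under the cap to leave headroom for the JSON-RPC envelope. The chunk size and reassembly convention are ours.

```python
import base64

# 2 MB of raw bytes becomes ~2.7 MB of base64, leaving headroom
# under the 4 MB message cap for the JSON-RPC envelope.
CHUNK_BYTES = 2 * 1024 * 1024

def iter_chunks(path: str):
    """Yield (index, total, b64_text) tuples for server-side reassembly."""
    with open(path, "rb") as f:
        raw = f.read()
    pieces = [raw[i:i + CHUNK_BYTES] for i in range(0, len(raw), CHUNK_BYTES)]
    for i, piece in enumerate(pieces):
        yield i, len(pieces), base64.b64encode(piece).decode("ascii")

# Server side: store chunks keyed by (upload_id, index), then
# b64decode each piece and concatenate the raw bytes once all
# `total` pieces have arrived.
```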
Chunking solves the transport problem but not the cost problem.
Which is why I think the long-term answer is what SEP twenty-three fifty-six is gesturing toward with those URL-mode alternatives. The protocol needs a first-class concept of file references that aren't file contents. A pointer, not the data itself.
That's fundamentally what Daniel's MinIO approach gives him. The file lives at a known location, and everything else just passes around the location.
Which is how the web solved this problem thirty years ago. You don't embed images in your HTML, you put in an img tag with a src attribute pointing to a URL. MCP is rediscovering the same lesson.
There's something almost comforting about watching a cutting-edge protocol stub its toe on the same rock HTTP tripped over in nineteen ninety-three.
The difference is HTTP had the luxury of evolving slowly. MCP is being asked to solve this while the ecosystem is exploding. The file uploads working group was chartered less than a week ago, and there are already production systems that need this working today.
What's your prediction? Does SEP twenty-three fifty-six actually land and solve this, or are we going to be living with side-channel workarounds for the foreseeable future?
I think SEP twenty-three fifty-six lands and solves the declarative part — the part where a server can say I need a file and the host can present a file picker. That's achievable and the reference implementations are already in progress. But the large-file transport mechanism, the part that actually moves bytes around efficiently across network boundaries, I think that's going to remain a side-channel problem for at least the next year. The working group charter explicitly scopes that out for now.
Which means builders should probably invest in their upload infrastructure and not wait for the protocol to save them.
Build the MinIO bucket. Or better yet, build an abstraction over whatever object storage makes sense for your deployment — S3, MinIO, Cloudflare R2, whatever — and design your MCP tools to accept storage references instead of file contents. When the protocol eventually standardizes file handling, you swap out the upload mechanism and keep everything else.
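One way to shape that abstraction in Python, for the show notes. The interface is ours, not anything from the MCP SDKs.

```python
from abc import ABC, abstractmethod
from datetime import timedelta

class FileStore(ABC):
    """MCP tools accept the references this store hands out, never raw
    bytes; swap the backend without touching the tools."""

    @abstractmethod
    def upload_url(self, key: str) -> str:
        """Presigned URL the client can HTTP PUT to."""

    @abstractmethod
    def fetch(self, ref: str) -> bytes:
        """Resolve a stored reference back to bytes, server-side."""

class MinioStore(FileStore):
    def __init__(self, client, bucket: str = "uploads"):
        self.client, self.bucket = client, bucket  # a minio.Minio instance

    def upload_url(self, key: str) -> str:
        return self.client.presigned_put_object(
            self.bucket, key, expires=timedelta(minutes=5))

    def fetch(self, ref: str) -> bytes:
        resp = self.client.get_object(self.bucket, ref)
        try:
            return resp.read()
        finally:
            resp.close()

# An S3Store or R2Store with the same two methods slots in unchanged;
# the MCP tools only ever see FileStore.
```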
That's good advice. One thing I want to pull on though — you mentioned the sandbox whitelisting problem with the presigned URL pattern. If someone is building for Claude specifically, is that just a permanent friction point?
For now, yes, and I don't see it changing quickly. The sandbox exists for good security reasons — you don't want an LLM making arbitrary outbound network requests without user consent. But the current implementation puts the burden on the user to configure domains, which doesn't scale. What I'd like to see is MCP servers being able to declare required outbound domains in their server manifest, and the host prompting the user for one-time approval during server installation rather than requiring manual settings panel configuration.
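To be crystal clear, nothing like this exists in the spec today. But the manifest field being wished for could be as simple as:

```python
# Purely hypothetical: no such field exists in any MCP manifest today.
# A server could declare its outbound upload targets so the host can
# ask for one-time consent at install time instead of manual settings.
server_manifest = {
    "name": "photoshop-bridge",
    "version": "1.0.0",
    "requiredOutboundDomains": [  # hypothetical field
        "storage.example.com",
    ],
}
```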
That would move it from a configuration problem to a consent problem, which is much more manageable.
Install the server, get a prompt saying this server needs to upload files to storage dot example dot com, allow or deny, done. That's a much better UX than digging through settings panels.
Alright, we've covered the current mess, the proposed fixes, and the practical workarounds. Let's give people something actionable.
And now: Hilbert's daily fun fact.
Sloths can hold their breath underwater for up to forty minutes, which is longer than dolphins can. Dolphins max out at about ten to fifteen minutes.
If you're building MCP toolkits today, here's what we'd actually recommend. First, don't wait for SEP twenty-three fifty-six. It'll help when it lands, but it won't solve the large-file problem. Second, invest in a storage layer — MinIO is a solid choice, and Daniel's approach of running it on the MCP server is perfectly reasonable for small to medium deployments. Third, separate your file transport from your MCP communication. Use presigned URLs or direct S3 uploads to get files into storage, then pass lightweight references through MCP.
If you're building for multiple client types — desktop, web, mobile — design your upload flow to work over plain HTTPS. HTTP PUT with a presigned URL works everywhere. Curl in a sandbox doesn't. The simpler your upload mechanism, the more clients you'll support.
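And the entire client-side story, in Python for illustration, is one call that any HTTP stack on any platform can make:

```python
import requests

def upload(path: str, presigned_url: str) -> None:
    """Plain HTTPS PUT: works the same from desktop, a web backend,
    or a mobile app's HTTP stack. No sandbox, no curl, no whitelist."""
    with open(path, "rb") as f:
        resp = requests.put(presigned_url, data=f)  # streams the file
    resp.raise_for_status()
```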
Also, if you're stuck with base64 for legacy reasons, implement chunking and set hard file size limits. Don't let users try to shove a fifty-megabyte PSD through a base64 pipe. It will fail, and it will fail in a way that's confusing to debug. Catch it early with a clear error message.
For the aggregation use case specifically, accept that you're going to need two instances if you're mixing local and remote MCP servers. It's not elegant, but it works. Run local servers on localhost with filesystem access, remote servers on a VM with object storage, and design your gateway to route between them transparently. The user shouldn't have to know which server lives where.
The long-term bet is that MCP evolves toward a proper file reference model — URLs, not blobs. Build your systems with that assumption and you'll be well positioned when the protocol catches up.
Keep an eye on the File Uploads Working Group. Den Delimarsky and Nick Cooper are both deeply experienced, and the fact that Anthropic and OpenAI are collaborating directly on this spec is a good sign that it'll get real attention. But working groups move at working group speed, and builders need solutions now.
Which brings us back to where Daniel started. He built a working solution with MinIO and an agent skill. It's not standardized, it's not in the spec, but it works. And right now, in April twenty twenty-six, that's what good MCP engineering looks like.
One open question I'm still chewing on. When SEP twenty-three fifty-six lands with its data URI approach for inline files, does that actually help the aggregation use case at all? Or does it just give us a cleaner way to do the thing that already breaks at four megabytes?
I think it helps the declarative part enormously — servers can finally say I need a file without resorting to prose instructions. But for the transport part, I suspect the URL-mode alternative in the spec will end up being the thing everyone actually uses, and the data URI path will be for thumbnails and icons. The real file movement will keep happening through side channels.
The spec is building a beautiful front door while everyone's using the service entrance.
Which, to be fair, is how most protocols evolve. The front door gets built first, then eventually someone connects the service entrance to it.
Thanks to Hilbert Flumingtop for producing, and thanks to DeepSeek V four Pro for the script this week.
This has been My Weird Prompts. Find us at myweirdprompts dot com. We're back soon.