Hey everyone, welcome back to My Weird Prompts. I am Corn, and I am sitting here in our living room in Jerusalem with my brother, Herman. It is a beautiful morning here, but we are about to get into some things that are decidedly less than beautiful.
Herman Poppleberry, here and ready to dive into the weeds. And I mean the deep, technical, messy weeds today, Corn. We are talking about the plumbing of the agentic age.
It is funny you say that because our housemate Daniel sent us a prompt this morning that is exactly that. It is all about the Model Context Protocol, or MCP, and some of the massive friction points that are making it feel a little bit like we are living in the dial-up era of agentic AI. He is frustrated, and honestly, after looking at the docs again this morning, I am starting to see why.
Daniel always has his finger on the pulse of what is actually annoying to use in practice. It is one thing to read a white paper about how revolutionary a protocol is, but it is another thing entirely to be sitting there trying to get work done and realizing you have to restart your entire session just to add one new capability. He called it the restart tax, and I think that is the perfect term for it.
He called it a cumbersome experience, and he is right. We have been talking about the agentic internet for a long time, but if the foundation has these weird, rigid bottlenecks, we are not going to get to that seamless future as fast as we want. We are in March of two thousand twenty-six now, and while things have moved fast, we are still dealing with some architectural baggage from the early days of two thousand twenty-four and twenty-five.
I am excited to dig into this because there is a very specific technical reason why this friction exists, and there is an even more interesting set of solutions on the horizon. We are basically moving from the toy phase of MCP into the production-grade phase, but that transition is painful.
Let us start with that restart tax. For those who might not be deep in the developer docs, explain why it is that currently, if I am in a session with a model like Claude and I realize I need a specific tool from an MCP server I have not loaded yet, I cannot just plug it in. Why do I have to kill the session and start over? It feels like rebooting your entire computer just to plug in a USB thumb drive.
That is exactly what it is like. It comes down to how the tool registry is initialized during what we call the handshake process. Right now, when you start an MCP-enabled session, the client—the interface you are using—connects to each server, collects its tool definitions over JSON-RPC, and hands that full list to the model before the conversation even begins. These definitions tell the model the name of each tool, what arguments it takes, and what it is supposed to return.
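To make that handshake concrete, here is a rough sketch of the kind of JSON-RPC tool-listing message a client might receive and fold into the system prompt. The field names follow the general MCP shape, but the specific tool and values are invented for illustration.

```python
# Hypothetical sketch of a JSON-RPC tool-registration payload an MCP
# client receives during the handshake. The "get_weather" tool and its
# schema are invented for illustration, not taken from any real server.
import json

tools_list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "get_weather",
                "description": "Look up the current weather for a city.",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"}
                    },
                    "required": ["city"],
                },
            }
        ]
    },
}

# The client serializes this once at session start and bakes it into the
# model's context -- which is exactly why the list is frozen afterward.
serialized = json.dumps(tools_list_response)
print(len(tools_list_response["result"]["tools"]))
```

The key point is that this list is assembled once, at startup, which is where the restart tax comes from.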
So it is like a pre-flight checklist. The pilot needs to know if the plane has landing gear before they take off.
Precisely. The model needs to build a mental map of its capabilities. The problem is that in the current implementation, that map is static. It is baked into the system prompt or the initial context. Because Large Language Models are stateless between requests but stateful within a context window, the model needs to know its boundaries from the jump. If you suddenly injected a new tool mid-conversation, current architectures do not have a clean way to update that internal map without potentially confusing the model or breaking the flow of the previous context.
But wait, we have seen models handle dynamic context before. Why is this specifically a problem for tools? If I can paste a ten-page document into a chat mid-way through, why can I not paste a tool definition?
That is the million-dollar question, Corn. You actually can paste a definition, but getting the model to reliably recognize that as a functional capability it can call via a specific protocol is different from just giving it more information to read. When a tool is registered at the start, it is often handled by a specialized layer of the software that monitors the model's output for specific trigger words or formatted blocks. If that layer is not expecting a new tool, it will not know how to intercept the model's request and route it to the correct MCP server. It is a coordination problem between the model, the client software, and the server. It feels very nineteen-nineties, like when you had to restart Windows ninety-five just to get a new printer to show up.
It really does. And that brings us to the second big issue Daniel mentioned, which is almost more important from a performance standpoint: context ingestion and attention dilution. This is where the technical debt really starts to hurt.
Right. This is the part people often overlook. They think, well, why not just load every MCP server I own at the start of every session? If I have fifty different MCP servers for everything from my calendar to my smart home to my specialized coding libraries, why not just give the model all of them?
And the answer is that you are basically flooding the model's brain with manuals before you even ask it a question.
Think about it this way. Every tool definition takes up tokens. A simple tool might be a few hundred tokens. A complex one could be a thousand. If you have fifty tools, you might be looking at twenty thousand or thirty thousand tokens just to describe what the agent can do. That is a huge chunk of the context window, even with the massive windows we have in two thousand twenty-six. But it is not just the space it takes up. It is the attention.
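The arithmetic here is worth spelling out. A quick back-of-the-envelope sketch, using an assumed average of five hundred tokens per tool definition and an assumed two-hundred-thousand-token context window (both illustrative numbers, not measurements):

```python
# Back-of-the-envelope cost of loading every tool at session start.
# Both constants are illustrative assumptions, not measured values.
AVG_TOKENS_PER_TOOL = 500    # assumed midpoint between simple and complex tools
NUM_TOOLS = 50
CONTEXT_WINDOW = 200_000     # assumed window size

definition_cost = AVG_TOKENS_PER_TOOL * NUM_TOOLS
fraction = definition_cost / CONTEXT_WINDOW

print(definition_cost)       # 25000 tokens spent before the user says a word
print(f"{fraction:.1%}")     # 12.5% of the window consumed by manuals
```

Even with generous assumptions, a meaningful slice of the window is gone before the conversation starts.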
We talked about this a bit back in episode eight hundred nine when we were looking at context engineering. If the model has to look through a massive list of possibilities every time it thinks, its reasoning performance actually starts to degrade.
It really does. There is this phenomenon called attention dilution. When the self-attention mechanism of a transformer model has to spread its weights across a massive system prompt filled with irrelevant tool definitions, it becomes less sharp on the actual task at hand. Think of the attention mechanism like a spotlight. If you have one tool, the spotlight is focused. If you have fifty tools, you are trying to light up an entire stadium with that same spotlight. Everything gets dimmer.
Is there a specific threshold where this starts to break?
Research has shown that once tool definitions exceed about fifteen to twenty percent of the active context window, the error rate in tool calling starts to climb significantly. The model might hallucinate an argument for a tool, or it might get confused between two similar-sounding tools. It might try to use a calendar tool to solve a math problem just because the calendar tool was mentioned more recently in the prompt.
So the user is stuck in this catch-twenty-two. You want your agent to be powerful and have access to everything, but if you give it everything, it becomes slower, more expensive, and stupider.
Which leads to the manual toggle problem Daniel mentioned. Users end up being their own gatekeepers. They are sitting there thinking, okay, I am going to do some coding now, so let me turn off the Spotify MCP and the Home Assistant MCP and just turn on the GitHub and Terminal ones. It is the opposite of agentic. It is just more administrative work for the human. It is like we have gone from managing files to managing the brain of our assistant. It is not the set it and forget it future we were promised.
It feels like we are back in the era of manual memory management in programming. You have to decide exactly which bits of the agent's brain are active at any given time. If you forget to toggle the right switch, the agent fails, and you have to restart the whole session. That is an unacceptable user experience for general technology. My mom is not going to toggle JSON-RPC definitions.
She absolutely is not. And this is why we say MCP is currently in a v-one-alpha state of maturity, even if the marketing says otherwise. It is viable, but it is hampered by these quirks. However, there is a path forward. Daniel thinks we will see it solved this year, and I agree. The industry is moving toward what we call Dynamic Tool Discovery, or DTD.
How does that actually work? If the architecture is the problem, how do we fix the restart tax without breaking how LLMs function?
The shift is moving from a Push model to a Pull model. In the Push model, we push all tools into the model at the start. In the Pull model, the model—or a middle layer—pulls in the tools it needs on demand.
But how does the model know what is available to pull? If it doesn't know the tool exists, how can it ask for it?
That is where the architectural shift happens. Instead of giving the model fifty full tool definitions, you give it one Meta-Tool or a Discovery Tool. Think of it like a librarian. You tell the model, "Hey, you have access to a library of three hundred capabilities. If you encounter a task you cannot solve with your basic reasoning, call the Discovery Tool with a description of what you need."
Oh, that is clever. So the model says, "I need to check the weather in Jerusalem," and the Discovery Tool searches a database of MCP servers, finds the weather one, and then—this is the key—it injects that specific definition into the context window for just that moment.
It is Just-In-Time tool registration. It solves the context bloat because at any given time, the model only has the definitions for the tools it is actually using. And it solves the restart tax because the Discovery Tool is always there, and its backend database can be updated whenever you add a new MCP server to your system. No restart required. The model just queries the librarian again.
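A minimal sketch of that librarian pattern, assuming a hypothetical `discover_tool` meta-tool and an invented registry. A real host would route this over MCP and match with embeddings; simple keyword overlap keeps the sketch self-contained:

```python
# Hypothetical "librarian" meta-tool: the model sees only discover_tool,
# and the registry behind it can grow without a restart. Tool names and
# descriptions are invented for illustration.
STOPWORDS = {"the", "a", "in", "for", "of", "to"}

TOOL_REGISTRY = {
    "weather_lookup": "Get current weather conditions for a named city",
    "calendar_read": "List upcoming events from the user's calendar",
    "pdf_search": "Full-text search inside a PDF document",
}

def discover_tool(need: str) -> list[str]:
    """Return registry entries whose description overlaps the stated need.
    A production version would use embedding similarity instead."""
    need_words = set(need.lower().split()) - STOPWORDS
    matches = []
    for name, description in TOOL_REGISTRY.items():
        desc_words = set(description.lower().split()) - STOPWORDS
        if need_words & desc_words:
            matches.append(name)
    return matches

# Only the matching definition gets injected for this turn.
print(discover_tool("check the weather in Jerusalem"))
```

Adding a new server to `TOOL_REGISTRY` takes effect on the next query, which is the whole point: the registry is a living object rather than a frozen prompt.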
Why aren't we doing that yet? Is it a latency thing?
Latency is a big part of it. Every time you have to do a discovery step, you are adding a round trip. The model has to generate a request, the discovery server has to search, the new definition has to be prepended to the prompt, and then the model has to run again. In the past, that would have added seconds to the response time. But with the inference speeds we are seeing now in twenty-six, that overhead is dropping to milliseconds.
I imagine the search part of that discovery is also tricky. You can't just do a keyword search if you want it to be really agentic.
Right, it usually involves vector embeddings. You embed the descriptions of all your available tools. When the model expresses a need, you embed that need and do a similarity search. This is actually something we touched on in episode eight hundred fifty-five when we discussed how Google was looking at a web-scale MCP standard. They want agents to be able to browse the entire internet and find tools they have never seen before.
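A toy illustration of that similarity-search step. Real systems use a learned embedding model; word-count vectors with cosine similarity are only a stand-in to show the ranking, and the tool names are invented:

```python
# Toy embedding-based discovery: rank tool descriptions by cosine
# similarity to the model's expressed need. Bag-of-words vectors stand in
# for a real embedding model; tool names are invented for illustration.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Crude 'embedding': a word-count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

tool_descriptions = {
    "weather_lookup": "current weather forecast for a city",
    "repo_commit": "commit staged changes to a git repository",
}

need = "what is the weather forecast today"
need_vec = embed(need)
ranked = sorted(tool_descriptions,
                key=lambda t: cosine(need_vec, embed(tool_descriptions[t])),
                reverse=True)
print(ranked[0])  # best match for the expressed need
```

Swap the `embed` function for a real embedding model and the same ranking logic scales to thousands of tools.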
That is wild to think about. An agent that essentially teaches itself how to use a new API on the fly because it found the MCP definition on a website. But even on a local level, within our own homes or offices, this dynamic discovery is the missing link.
It really is. And there are already people building middleware to bridge the gap right now. There are custom MCP proxies that act as a single server to the model but are actually routing to dozens of background servers. These proxies can do some of that lightweight management Daniel was talking about. They can look at the user's prompt, do a quick pre-scan using a smaller, cheaper model, and only expose the most relevant tools to the main model for that specific turn.
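The proxy idea can be sketched in a few lines. The backend names and topic keywords below are invented, and a real proxy would use a small model for the pre-scan rather than keyword matching:

```python
# Sketch of a proxy facade: one "server" in front of many backends,
# exposing only the tools relevant to the current turn. Backend names
# and topic keywords are invented for illustration.
BACKENDS = {
    "github_mcp": {"topics": {"code", "repo", "pull", "commit"}},
    "spotify_mcp": {"topics": {"music", "play", "song"}},
    "excel_mcp": {"topics": {"spreadsheet", "cell", "formula"}},
}

def tools_for_turn(user_message: str) -> list[str]:
    """Cheap pre-scan standing in for the small-model classifier the
    episode describes: expose a backend only if the message touches it."""
    words = set(user_message.lower().split())
    return [name for name, meta in BACKENDS.items() if words & meta["topics"]]

print(tools_for_turn("open the spreadsheet and fix the formula"))
```

The main model never sees the Spotify or GitHub definitions on a spreadsheet turn, which is exactly the context pruning being described.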
It is almost like a pre-processor for your AI. It looks at what you said, realizes you are talking about a spreadsheet, and automatically swaps in the Excel MCP before the model even sees your message.
Precisely. It is about reducing the cognitive load on the Large Language Model. We have to remember that as powerful as these models are, they are still sensitive to noise. The more we can prune the context to only what is necessary, the better the reasoning. This is the move from user-selected tools to agent-requested capabilities.
I want to talk about the evolution of this. It reminds me of how browser extensions developed. How does that comparison hold up?
It is a great analogy. Think back to early web browsers. If you wanted to add a feature, you often had to install a whole new version of the browser or a very heavy plugin that loaded every time the page loaded. Then we moved to the modern extension architecture where extensions are event-driven. They sit in the background and only wake up when a specific event happens—like you clicking a button or visiting a specific domain. MCP is going through that same evolution. We are moving from static page-load scripts to event-driven background workers.
So, what does the Operating System phase look like for AI agents? If we are currently in the manual memory management phase, what happens when the OS takes over?
I think it looks like a background service that is completely invisible to the user. You install an app on your computer, and that app comes with an MCP server. The Operating System automatically registers that server with your local agent. The next time you ask your agent to do something related to that app, it just happens. The discovery, the handshake, the context injection—it all happens under the hood.
I can see that being a huge shift for privacy too. If the tools are being managed by the Operating System, you can have much finer-grained control over what data each tool can access.
We covered some of the hardware implications of this in episode six hundred thirty-three, the Memory Wars episode. To do this well, you need dedicated hardware that can handle those vector lookups and context swaps without hitting the main CPU or GPU too hard. By late twenty-six, I expect we will see Agentic Co-processors becoming standard in high-end laptops, specifically designed to manage this dynamic tool context.
It is fascinating how a protocol as simple as MCP—which at its heart is just a way for two things to talk—ends up requiring this whole new architectural stack to be actually useful.
That is the nature of scale. When you have three tools, it is easy. When you have three million, you need a completely different philosophy. And that is where the tension is right now. We are trying to use a three tools philosophy in a three million tools world.
So, let us get practical for a second. For the developers listening who are building MCP servers right now, what should they be doing to prepare for this shift? If the world is moving toward dynamic discovery and context pruning, how does that change how you write code today?
This is the most important takeaway for developers. The biggest thing is modularity and brevity. Do not build a Swiss Army Knife MCP server that does fifty different things. If you do that, your tool definitions are going to be massive, and you are more likely to get pruned out by a discovery algorithm or cause attention dilution.
So, smaller, more specialized servers are better?
Much better. If you have a server that only does one thing, like Search PDF, and its definition is only five hundred tokens, it is very easy for a discovery agent to say, "Yep, I need that right now," and pop it in. If your server is a giant monolith that handles PDFs, emails, calendars, and weather, it is much harder to manage dynamically. It is like microservices for AI agents.
I love that. Microservices for AI. And you mentioned something about descriptions earlier?
Yes! For the love of all things holy, write good descriptions for your tools. In a world of dynamic discovery, your tool's description is its Search Engine Optimization. If the discovery agent doesn't understand what your tool does because your description is vague or uses too much jargon, the model will never call it. You are not just writing a function; you are writing a pitch to an AI agent about why it should use your function.
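To show why the description matters so much, here is a tiny comparison of a vague pitch versus a descriptive one, scored against a user need by word overlap (a stand-in for a real embedding search; both descriptions are invented):

```python
# "Discovery SEO" in miniature: the same tool pitched two ways, scored
# against a user need. Word overlap stands in for embedding similarity;
# both description strings are invented for illustration.
def overlap_score(need: str, description: str) -> int:
    return len(set(need.lower().split()) & set(description.lower().split()))

need = "search for a phrase inside a pdf report"

vague = "Utility for document operations"
descriptive = "Search for text inside a PDF file and return matching pages"

print(overlap_score(need, vague))
print(overlap_score(need, descriptive))
```

The descriptive pitch wins the ranking every time, so a discovery agent will actually surface the tool.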
That is a totally different mindset. Coding for discovery rather than just coding for execution. What about for the users? People like Daniel who are frustrated right now. What can they do?
For users, the advice is to adopt a Just-In-Time mindset. Stop trying to build the perfect, all-powerful session. Instead, build small, task-specific sessions. If you are doing research, only load your research tools. If you are doing dev work, only load your dev tools. I know it is annoying to restart, but it will actually give you better results than trying to cram everything into one giant, sluggish session.
And look for those middleware solutions.
There are already open-source projects like MCP-Router and various proxy servers that can help you manage your library more effectively. They act as that librarian layer I mentioned. It is not a native solution yet, but it is a lot better than the manual toggle.
I think Daniel will be happy to hear that there is a light at the end of the tunnel. It really does feel like we are just one or two major updates away from this being a solved problem at the platform level. Anthropic has been quiet about the specifics, but the rumors in the developer community are that the next version of the MCP host protocol will include a native discovery layer. It will essentially make the registry a living object that the model can query.
I agree. The goal is a world where tools are not something you manage, but something the agent just possesses. It is the move from using a tool to having a capability.
That is a powerful distinction. I don't use my ability to speak English; I just have it. And I don't have to restart my brain to switch to speaking Hebrew—well, maybe a little bit, but you get the point. We want that same fluidity for our agents. If I am talking to an agent about a project, and I suddenly mention a budget, I want it to remember it has access to the accounting software without me having to remind it or re-initialize the session.
And looking ahead to the rest of twenty-six, I think we might see the end of the session concept entirely. If you have perfect dynamic discovery and infinite-feeling context through clever pruning, the session just becomes a continuous stream of interaction. Your agent becomes a persistent entity that lives alongside you, pulling in what it needs when it needs it. We are moving away from chatting with a bot toward working with an agent.
It is funny how these small friction points, like the restart tax, are actually the things holding back that much larger vision. It seems like a minor annoyance, but it is actually a fundamental barrier to persistence. You cannot have a persistent, life-long assistant if you have to reboot its brain every time you want it to learn a new trick. Solving this at the protocol level is the prerequisite for everything else we want to do with AI.
It really is. We are watching the scaffolding being built around us. Sometimes you trip over a board, but you can see the building taking shape. The toy phase of MCP is over. We are in the utility phase now, and that requires a level of polish that the current static architecture just cannot provide. But the road to the end of twenty-six looks bright.
I hope you are right. I am looking forward to the day I can just tell my agent to fix the guest room lights and it figures out which smart home protocol to use, finds the right MCP server, and executes the command without me even knowing there was a protocol involved.
That is the dream. A world where the technology gets out of the way of the intent. Are we building a protocol for humans to manage tools, or for agents to manage themselves? I think the answer has to be the latter if we want this to scale.
On that note, I think we have given people a lot to chew on. If you are a developer, start thinking about your Discovery SEO. If you are a user, hang in there—the Restart Tax is hopefully going to be repealed by the end of the year.
And if you are Daniel, thanks for the prompt and for keeping us honest about the user experience. It is easy to get caught up in the coolness of the tech and forget how annoying it can be to actually use.
Definitely. We really appreciate everyone who listens and sends in these ideas. If you have been following My Weird Prompts for a while, you know we love digging into these technical nuances.
And if you are enjoying the journey with us, please do us a huge favor and leave a review on your podcast app or on Spotify. It really does help the show reach more people who are interested in this kind of deep-dive exploration.
Yeah, it makes a big difference for a show like ours. You can find all our past episodes, including the ones we mentioned today like episode eight hundred fifty-five on the agentic internet and episode six hundred thirty-three on the memory wars, over at our website, myweirdprompts.com. We have a full archive there and an RSS feed if you want to subscribe directly.
This has been a great one. I feel like I understand my own frustrations with MCP a lot better now that we have talked through the architectural reasons for them.
That is the power of the Poppleberry deep dive! Alright, I think we are done for today.
Until next time.
This has been My Weird Prompts. Thanks for listening, and we will talk to you soon.
Take care, everyone.
So, Herman, before we go, do you actually have the Spotify MCP loaded right now, or do we need to restart the living room?
Oh, I have it loaded, but I think it only knows how to play donkey-related bluegrass.
Of course it does. I should have known. See ya, everyone.
Bye!