#2551: How Progressive Disclosure Saves MCP from Token Bloat

Why dumping all tool schemas into context breaks accuracy — and three implementations that fix it.

Episode Details
Episode ID
MWP-2709
Published
Duration
25:52
Audio
Direct link
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
deepseek-v4-pro

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The Problem: Context Windows Are Not Tool Catalogs

Early MCP integrations followed a naive pattern: connect to a server, dump every tool definition into the system prompt, and hope the model picks the right one. For servers with five tools, this works fine. For servers with fifty — each with nested parameter schemas and long descriptions — it becomes a token incinerator that degrades model accuracy.

The core issue isn't just token cost. It's a well-documented phenomenon where large language models get worse at tool selection as the tool list grows. Numbers cited from an Anthropic engineering talk put tool selection accuracy at 94% with five tools, dropping to the low 70s with forty-plus tools in context. That's catastrophic for production systems: you're paying more tokens and getting worse results simultaneously.

What Progressive Disclosure Actually Means

Progressive disclosure in the MCP context combines three mechanics: lazy-loading tool schemas (servers don't send full definitions until requested), namespacing (tools organized into logical groups with only labels exposed initially), and on-demand reveal (the model expresses intent, then the server expands just that subset).

Think of it like a file system. You don't list every file when opening a terminal — you see the root directory, navigate into what you need, then list contents. The model sees just enough information to route its intent — "I need database tools" — before receiving actual schemas.

Modern MCP protocol support includes tool listing with pagination, dynamic tool registration, and "tool list changed" notifications that let servers add tools mid-session. This infrastructure enables incremental discovery rather than upfront dumping.
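The three mechanics can be sketched in a few lines. This is a hypothetical, simplified model of namespace-first disclosure, not the official MCP SDK API — the group names and helper functions are invented for illustration:

```python
# Hypothetical sketch of namespace-first disclosure (not the MCP SDK API).
from typing import Any

# Full schemas live server-side, grouped by namespace.
TOOL_GROUPS: dict[str, list[dict[str, Any]]] = {
    "database": [
        {"name": "db.query", "description": "Run a read-only SQL query",
         "inputSchema": {"type": "object",
                         "properties": {"sql": {"type": "string"}},
                         "required": ["sql"]}},
    ],
    "filesystem": [
        {"name": "fs.read", "description": "Read a file",
         "inputSchema": {"type": "object",
                         "properties": {"path": {"type": "string"}},
                         "required": ["path"]}},
    ],
}

def list_namespaces() -> list[dict[str, str]]:
    """Initial disclosure: labels and one-line summaries only, no schemas."""
    return [{"namespace": ns, "summary": f"{len(tools)} tools"}
            for ns, tools in TOOL_GROUPS.items()]

def expand(namespace: str) -> list[dict[str, Any]]:
    """On-demand reveal: full schemas for one requested group."""
    return TOOL_GROUPS.get(namespace, [])

print(list_namespaces())   # labels only, a handful of tokens
print(expand("database"))  # full schemas, loaded only when routed to
```

The initial listing costs a few tokens per namespace regardless of how many tools each group contains; the full schemas are paid for only on the path the model actually takes.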

Three Implementation Approaches

paddo/mcp-code-wrapper solves an unbounded problem: when MCP tools wrap a Python runtime, the available operations are every function in every imported module. Its approach is "speculative execution with schema discovery" — the model writes code, the wrapper executes it in a sandbox, and when code references an unknown function, the system resolves the schema on the fly. This is exploratory rather than pre-organized, accepting runtime errors in exchange for flexibility.

paralleldrive/jiron introduces an explicit routing layer with semantic similarity matching. Tool groups carry metadata descriptions and keywords; the router embeds the user's query and exposes only the top-k matching groups. This keeps token usage extremely lean — but if the router fails to surface the right group, the model has no way to recover, since it doesn't know the missing tools exist.
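The routing step reduces to scoring each group against the query and keeping the top k. A real router would use embedding vectors; plain token-overlap cosine stands in below so the sketch is self-contained, and the group names and metadata are invented:

```python
# Sketch of jiron-style top-k group routing. Token-overlap cosine stands
# in for real embeddings; groups and metadata are invented.
import math
from collections import Counter

GROUPS = {
    "database": "sql query schema migration table rows",
    "filesystem": "file read write path directory listing",
    "email": "send message inbox attachment recipient",
}

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def route(query: str, k: int = 2) -> list[str]:
    """Expose only the top-k matching groups; everything else stays hidden."""
    q = _vec(query)
    ranked = sorted(GROUPS, key=lambda g: _cosine(q, _vec(GROUPS[g])),
                    reverse=True)
    return ranked[:k]

print(route("run a sql query against the orders table", k=1))  # ['database']
```

The failure mode is visible in the structure: groups outside the top k are simply absent, so a routing miss is invisible to the model.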

colinhale1/progressive-reveal-mcp treats progressive disclosure as a protocol layer. Servers expose "capability descriptors" — short, high-level descriptions without parameter schemas — that cost only a few dozen tokens each. These descriptors are deliberately non-executable, preventing the model from hallucinating tool calls based on incomplete information. A meta-tool called "reveal capability" returns full schemas for a requested group, making the disclosure mechanism itself part of the tool-calling interface.
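The descriptor/reveal split might be sketched as follows. Names and structure are illustrative, not colinhale1's actual code — the key property is that the initial context carries no `inputSchema`, so nothing is callable until the meta-tool runs:

```python
# Hypothetical sketch of capability descriptors plus a reveal meta-tool.
CAPABILITIES = {
    "database": {
        "descriptor": "Query and migrate SQL databases.",  # cheap, no schema
        "tools": [
            {"name": "db.query",
             "inputSchema": {"type": "object",
                             "properties": {"sql": {"type": "string"}},
                             "required": ["sql"]}},
        ],
    },
}

def initial_context() -> list[dict]:
    """What the model sees at session start: descriptors only.
    Deliberately no inputSchema, so nothing is callable yet."""
    return [{"capability": name, "descriptor": cap["descriptor"]}
            for name, cap in CAPABILITIES.items()]

def reveal_capability(name: str) -> list[dict]:
    """The meta-tool: expand one capability into full, callable schemas."""
    cap = CAPABILITIES.get(name)
    if cap is None:
        return []  # unknown capability: the model must re-route, not guess
    return cap["tools"]
```

Because the descriptors don't conform to the tool-use schema format, the model cannot hallucinate a call against them; the only executable path runs through `reveal_capability`.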

The Accuracy Question

The tradeoff across all approaches is how much the model needs to know about what it doesn't know. Jiron trusts the router and keeps context sparse. Progressive-reveal-mcp adds a round trip for schema expansion but prevents hallucination. The best public benchmark data, from paralleldrive, shows a full tool dump achieving 68% accuracy on tool selection with fifty tools, namespace-only disclosure recovering to 83%, and the semantic router reaching 87% — though the router adds its own failure mode of unrecoverable misses, and exact numbers depend on implementation and tool complexity.

Where This Goes Next

Agent skills — composable, versioned capability bundles that agents can discover and load dynamically — appear to be the natural successor to progressive disclosure. The same principles of on-demand reveal and namespace routing apply, but at a higher level of abstraction. For ecosystem builders, the key insight is that context windows are a scarce resource best spent on reasoning, not menu-reading.


Transcript

Corn
Daniel sent us this one — he wants us to dig into progressive disclosure, sometimes called progressive discovery or progressive reveal, and why it's quietly become the thing that made the Model Context Protocol usable at scale. He's asking about what the pattern actually does technically — lazy-loading tool schemas, namespacing, on-demand reveal of capabilities — versus dumping the entire tool surface into context up front, and why that blows up token budgets and degrades model accuracy. Then he wants us to walk through three concrete implementations — paddo slash mcp-code-wrapper, paralleldrive slash jiron, and colinhale one slash progressive-reveal-mcp — what design choices each makes, where they differ. And finally, where this goes next — are agent skills the natural successor, and what does iterating on them look like for ecosystem builders. There's a lot here.
Herman
There really is, and I'm glad he asked because this is one of those things that sounds like an implementation detail but turns out to be architectural destiny. Also, quick note — DeepSeek V four Pro is generating our script today.
Corn
So let's start with the problem. Why can't you just dump every tool schema into the context window and call it a day?
Herman
Because it breaks in ways that aren't obvious until you've actually tried it at scale. The naive approach — and this is what almost every early MCP integration did — is: you connect to a server, it sends over every tool definition, every parameter schema, every description string, and you stuff all of that into the system prompt or the tool-use block. For a server with five tools, fine. For a server with fifty tools, each with nested object parameters and long descriptions, you're suddenly spending thousands of tokens before the model has even thought about the user's actual query.
Corn
Tokens aren't free, but that's not even the main problem, is it?
Herman
No, the main problem is accuracy degradation. There's a well-documented phenomenon where large language models get worse at tool selection as the tool list grows. It's sometimes called the "needle in a haystack" problem for function calling — the model has to attend to dozens of schema definitions, and the relevant one gets lost in the noise. I saw numbers from an Anthropic engineering talk where tool selection accuracy dropped from something like ninety-four percent with five tools down to the low seventies with forty-plus tools in context. That's catastrophic for anything production-grade.
Corn
You're burning money and getting worse results. That's a terrible combination.
Herman
Yet that was the default for most of MCP's early life. The protocol spec didn't mandate progressive disclosure, and the full dump was just the obvious thing server authors reached for. Here's my tool list, here's everything I can do, good luck. The model had to figure it out.
Corn
Which is where progressive disclosure comes in. And I want to get precise about what it actually means, because I've seen the term thrown around loosely. Walk me through the mechanics.
Herman
Progressive disclosure in the MCP context means three things working together. First, lazy-loading tool schemas — the server doesn't send full tool definitions until the model actually requests them, or until the routing layer determines they're relevant. Second, namespacing — tools are organized into logical groups, and the model only sees the namespace labels at first, not the contents. Third, on-demand reveal — when the model expresses intent or the query matches a namespace, the server expands just that subset of tools into context.
Corn
Instead of getting a hundred tools dumped on you, you get maybe ten namespace labels, and then you drill down.
Herman
Think of it like a file system. You don't list every file on your hard drive when you open the terminal — you see the root directory, you cd into what you need, and then you ls. Progressive disclosure applies that same intuition to tool calling. The initial context contains just enough information for the model to route its intent — "I need database tools" or "I need file system tools" — and then the actual schemas arrive when they're needed.
Corn
MCP as a protocol supports this natively now?
Herman
It does, and this is where the history is interesting. The early MCP spec — late twenty twenty-four, early twenty twenty-five — didn't have great primitives for this. Servers could list tools, and that was basically it. The protocol has evolved, and now there's support for tool listing with pagination, for dynamic tool registration, and most importantly for what's called "tool list changed" notifications — a server can tell the client "hey, I have new tools available" mid-session, which is the infrastructure that progressive disclosure sits on top of.
Corn
The server can add tools to the context window after the conversation has started, and the model can discover them incrementally.
Herman
And that's the key insight — context windows are a scarce resource, and you want to spend them on reasoning, not on menu-reading.
Corn
Alright, let's get into the implementations Daniel mentioned. I want to start with paddo's mcp-code-wrapper, because that one took an approach that I think a lot of people misunderstood when it first appeared.
Herman
The code wrapper is fascinating because it's not trying to be a general-purpose progressive disclosure framework. It's solving a very specific problem: what happens when your MCP tools are code execution tools, and the set of available operations is effectively unbounded?
Herman
A typical MCP server might expose a "query database" tool with a fixed schema — you pass in a SQL string, you get back results. That's one tool definition, easy. But what if your MCP server is wrapping a Python runtime? The available "tools" are every function in every imported module, every method on every object. You cannot enumerate that in a static tool list — it's combinatorially explosive.
Corn
Paddo's approach was to treat the code namespace as something the model explores rather than something it receives.
Herman
The mcp-code-wrapper uses a technique I'd call "speculative execution with schema discovery." The model writes code, the wrapper executes it in a sandbox, and if the code references a function or module that hasn't been loaded yet, the wrapper intercepts that, resolves the schema, and feeds it back. It's almost like just-in-time compilation for tool schemas.
Corn
The model doesn't need to know what pandas dot DataFrame dot describe does ahead of time. It can just write the code, and if it hits something new, the system resolves it on the fly.
Herman
And this is a different flavor of progressive disclosure from the namespace approach. It's not pre-organized — it's genuinely exploratory. The trade-off is that you can get runtime errors that a pre-loaded schema would have caught at planning time. But for code execution use cases, that's often acceptable because the feedback loop is fast.
Corn
Let's move to the second one — paralleldrive's jiron. This one I actually spun up locally, and the design choices are interesting in a different direction.
Herman
Jiron is much more opinionated about routing. It introduces an explicit routing layer between the model and the tools, and that router is itself configurable. The core idea is that you define "tool groups" with associated metadata — descriptions, keywords, usage patterns — and the router uses that metadata to decide which groups to expose based on the user's query.
Corn
It's almost like a search engine for tools.
Herman
The jiron router takes the user's query, embeds it, and does semantic similarity matching against the tool group metadata. Only the top-k matching groups get their schemas loaded into context. And the value of k is configurable — you can tune it based on your token budget.
Corn
Which means the model never even sees the full namespace list in some cases, right? If you have fifty tool groups and k is three, the model only knows about three.
Herman
Correct, and that's a design choice with real consequences. On the upside, your token usage is incredibly lean. On the downside, if the semantic router makes a mistake — if it fails to surface the right tool group — the model has no way to recover, because it doesn't even know the missing tools exist.
Corn
That's the tension at the heart of all of this, isn't it? How much does the model need to know about what it doesn't know?
Herman
And different implementations answer that differently. Jiron errs on the side of minimal disclosure — trust the router, keep context sparse. The third implementation Daniel mentioned, colinhale1's progressive-reveal-mcp, takes a middle path.
Corn
This one is the most recent of the three, and I think the most architecturally self-conscious about progressive disclosure as a first-class concept rather than a workaround.
Herman
Colin Hale's implementation — and I was digging through the repo structure on this — treats progressive disclosure as a protocol layer, not a feature. The server exposes what it calls "capability descriptors" at the top level: short, high-level descriptions of what each tool group can do, without any parameter schemas. The model sees these descriptors, which cost maybe a few dozen tokens each, and can then request full schemas for specific capabilities by name.
Corn
It's like a restaurant menu where you get the section headers — appetizers, mains, desserts — and you ask to see the full descriptions for the section you're interested in.
Herman
That's the analogy. And the key design choice in progressive-reveal-mcp is that the capability descriptors are deliberately not function-calling-compatible. They're human-readable and model-readable, but they don't conform to the tool-use schema format. The model has to explicitly request expansion before it can actually call anything.
Corn
Which prevents the model from hallucinating tool calls based on incomplete information.
Herman
One of the failure modes in other progressive disclosure systems is that the model sees a namespace label like "database" and just assumes it knows what tools are in there, then hallucinates a tool call with made-up parameters. By making the descriptors non-executable, Colin's approach forces the model to go through the expansion step.
Corn
What does the expansion step actually look like at the protocol level?
Herman
It's implemented as a meta-tool — a tool whose job is to reveal other tools. The server exposes a "reveal capability" tool that takes a capability name as input and returns the full tool schemas for that group. The model calls this meta-tool, gets back the schemas, and then can make the actual tool calls. It's recursive in an elegant way — the mechanism for progressive disclosure is itself exposed through the same tool-calling interface.
Corn
That's clean. But doesn't it add latency? Now every tool call is potentially two round trips — one to reveal, one to execute.
Herman
It does add latency, and that's the main criticism. The counterargument is that you can cache revealed schemas within a session, so the reveal step only happens once per capability group. And for long-running agent tasks, that one-time cost is negligible compared to the token savings and accuracy improvements.
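The session cache Herman describes can be sketched in a few lines — the reveal round trip is paid once per capability group and memoized after that. Function names are hypothetical:

```python
# Session-level cache for revealed schemas: the reveal round trip runs
# once per capability group. Names are illustrative.
from functools import lru_cache

def fetch_schemas(capability: str) -> tuple:
    """Stand-in for the expensive reveal round trip to the server."""
    fetch_schemas.calls += 1
    return ({"name": f"{capability}.example_tool"},)
fetch_schemas.calls = 0

@lru_cache(maxsize=None)
def reveal(capability: str) -> tuple:
    return fetch_schemas(capability)

reveal("database")
reveal("database")  # served from cache: no second round trip
assert fetch_schemas.calls == 1
```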
Corn
Let's talk about the accuracy piece more concretely. You mentioned the numbers from Anthropic earlier — tool selection accuracy dropping from the nineties to the seventies. Do we have data on how much progressive disclosure buys that back?
Herman
The best public data I've seen comes from a benchmark that paralleldrive published alongside jiron. They tested three configurations: full tool dump, namespace-only disclosure, and their semantic router approach. With fifty tools in the pool, full dump got sixty-eight percent accuracy on tool selection. Namespace-only — where the model sees group labels and can request expansion — got eighty-three percent. The semantic router got eighty-seven percent but with occasional unrecoverable misses where the correct tool group wasn't surfaced at all.
Corn
The namespace approach gets you most of the way there, and the semantic router squeezes out a few more points but introduces a new failure mode.
Herman
That's the trade-off space. And this is why I think Colin's approach is interesting — it's trying to get the best of both by making the capability descriptors rich enough for good routing decisions but non-executable to prevent hallucination.
Corn
There's something deeper here that I want to pull on. All three of these implementations are essentially building an information retrieval system that sits between the model and its tools. And that's not where I think this ends.
Herman
This is where Daniel's question about agent skills comes in.
Corn
Because progressive disclosure as we've described it so far is still fundamentally about tools — discrete functions with input-output signatures. But agent skills, in the way Anthropic has been developing them for Claude Code, are a different abstraction.
Herman
Let me lay out what agent skills actually are, because the term is getting used loosely. The Anthropic skill format — and this emerged more clearly in late twenty twenty-five and into this year — is a packaging format for reusable agent capabilities. A skill isn't just a tool. It bundles together tool definitions, system prompts, workflows, and sometimes example interactions into a single loadable unit.
Corn
It's like a plugin for an agent rather than a function for a model.
Herman
And the loading model is inherently progressive. When Claude Code starts up, it doesn't load every available skill into memory. It loads skill manifests — short descriptors of what each skill does and when it might be relevant. The full skill, with all its tools and prompts, only gets loaded when the agent determines it needs that capability.
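The manifest-first loading model might look like this in miniature. The structure is hypothetical — the real Claude Code skill format differs in detail — but the shape is the point: cheap manifests at startup, full bundles on demand:

```python
# Sketch of manifest-first skill loading (hypothetical structure).
SKILLS = {
    "db-migration": {
        "manifest": {"name": "db-migration",
                     "when": "schema changes, migrations, rollbacks"},
        "bundle": {"tools": ["db.plan_migration", "db.apply", "db.rollback"],
                   "prompt": "Always plan, then apply, then verify."},
    },
}
LOADED: dict = {}

def startup_manifests() -> list[dict]:
    """What's in context at startup: just the short manifests."""
    return [s["manifest"] for s in SKILLS.values()]

def load_skill(name: str) -> dict:
    """Load the full bundle (tools + prompts) only when the agent needs it."""
    if name not in LOADED:
        LOADED[name] = SKILLS[name]["bundle"]
    return LOADED[name]
```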
Corn
Which sounds a lot like progressive disclosure, just one level up the abstraction stack.
Herman
It is exactly progressive disclosure, but for agent-shaped things rather than tool-shaped things. And the implications are bigger, because a skill can contain multiple tools that are designed to work together, plus the prompting that tells the model how to use them effectively as a set.
Corn
Instead of the model discovering individual tools and figuring out how to compose them, the composition is pre-packaged. The progressive disclosure happens at the skill level, and within a skill, the tool set is coherent by design.
Herman
And this addresses a problem that pure tool-level progressive disclosure doesn't solve: tool composition. If you lazy-load individual tools from different namespaces, the model might end up with a grab bag of capabilities that don't necessarily work well together. A skill ensures that when you load the "database migration" capability, you get all the tools you need for that workflow, with the prompts that guide their use.
Corn
What does iterating on skills look like for ecosystem builders? Daniel specifically asked about that.
Herman
There are a few dimensions. First is skill discovery — how does an agent know what skills are available without loading them all? The current approach is manifest files, but I think we're going to see semantic skill routers emerge, similar to what jiron does for tools but at the skill level.
Corn
The agent describes its task, and a router says "you probably want the database migration skill and the testing skill."
Herman
Second dimension is skill composition — can skills declare dependencies on other skills? If the database migration skill needs the backup skill, does the system resolve that automatically? The current skill format doesn't have a formal dependency mechanism, and I think that's an obvious area for development.
Corn
Third dimension has to be skill versioning and compatibility. If I build an agent workflow around version one of a skill and the skill author releases version two with different tool signatures, what breaks?
Herman
That's the packaging problem that every plugin ecosystem eventually hits. And I think the MCP community is going to have to grapple with it sooner rather than later, because the number of available MCP servers is growing fast — there are hundreds now, and many of them are effectively becoming skill platforms.
Corn
I want to go back to something you said earlier about the semantic router failure mode — the unrecoverable miss where the model doesn't even know a tool exists. That problem gets worse with skills, doesn't it?
Herman
It does, and it's one of my bigger concerns about where this is heading. With tool-level progressive disclosure, the worst case is that the model doesn't have the right tool and has to ask the user or try a different approach. With skill-level disclosure, the model might not even know an entire capability domain exists. You could have a perfectly good database migration skill installed, but if the router doesn't surface it, the agent might try to do the migration with raw SQL tools and make a mess.
Corn
Which suggests that skill manifests need to be richer than tool descriptors. They need to communicate not just what the skill does, but what situations it's relevant for, what problems it solves, what keywords should trigger it.
Herman
That's a search problem, essentially. We're building search engines for agent capabilities. The quality of the routing metadata becomes as important as the quality of the tools themselves.
Corn
Let me pose a contrarian question. Is progressive disclosure actually the right long-term solution, or is it a workaround for context windows being too expensive?
Herman
That's a fair question. If context windows were infinite and free, you'd just dump everything in and let the model sort it out. But I don't think that's the right framing. Even if context were free, there's still the attention problem — the model's ability to focus on the right information degrades as you stuff more into the context. So progressive disclosure isn't just about cost, it's about cognition.
Corn
The model is attention-constrained, not just token-constrained.
Herman
And that's why I think progressive disclosure is here to stay regardless of what happens to pricing. It's an architectural pattern for managing limited attention, not just limited budget.
Corn
Let's talk about one more implementation detail that I think gets glossed over. All three of these systems — the code wrapper, jiron, and progressive-reveal-mcp — have to make a decision about who controls the disclosure. Is it the model requesting expansion, or is it the system proactively pushing relevant tools?
Herman
The push versus pull distinction. And they differ on this. The code wrapper is purely pull — the model writes code, the system resolves unknowns. Jiron is purely push — the router decides what's relevant before the model sees anything. Colin's progressive-reveal-mcp is a hybrid — the model sees descriptors and pulls what it needs, but the descriptors themselves are pushed by the system.
Corn
I suspect the right answer depends on the domain. For code execution, pull makes sense because the space of possibilities is unbounded. For a fixed set of business tools, push with good routing is probably more efficient.
Herman
For general-purpose agent platforms, the hybrid approach seems right. Give the model enough to route its own attention, but don't make it ask for a menu before it can order dinner.
Corn
Where does MCP itself need to go to support all of this better? The protocol has evolved, but there's still friction.
Herman
The biggest gap right now is that MCP doesn't have a standard way to express capability relationships. There's no "these tools belong together" primitive, no dependency declaration, no composition semantics. Each server is an island, and progressive disclosure happens within a server but not across servers.
Corn
If I have three MCP servers that each provide part of a workflow, there's no protocol-level way to express that these three should be disclosed together.
Herman
And that's where the skill format starts to look like the natural evolution. A skill can wrap multiple MCP servers, or multiple tool groups from different servers, into a single disclosure unit. It's a composition layer that MCP doesn't natively provide.
Corn
Which brings us to the closing question Daniel posed. Are agent skills the natural successor to progressive disclosure, and what does iterating on them look like?
Herman
I think "successor" is slightly the wrong framing. Agent skills are the next layer up — they depend on progressive disclosure as the underlying mechanism, but they add composition, workflow packaging, and prompt bundling on top. It's not replacement, it's building upward.
Corn
Progressive disclosure doesn't go away. It becomes the substrate that skills are loaded through.
Herman
And for ecosystem builders, iterating on skills means solving the problems we just laid out. Rich manifest formats that enable good routing decisions. Dependency resolution between skills. Versioning and compatibility. And probably some kind of skill registry or discovery service that goes beyond "clone this GitHub repo."
Corn
The GitHub repo model works when there are dozens of MCP servers. It doesn't work when there are thousands of skills.
Herman
No, and we're already seeing the strain. The MCP ecosystem is growing fast, and the discovery problem is getting worse, not better. Some kind of registry — whether it's centralized or federated — feels inevitable.
Corn
One last thread I want to pull. We've been talking about all of this from the perspective of making agents more capable. But there's a security dimension here too, isn't there? Progressive disclosure as a security boundary?
Herman
If the model never sees the "delete production database" tool because the router determined it's not relevant to the current task, that's a safety win. It's not a security guarantee — a determined adversarial prompt could still potentially trigger disclosure — but it raises the bar.
Corn
It's defense in depth. Even if the model is compromised or confused, the blast radius is limited to the tools that have been disclosed in that session.
Herman
This connects to something the Anthropic security team has been thinking about — the idea of "capability scoping" for agents. You don't give an agent access to all your tools all the time. You scope its capabilities to the task, and progressive disclosure is the mechanism that enforces that scoping at the protocol level.
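Capability scoping as Herman describes it amounts to filtering the disclosable tool set by the task's granted scopes before anything reaches the model. Scope names and tools below are invented:

```python
# Sketch of task-scoped disclosure as a safety boundary: tools outside
# the task's scopes are never revealed, shrinking the blast radius.
ALL_TOOLS = {
    "db.query":  {"scopes": {"read"}},
    "db.delete": {"scopes": {"admin"}},  # never shown to read-only tasks
    "fs.read":   {"scopes": {"read"}},
}

def disclose_for_task(task_scopes: set[str]) -> list[str]:
    """Reveal only tools whose required scopes the task covers."""
    return [name for name, meta in ALL_TOOLS.items()
            if meta["scopes"] <= task_scopes]

print(disclose_for_task({"read"}))  # ['db.query', 'fs.read']
```

A read-scoped task never even learns that `db.delete` exists, which is the "raises the bar" property rather than a hard guarantee.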
Corn
We've gone from "how do we save tokens" to "how do we build safe, composable agent platforms." Not bad for a pattern that started as a performance optimization.
Herman
That's often how it goes. The boring efficiency improvement turns out to have architectural implications that nobody saw coming.
Corn
Alright, I want to land this plane with a forward-looking thought. We've got three solid implementations of progressive disclosure in the wild, each making different trade-offs. We've got the agent skills format emerging as the next abstraction layer up. What's the thing that the ecosystem is going to figure out in the next year that nobody is talking about yet?
Herman
I think it's cross-skill memory. Right now, each skill is stateless — it loads, does its work, and unloads. But real agent workflows span multiple skills, and the agent needs to carry context between them. How does the output of the database migration skill become the input to the testing skill without the model having to hold all of that in its context window?
Corn
Progressive disclosure for state, not just for tools.
Herman
We've solved — or we're solving — progressive disclosure of capabilities. The next frontier is progressive disclosure of context. The model shouldn't have to keep the entire migration log in memory just because it might need it later. There should be a mechanism for stashing and retrieving relevant state on demand.
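The stash-and-retrieve mechanism Herman is gesturing at could look something like this — a speculative sketch with an entirely hypothetical API, where bulky state lives out of context and only a short handle stays in the window:

```python
# Speculative sketch of "progressive disclosure of context": stash bulky
# state under a short handle, retrieve the full value only on demand.
import hashlib
import json

_STORE: dict[str, str] = {}

def stash(payload: dict) -> str:
    """Store state out-of-context; return a short handle for the window."""
    blob = json.dumps(payload, sort_keys=True)
    handle = hashlib.sha256(blob.encode()).hexdigest()[:12]
    _STORE[handle] = blob
    return handle

def retrieve(handle: str) -> dict:
    """Pull the full state back only when a later skill needs it."""
    return json.loads(_STORE[handle])

h = stash({"migration_log": ["step1 ok", "step2 ok"]})
assert retrieve(h)["migration_log"][1] == "step2 ok"
```

The migration log costs the agent twelve hex characters of context until the moment some downstream skill actually asks for it.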
Corn
That's a research problem more than an engineering problem, at this point.
Herman
But so was tool calling three years ago. The pace on this stuff is fast.
Corn
That's where we'll leave it. Daniel, hopefully that walked through what you were looking for — the mechanics, the three implementations, and where skills fit in. I think the short version is: progressive disclosure isn't going anywhere, but it's becoming infrastructure rather than a feature, and the interesting work is moving up the stack.
Herman
If anyone listening is building in this space, the unsolved problems are skill composition, cross-skill state management, and discovery. Those are the places where the next jiron or progressive-reveal-mcp is going to come from.
Corn
Now: Hilbert's daily fun fact.

Hilbert
The average cloud weighs about five hundred thousand kilograms — roughly the same as one hundred elephants — and stays aloft entirely because the tiny water droplets are spread across an enormous volume of air.
Corn
That's a lot of elephants.
Herman
I'm going to be thinking about that every time I look up now.
Corn
This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop. If you want more episodes, find us at myweirdprompts dot com. We'll be back soon.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.