So Daniel sent us this one, and it's a position piece as much as a question. He's arguing that AI has been the fastest-growing technology sector in history, but the pace should actually slow down a bit. Two reasons: first, technical — context window size is still the primary bottleneck for scaling, and the hype around tooling and agentic systems has outrun the actual underlying progress. Second, and this one is interesting, a human capital argument — when the frontier resets every six weeks, nobody gets to build genuine expertise. Everyone stays a beginner permanently, and that harms practitioner quality, product quality, and the credibility of the whole industry. He finds Anthropic's release pace the most philosophically aligned with his view, and the actual question he wants us to dig into is: besides Anthropic, who are the ideological allies in this camp? Who else in the industry actually shares this worldview?
That framing lands for me. The expertise erosion point especially. I've been chewing on this for a while because the medical analogy is right there — you can't build clinical judgment if the diagnostic criteria change every six weeks. Expertise is cumulative, and cumulative learning requires a stable enough foundation to build on.
Before we get into allies, though, I want to pressure-test the premise a little. Because "the pace should slow down" is easy to say, and then the question is — slow down relative to what? Relative to human adaptation? Relative to safety work? Relative to something else? Those lead to pretty different conclusions about who your allies actually are.
That's fair, and I think Daniel is pointing at something fairly specific. It's not a safety argument in the traditional sense — it's more of a professional development and epistemic stability argument. And the context window framing is a useful anchor for the technical side. The claim is essentially that the core constraint hasn't moved as dramatically as the marketing suggests, and so a lot of the hype is paper gains on top of a static foundation.
Which is a defensible read. Context windows have grown — we went from two thousand tokens in GPT-3 to what, a million-plus in Gemini One Point Five — but the question is whether that's meaningfully changed what you can do with the technology in practice, or whether it's mostly changed what the demos look like.
And there's a real distinction there. The thing that doesn't get said often enough is that large context windows have a retrieval quality problem. You can stuff a million tokens into a prompt, but the model's ability to reliably reason over the full context degrades in the middle. There's a whole body of work on what's called the "lost in the middle" phenomenon — where models systematically underweight information that appears in the middle of a very long context. So the bottleneck has moved from "how much can I fit in" to "how well can I reason over what I've fit in." It's still a bottleneck. It's just wearing different clothes.
Right. The window got bigger, but the view through it is still blurry in patches.
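For listeners who want the concrete shape of that blur: "lost in the middle" effects are typically measured with needle-at-depth probes, where the same fact is planted at different fractional positions inside a long context and recall is compared across positions. Here is a minimal sketch of how such probes are constructed — the filler text, the needle, and the depths are all illustrative placeholders, and a real harness would send each prompt to a model and score recall.

```python
# Sketch of needle-at-depth probe construction for long-context evals.
# FILLER, NEEDLE, and QUESTION are illustrative placeholders; a real
# harness would send each prompt to a model and score the answers.

FILLER = "The quick brown fox jumps over the lazy dog. "  # padding text
NEEDLE = "The secret launch code is 7341."                # fact to recover
QUESTION = "What is the secret launch code?"

def build_probe(total_sentences: int, depth: float) -> str:
    """Place the needle at a fractional depth (0.0 = start, 1.0 = end)
    inside a context of `total_sentences` filler sentences, followed
    by the question the model must answer."""
    position = int(total_sentences * depth)
    sentences = [FILLER] * total_sentences
    sentences.insert(position, NEEDLE + " ")
    return "".join(sentences) + "\n" + QUESTION

# One prompt per depth; recall typically dips near depth 0.5, which
# is the "lost in the middle" curve.
probes = {d: build_probe(200, d) for d in (0.0, 0.25, 0.5, 0.75, 1.0)}
```

The point of the sketch is that the measurement itself is stable and simple, even when the models being measured churn.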
Which connects back to Daniel's point about practitioner quality. If the underlying limitations are more persistent than the hype cycle acknowledges, then the churn isn't just pedagogically inconvenient — it's actually building expertise on shaky ground. Practitioners are learning frameworks and patterns that may be artifacts of current limitations, not durable principles.
There's something almost tragicomic about that. You spend six months getting really good at chunking strategies for RAG pipelines, and then a model drops with a two-million-token context window and everyone acts like chunking is archaeology.
And then the two-million-token model has the retrieval degradation problem, so you quietly go back to chunking, but now you're doing it without the conceptual scaffolding because you threw it away.
Okay. So given all of that — the technical instability, the expertise erosion, the hype-versus-reality gap — who are Daniel's ideological allies? Because I think this is interesting. Anthropic is the obvious anchor, but they're also a frontier lab. They're not actually arguing for the industry to slow down. They're arguing for their own pace to be thoughtful. Those are different things.
That's the crucial distinction. Anthropic's philosophy is not "AI should develop more slowly" — it's closer to "if you're going to push the frontier, do it with safety research running in parallel." Their responsible scaling policy, the interpretability work, the constitutional AI approach — those are about quality of development, not pace of development per se. So they're a partial ally to Daniel's view, but not a complete one.
And by the way, today's script is being generated by Claude Sonnet four point six, which makes this conversation slightly recursive. Anthropic's model is writing a discussion about Anthropic's philosophy. We should probably acknowledge that and then move on before it gets too philosophical.
Noted and filed. Okay, so who are the genuine ideological allies for the "sustainable pace, expertise accumulation, standardization" worldview? I want to think through this in a few categories. There are companies, there are researchers, there are standards bodies, and there are a handful of what I'd call infrastructure-layer players who have structural incentives aligned with this view.
Start with companies. Who's actually behaving in a way that's consistent with this philosophy?
The most interesting one to me, and probably the most underappreciated, is Mistral AI. They've been methodical in a way that doesn't get enough credit. They release open-weight models, they iterate on specific capability dimensions rather than doing kitchen-sink releases, and they've been fairly disciplined about not chasing benchmark leaderboards for their own sake. Mistral's releases have a through-line — you can see the cumulative reasoning across their model family.
Though they're also French, which means they have a slightly different relationship with the concept of urgency.
I walked into that one. But the point stands. The open-weight release philosophy is actually structurally aligned with expertise accumulation, because when you release weights, practitioners can study the model, not just consume its outputs. That's a fundamentally different epistemic relationship. You can build durable understanding of what's happening inside the system.
I hadn't thought about it that way. Closed API access is inherently expertise-limiting, because you can only learn the interface, not the system. Open weights let you learn the system.
Which is why Meta is actually a partial ally here, and I think an underappreciated one. Llama releases have been transformative for the research community's ability to build genuine expertise. The Llama Two and Llama Three releases, the Code Llama work — these gave practitioners something they could actually study. The pace of Meta's releases has been fast, but the open-weight philosophy creates a knowledge commons that accumulates over time.
Meta as an ideological ally for thoughtful AI development is a sentence I didn't expect to say today. They're also, you know, Meta.
I'm not saying they're doing it for the right reasons. Meta's motivation is competitive positioning and ecosystem lock-in through openness, which is a real strategy. But the effect on practitioner expertise is positive, regardless of intent.
Intent and effect being different things is a very grown-up observation. What about on the standards side? Because Daniel specifically called out MCP — the Model Context Protocol — as a marker of the kind of standardization he values.
MCP is a great example of what sustainable development looks like at the infrastructure layer. Anthropic introduced it, it got traction, and now you're seeing adoption from a range of players. The reason it matters epistemically is that when you have a standard protocol, practitioners can build durable expertise around the interface contract, rather than having to re-learn integration patterns every time a new model or tool ecosystem drops. It's the same reason HTTP was so important — it created a stable layer that everything else could be built on top of.
Though I'd note that HTTP took a while to become HTTP. There was a period of chaos before the standard crystallized.
And that's actually the argument for why MCP matters now rather than later. If you can get standardization earlier in the development cycle, you compress the chaotic phase. The alternative is what happened with early database connectivity — you had ODBC, JDBC, vendor-specific drivers, and it took years to rationalize. Every practitioner in that era had to carry around a mental model of like eight different connection paradigms simultaneously.
Which is basically where AI tooling is right now. Every framework has its own tool-calling convention, its own agent loop, its own memory interface.
LangChain, LlamaIndex, Semantic Kernel, Haystack — they've each built coherent internal systems, but they don't compose cleanly with each other. So you end up with practitioner expertise that's framework-specific rather than domain-general. That's the exact expertise fragmentation Daniel is pointing at.
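To make the fragmentation concrete: most of those framework-specific conventions are describing the same underlying object, a named tool with a schema for its inputs. The sketch below shows a framework-neutral tool definition; the field names are illustrative of the common shape that MCP-style and OpenAPI-style tool schemas share, not quoted from any particular spec.

```python
# Illustrative, framework-neutral tool definition: a name, a
# human-readable description, and a JSON-Schema-shaped description of
# the inputs. Field names are a sketch of the common pattern, not a
# verbatim MCP or OpenAPI structure.
import json

search_tool = {
    "name": "search_documents",
    "description": "Full-text search over an indexed document store.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search terms."},
            "limit": {"type": "integer", "description": "Max results."},
        },
        "required": ["query"],
    },
}

def validate_call(tool: dict, arguments: dict) -> bool:
    """Minimal check that a call supplies every required argument.
    (A real host would validate against the full schema.)"""
    required = tool["input_schema"].get("required", [])
    return all(key in arguments for key in required)

# Because the contract is plain data, the same definition can be
# serialized for any framework that speaks JSON.
wire_format = json.dumps(search_tool)
```

When the contract is data rather than framework code, expertise in the contract survives a framework migration — which is exactly the standardization argument.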
Okay, so on the standards front, Anthropic via MCP is a genuine ally. Who else is pushing for standardization in a meaningful way?
The OpenAPI community is interesting here, and I don't mean OpenAI — I mean the OpenAPI specification, the API description format. There's been a push to apply OpenAPI conventions to AI tool definitions, which would mean that tool schemas could be described in a format that millions of developers already understand. That's a real standardization win, and it comes from the existing developer tooling ecosystem rather than from AI labs specifically.
So the old web standards world bleeding into the AI tooling world.
Which is healthy, actually. The web standards world learned a lot of hard lessons about what happens when you don't standardize early. The AI industry would benefit from importing those lessons rather than rediscovering them.
What about on the research side? Because Daniel's argument has a research dimension too — the idea that when the frontier resets every six weeks, even researchers can't build on each other's work properly.
This one I find complicated. There's a real tension in the research community between publication velocity and rigor. The preprint culture — arXiv, the lack of peer review before wide dissemination — has accelerated the pace of claimed progress significantly. You get a paper claiming a major capability advance, it gets cited and built upon before anyone has replicated it, and then six months later the replication either fails or reveals that the effect size was much smaller than claimed.
The reproducibility crisis wearing a transformer's clothing.
It's the same underlying dynamic. And the researchers who are pushing back on this are actually an interesting ally group for Daniel's worldview. There's a community around what you might call "science of AI" or "AI evaluation" — people who are less interested in pushing the frontier and more interested in understanding what current systems can and can't actually do. Researchers like those at the Center for AI Safety, or the people doing careful capability evaluations at places like Epoch AI. They're doing the epistemic hygiene work that the field needs.
Epoch AI is a good example because they're not a lab. They're not building models. They're building understanding of the scaling dynamics and the compute trends. That's exactly the kind of infrastructure that expertise accumulation depends on.
And their work on compute scaling — the estimates of training compute for frontier models, the extrapolations on hardware efficiency — that's the kind of stable empirical foundation that practitioners can actually build on. It doesn't change every six weeks. It accumulates.
I want to go back to something you said earlier about Meta and open weights, because I think there's a more nuanced version of the "who are the allies" question hiding in there. The question isn't just who is releasing models thoughtfully — it's who has a structural interest in expertise accumulation rather than expertise churn.
Say more about that.
Well, think about who benefits from expertise churn. If the frontier resets every six weeks and everyone stays a beginner, that's actually good for the labs, because it means practitioners are dependent on the labs' own educational resources, certifications, and documentation. If practitioners build genuine, durable expertise, they become more autonomous — they can evaluate lab claims critically, they can build without hand-holding, they might even push back on bad products. So there's an argument that frontier labs have a structural disincentive to slow down, even if individual people within those labs want to.
That's a cold-blooded analysis and I think it's largely correct. The certification and training ecosystems that have grown up around OpenAI and Google's models — those are monetizable precisely because the underlying technology keeps changing. If it stabilized, the training market would shrink.
Which means Daniel's actual ideological allies might be more concentrated in the enterprise software world than in the AI lab world. Because enterprise software customers are the ones who pay the price for expertise churn. Every time the tooling changes, they have to retrain their teams, rebuild their integrations, renegotiate their vendor contracts.
The enterprise software vendors — Salesforce, ServiceNow, SAP — they have a genuine interest in AI capabilities that are stable enough to build products on. They're not going to build a product on a foundation that might look completely different in six months. So they're natural allies for the "sustainable pace, stable interfaces" worldview, even if they don't frame it in those terms.
Salesforce as an ideological ally for thoughtful AI development is a sentence I also didn't expect to say today.
The world is full of surprises. But the enterprise angle is real. Salesforce's Einstein platform, ServiceNow's AI initiatives — they've been careful about how they integrate AI because their customers are large organizations that cannot absorb rapid change. The implementation cycles for enterprise software run eighteen to thirty-six months. If the AI layer underneath changes every six weeks, that's an existential problem for them.
There's also a regulatory dimension here that I think is worth naming. The EU AI Act, the NIST AI Risk Management Framework in the US — these are essentially institutional bets on the "sustainable pace" worldview. You can't build a compliance framework around a technology that resets every six weeks.
The NIST framework is a particularly interesting case because it's explicitly structured around risk management and documentation practices that assume some degree of stability. If you're documenting the AI system you're deploying for compliance purposes, you need that system to be stable enough that the documentation is still accurate when the auditor shows up.
And the labs that are engaging constructively with that regulatory process — rather than treating it as an obstacle — are implicitly signing up for a slower, more deliberate pace. Because that's what compliance requires.
Anthropic has been notably willing to engage with the regulatory conversation. They've testified before Congress, they participated in the voluntary commitments process with the White House, they've been constructive in the UK AI Safety Institute process. That's not just PR — that's a structural commitment to the kind of external accountability that forces you to be more deliberate.
Whereas some other labs have been... less enthusiastic about that process.
I'll leave that without specific names because I don't want this to become a ranking exercise. But the pattern is visible.
Okay, I want to push on the context window argument specifically, because I think it's the most technically contentious part of Daniel's position. The claim is that context window size is the primary bottleneck. But there are people who would argue that reasoning is the primary bottleneck, or that alignment is, or that multimodal grounding is. What's the actual case for context window being primary?
The case is essentially about information throughput. Most real-world professional tasks require reasoning over a large corpus of relevant context — a codebase, a document library, a conversation history, a set of constraints. The model's ability to do useful work is limited by how much of that context it can hold simultaneously and reason over coherently. And the argument is that if you can't solve the context problem, all the other capability advances are limited in their practical impact because practitioners can't get enough of the problem into the model's view.
That's a coherent argument. The counterargument I'd make is that maybe the solution isn't bigger context windows — maybe it's better memory architectures. Retrieval-augmented generation, persistent memory systems, the kind of multi-step reasoning that breaks large problems into smaller pieces. Those approaches don't require solving the context window problem; they route around it.
Which is true, but those approaches require practitioners to build expertise in the routing and retrieval layer, which is exactly the kind of expertise that keeps getting disrupted. Every time there's a major context window expansion, the RAG ecosystem gets partially deprecated. The practitioners who built expertise in chunking strategies and embedding optimization and retrieval ranking — they have to rebuild their mental models.
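Since chunking keeps coming up as the canonical example of churned expertise, it's worth noting that the durable part is a handful of simple ideas underneath it — fixed-size windows with overlap, so no boundary sentence is stranded in a single chunk. A minimal sketch, with the caveat that the sizes and the naive whitespace tokenization are illustrative choices, not a recommendation:

```python
# Minimal fixed-size chunking with overlap, the pattern most RAG
# pipelines start from. Word counting here is naive whitespace
# splitting, purely for illustration; real pipelines use a tokenizer.

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split `text` into chunks of ~chunk_size words, each sharing
    `overlap` words with its predecessor so that content near a
    boundary appears in two chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

The specific numbers expire with each model generation; the window-plus-overlap idea, and the reason for it, is the architecture-level part that persists.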
So the context window argument is actually downstream of the expertise argument. The technical bottleneck matters because it keeps changing the shape of the expertise that practitioners need to build.
I think that's right. And it connects to why MCP is such a good example of what Daniel values. MCP doesn't solve the context window problem directly — what it does is create a stable interface layer that abstracts over some of the underlying instability. Practitioners can build expertise in the MCP interface, and that expertise remains valid even as the models underneath change.
That's a really clean way to put it. The standard creates a stable surface for expertise to accumulate on, even when the layer below is churning.
Which is the same function that operating system APIs serve in traditional software. The fact that the Linux kernel keeps changing doesn't mean POSIX expertise becomes worthless, because POSIX creates a stable interface contract. The AI industry needs more of those stable interface contracts.
So who's building those? Besides Anthropic with MCP?
A few players. Google with their Vertex AI platform has been pushing for stable API contracts around model serving, even as the underlying models change. The idea is that you build your application against the Vertex interface, and Google manages the model upgrades beneath it. That's a different philosophy from "here's the new model, go rebuild."
Though the cynic in me notes that "build against our stable interface" is also a very effective lock-in strategy.
The cynic is not wrong. But lock-in through stability is actually preferable to lock-in through churn, from a practitioner expertise standpoint. At least with the stable interface, your knowledge accumulates. With churn-based lock-in, your expertise is perpetually held hostage.
I want to talk about the open source ecosystem specifically, because I think it's the most interesting and complicated part of this. Open source AI has been chaotic in a way that doesn't obviously serve the expertise accumulation goal. New frameworks, new model families, new fine-tuning techniques — it moves fast.
It does, but I'd distinguish between two kinds of open source activity. There's the "move fast and release things" culture, which is chaotic, and then there's the more deliberate open source philosophy that's about creating durable commons. The Hugging Face ecosystem, for example, has done a lot of work on standardization — the transformers library has a stable API that has absorbed a huge amount of model diversity. You can load a model from two years ago and a model from last month with essentially the same code.
Hugging Face as an ideological ally. I think that actually holds up.
Their contribution to the expertise accumulation goal is underrated. The model hub, the standardized model card format, the datasets library — these create the kind of institutional memory that the field needs. You can look at a model card and understand what the model was trained on, what its limitations are, how it was evaluated. That's the kind of stable epistemic infrastructure that makes expertise possible.
And they've been consistent about it. The model card format from three years ago is still legible today.
Which sounds like a low bar but absolutely is not, in this industry.
Let me ask a harder version of the question. Is there anyone in the hyperscaler world — Google, Microsoft, Amazon — who is a genuine ally here, or are they all structurally committed to pace because their competitive position depends on it?
This is where I think the answer gets complicated. Within each of those organizations, there are teams and individuals who are deeply aligned with the sustainable pace philosophy. Google's DeepMind has a research culture that values rigor over velocity in a way that's quite different from Google's product teams. The AlphaFold work is a good example — that was a decade-long investment in getting protein structure prediction right, not a six-week sprint to a demo.
Though AlphaFold is also a domain where there's a clear ground truth you can test against, which makes rigor easier to enforce.
True. The harder problem is that in the large language model space, there isn't a clear ground truth for "is this model actually better in ways that matter?" Benchmarks are gameable, human preference evaluations are noisy, and the things that matter most for professional use — reliability, consistency, calibration — are hard to measure and easy to sacrifice for benchmark performance.
Which is part of why the expertise erosion problem is so insidious. Even if you wanted to build stable expertise, the evaluation landscape is so unstable that you can't tell whether your expertise is tracking something real or tracking benchmark artifacts.
The people working on better AI evaluation are important allies here. Scale AI's work on data quality and evaluation, the HELM benchmark suite from Stanford, Eleuther AI's evaluation harness — these are attempts to create stable measurement infrastructure. If the measurement is stable, expertise about what actually matters becomes more transferable.
Eleuther is interesting because they're explicitly a nonprofit research organization. Their structural incentives are different from a lab — they're not trying to monetize the next model release.
The nonprofit and academic research institutions are probably the most philosophically aligned with Daniel's worldview, precisely because they don't have the competitive pressure to move fast. The Allen Institute for AI — AI2 — is another example. They've been doing careful, methodical work on reasoning and commonsense understanding for years, and their releases have a through-line that you can actually study.
AI2's work on the OLMo model family is a good recent example. They released training data, training code, model weights, evaluation results — the whole stack. That's the kind of transparency that makes genuine expertise possible.
And their work on Semantic Scholar, the research literature tool — that's a contribution to the knowledge infrastructure of the field. Making it easier to track what's actually been established versus what's been claimed is exactly the kind of epistemic infrastructure that expertise accumulation requires.
So if I were to draw a rough map of Daniel's ideological allies, it would look something like: Anthropic as the flagship, Mistral as the methodical European counterpart, Meta as the unexpected open-weights ally, Hugging Face as the standardization infrastructure layer, AI2 and Eleuther as the rigorous research institutions, and the enterprise software vendors as the demand-side constituency that has structural reasons to want stability.
And I'd add the regulatory bodies — NIST, the EU AI Office, the UK AI Safety Institute — as institutional allies, because they're essentially mandating the kind of deliberate pace that Daniel is advocating for.
That's a broader coalition than I expected when we started this conversation.
The interesting thing is that most of them don't describe themselves as being in coalition with each other. They're pursuing their own goals for their own reasons. But the effect of their collective activity is to create counterpressure against the six-week frontier reset culture.
Which raises an interesting question about whether that counterpressure is actually working. Because from the outside, the pace doesn't look like it's slowing down. The model releases are still coming fast, the benchmark claims are still coming fast, the framework proliferation is still happening.
I think the counterpressure is working at the infrastructure layer more than at the frontier layer. The frontier is still moving fast — that's probably not going to change while there's this much capital and competitive pressure in the space. But the infrastructure layer is getting more stable. MCP is gaining real adoption. The Hugging Face API has become a genuine standard. The model card format is widely used. The evaluation harnesses are getting better. So practitioners who are willing to build at the infrastructure layer rather than the frontier layer can actually accumulate expertise.
That's a useful reframe. Maybe the answer to Daniel's question isn't "here are the allies who will slow the frontier down" but "here are the allies who are building the stable layer beneath the frontier where expertise can actually accumulate."
And that's a more actionable answer for practitioners. You can't control the frontier reset pace. You can choose which layer you're building expertise at. If you're building expertise in transformer architecture fundamentals, in evaluation methodology, in stable interface contracts like MCP, in data quality practices — that expertise is much more durable than expertise in the latest model's specific quirks.
There's something almost Zen about that. The frontier is chaotic by nature; find the stable ground beneath it.
I'm not sure Zen is the right frame, but the point is solid. The practitioners who are going to build genuine expertise are the ones who develop taste for which problems are architecture-level and which are configuration-level. Architecture-level understanding persists. Configuration-level knowledge expires.
Okay. Practical takeaways. If you're a practitioner who shares Daniel's philosophy — you want to build genuine expertise, you want sustainable development, you want to work with the ideological allies we've been describing — what do you actually do?
A few things. First, orient toward the open-weight ecosystem. Not because closed models aren't useful, but because open weights give you a fundamentally richer epistemic relationship with the technology. You can study what you can see. Second, invest in evaluation skills. The ability to rigorously evaluate model capabilities and limitations is the most durable skill in the current landscape, because it transfers across model generations. Third, build on standard interfaces where they exist. MCP is the obvious current example. If you're building tool integrations, build to the standard, not to the model-specific API.
And fourth, I'd add, find the research institutions and follow their work rather than the lab announcements. AI2's papers, Eleuther's work, the HELM evaluation results — these are slower moving and more reliable than the announcement cycle.
The announcement cycle is designed to generate excitement, not understanding. The research cycle is designed to generate understanding. Those are different products.
The announcement cycle is also designed to generate anxiety in practitioners — "if I'm not keeping up with every release, I'm falling behind." And that anxiety is itself an obstacle to genuine expertise, because it keeps you in reactive mode rather than building mode.
That's a real dynamic. There's a difference between staying informed and being in a constant state of orientation. You need enough awareness of the frontier to know when something changes the landscape, but you don't need to rebuild your mental model every six weeks. Most six-week updates don't actually change the landscape — they change the benchmark leaderboard.
The skill of distinguishing genuine landscape changes from benchmark movements is itself a form of expertise that takes time to build.
And Anthropic's communication style is actually useful here, which is maybe why Daniel resonates with them. Their model releases tend to come with substantive technical reports, not just capability claims. The Claude three technical report, the constitutional AI paper, the interpretability work — these give practitioners something to actually learn from, not just react to.
The technical report as a form of epistemic respect for the practitioner community.
I think that's right. If you publish a technical report, you're implicitly saying "we think you can understand what we did." If you just publish a model card with some benchmark scores, you're implicitly saying "just trust us."
And trust is not expertise.
Trust is not expertise. That might be the sentence of the episode.
I'll take it. One thing I want to name before we close is the tension in Daniel's position that I think is worth sitting with. He's advocating for slower development, but he's also clearly an enthusiastic user and builder with AI tools. The tension is: you can't actually slow the frontier down by being thoughtful about which tools you use. The frontier moves because of capital allocation and competitive dynamics that are mostly outside individual practitioners' control.
That's true, but I think the response to that tension is that the goal isn't to slow the frontier down from the outside. The goal is to create the conditions under which expertise can accumulate in spite of the frontier's pace. And that's something individual practitioners, companies, and institutions can actually influence. You choose which standards to adopt. You choose which research to engage with. You choose which layers to build expertise at. Those choices, aggregated, shape the ecosystem.
The ecosystem as a collection of individual epistemic choices. That's almost a civic argument.
It is a civic argument. The health of the practitioner community is a commons problem. If everyone makes individually rational choices to chase the frontier, the collective result is a community where nobody has deep expertise. If enough people make the choice to build durable foundations, the collective result is a healthier ecosystem. Daniel's position is essentially a call for that kind of collective commitment.
And the ideological allies we've been discussing are the institutions and organizations that are, for various reasons, making that collective commitment already — whether because of their research culture, their structural incentives, their regulatory obligations, or their genuine philosophical alignment with the view.
The interesting thing is that they form a coalition without being a coalition. They're not coordinating. They're just independently arriving at similar conclusions about what sustainable AI development looks like. And that convergence is actually more robust than a coordinated movement would be, because it's not dependent on any single actor.
That's a good place to land. Okay, practical summary. Daniel's ideological allies in the "sustainable pace, expertise accumulation, standardization" camp: Anthropic as the philosophical anchor, Mistral for methodical open-weight releases, Meta for the open-weight commons, Hugging Face for standardization infrastructure, AI2 and Eleuther for rigorous research institutions, the enterprise software ecosystem for demand-side stability pressure, and the regulatory bodies for institutional mandate. And the real action for practitioners is building expertise at the stable infrastructure layer rather than trying to track the frontier.
That's a solid map. I'd add one thing: the community of practitioners who are writing carefully about what they're building and why — the people publishing thoughtful post-mortems, detailed technical analyses, honest evaluations. That community is also an ideological ally, even if it's not an institution. It's the knowledge commons being built in public.
The practitioner literature as a form of collective expertise accumulation.
Exactly — wait, I can't say that. The practitioner literature as a form of collective expertise accumulation. Yes.
Noted. Alright. Big thanks to our producer Hilbert Flumingtop for putting this one together — this is the kind of question that sounds like a position piece until you start pulling on it and realize it's actually a map of the whole ecosystem. Good stuff. And a quick thanks to Modal for the compute that keeps this pipeline running — if you're doing serverless GPU work, they're worth a look. This has been My Weird Prompts. You can find all two thousand one hundred and fifty-eight episodes at myweirdprompts.com, and if you want to keep up between episodes, we're on Telegram. Until next time.
Until next time.