#1478: Who Owns the Truth? The Evolution of the Encyclopedia

From ancient Chinese archives to the legal war between Britannica and OpenAI, we explore the shifting battleground of human knowledge.

Episode Details

Published: Mar 23
Duration: 19:00
Pipeline: V5
TTS Engine: chatterbox-regular

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The definition of truth is currently being litigated in federal court. In March 2026, Encyclopedia Britannica and Merriam-Webster filed a massive copyright infringement suit against OpenAI, claiming that the scraping of nearly 100,000 articles for AI training is cannibalizing their brand. This legal battle marks a turning point in the history of how humans organize, store, and access information.

From Imperial Archives to Subversive Manuals

The quest to catalog all human knowledge is not new. In 1403, the Ming Dynasty’s Emperor Yongle commissioned the Yongle Dadian, a staggering collection of over 11,000 volumes. It was a "single point of truth" controlled entirely by the state to ensure imperial legitimacy. However, its physical nature made it fragile; today, less than 4% of the original work remains after centuries of fire and conflict.

By the 18th century, the philosophy of knowledge shifted from the palace to the public. Denis Diderot’s Encyclopédie was a radical departure from state-sponsored works. By including detailed technical articles on the trades and "mechanical arts," Diderot democratized authority, suggesting that a blacksmith’s skill was as vital to society as a priest’s theology. This was so subversive that the French monarchy eventually banned the project, forcing it into an underground existence.

The Rise of the Algorithm

Today, the "gatekeeper" role has shifted from human editors to algorithms. While Wikipedia democratized knowledge through its "anyone can edit" model, it now faces an epistemic crisis. Large Language Models (LLMs) like ChatGPT synthesize Wikipedia’s data into instant summaries, leading users away from the original sources.

The Britannica lawsuit highlights a core tension: AI models often rely on the "curatorial judgment" and authority of legacy institutions while simultaneously threatening their business models. Furthermore, the risk of "hallucinations"—where AI generates false facts and attributes them to reputable sources—threatens to corrupt the historical record itself.

The Future of Knowledge Standards

As the reliability of AI-generated summaries is questioned, several new models are emerging to fix the "gatekeeper" problem. Grokipedia attempts to use AI to strip away human bias, though critics argue it merely replaces human bias with algorithmic bias. Meanwhile, Scholarpedia returns to a more traditional model, utilizing peer-reviewed articles written by invited experts and Nobel laureates to ensure academic rigor.

Perhaps the most ambitious project is the Encyclosphere. Rather than a single website, the Encyclosphere is a decentralized protocol. Much like email, it allows various encyclopedias to communicate without a central authority deciding which information is "notable." By removing the gatekeeper entirely, it aims to create a neutral, ownerless network for global knowledge.
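The email comparison above can be made concrete with a small sketch. This is not the Encyclosphere's actual protocol or API; the `Article` type, `make_provider`, `federated_search`, and the two sample encyclopedias are all hypothetical names invented for illustration. It only shows the general shape of the idea: independent sources answer the same kind of query, and the results sit side by side without a central ranking step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Article:
    source: str  # name of the encyclopedia that published the entry
    title: str
    body: str


# Hypothetical shared interface: every encyclopedia answers a text query
# with a list of articles, much as mail servers share one standard.
SearchFn = Callable[[str], List[Article]]


def make_provider(source: str, entries: Dict[str, str]) -> SearchFn:
    """Build an in-memory stand-in for one independent encyclopedia."""
    def search(query: str) -> List[Article]:
        q = query.lower()
        return [Article(source, title, body)
                for title, body in entries.items() if q in title.lower()]
    return search


def federated_search(query: str, providers: List[SearchFn]) -> List[Article]:
    """Ask every provider and return all matches side by side.

    Nothing here ranks one source above another or drops an entry for not
    being "notable"; results keep the order the providers were listed in.
    """
    results: List[Article] = []
    for search in providers:
        results.extend(search(query))
    return results


general = make_provider("general-encyclopedia", {
    "Steam engine": "Overview of steam power.",
    "Printing press": "History of movable type.",
})
niche = make_provider("steam-engine-wiki", {
    "Steam engine valve gear": "Notes on nineteenth-century valve gear.",
})

hits = federated_search("steam engine", [general, niche])
for article in hits:
    print(f"{article.source}: {article.title}")
```

In this sketch the caller, not the network, chooses which providers to query, which is the design choice the decentralized model is betting on.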

Whether through legal battles or new technical protocols, the way we define and defend the truth is entering a volatile new chapter. The struggle remains the same as it was in the 15th century: deciding who has the right to organize reality.

Daniel's Prompt

Custom topic: Let's talk about the history of encyclopedias. A few of them have been created over time that have been particularly famous, and they always entailed some degree of a question over what type of inform

It is a strange time to be alive when the definition of truth is being litigated in a Manhattan federal court. We have more data at our fingertips than any civilization in history, yet the gatekeepers of that knowledge are currently in a high-stakes standoff with silicon. Today's prompt from Daniel is about the history of encyclopedias and the shifting battleground of human knowledge, and it could not be more timely.

It really is a fascinating moment. I am Herman Poppleberry, and I have been refreshing the court dockets all week. Just ten days ago, on March thirteenth, twenty twenty-six, Encyclopedia Britannica and Merriam-Webster filed a massive copyright infringement suit against OpenAI. They are claiming that the scraping of nearly one hundred thousand articles for training ChatGPT is essentially cannibalizing their entire brand. It is not just about the data; it is about the authority that Britannica has spent centuries cultivating.

It feels like the ultimate irony. Britannica spent two hundred and fifty years building a reputation for being the final word on everything, only to have that authority vacuumed up by a model that sometimes thinks there are only two Rs in the word strawberry. Daniel wants us to look at how we got here, moving from state-sponsored archives to this chaotic era of AI-generated summaries.

The thing we have to realize is that an encyclopedia was never just a collection of books. It has always been a power structure for organizing reality. If you control the list of what is notable, you control what history remembers. When you look at the legal filing from March thirteenth, you see Britannica arguing that their "curatorial judgment" is what is being stolen. They are saying that the way they decide what matters is their intellectual property.

That is a heavy way to start, but I suppose it is true. Before we get into the modern legal drama, we should probably talk about the first time someone tried to write down literally everything. Daniel mentioned the origins of the large-scale encyclopedia, and that takes us back to the Ming Dynasty, right?

The Yongle Dadian is honestly one of the most staggering intellectual achievements in human history. It was commissioned by Emperor Yongle in fourteen hundred and three and finished in fourteen hundred and eight. We are talking about twenty-two thousand, nine hundred and thirty-seven chapters across eleven thousand and ninety-five volumes. To put that in perspective, if you stacked those volumes, they would reach the height of a ten-story building.

I can barely keep my desk organized, and this guy managed to coordinate two thousand scholars to categorize all of Chinese knowledge. How do you even manage a project like that in the fifteenth century?

It was a monumental effort of human indexing. They called it a leishu, which translates to categorized writing. But it was not an encyclopedia in the sense of a modern summary. It was more of a massive anthology. They did not rewrite the information; they transcribed entire works into a new, categorized system. It covered everything from the Confucian canon and history to the arts, medicine, and even manual trades. For five hundred and ninety-nine years, it remained the largest encyclopedia in the world until Wikipedia finally surpassed it in two thousand and seven.

But there is a catch with the Yongle Encyclopedia, isn't there? It was not exactly a public resource. It was not like a Ming Dynasty citizen could just walk into a library and look up how to fix a leaky roof.

Not at all. It was the ultimate state-sponsored single point of truth. It was meant for the Emperor and his top officials to ensure the continuity of their worldview. It was a tool of imperial legitimacy. And the tragedy is that today, only about four hundred volumes still exist. Out of eleven thousand and ninety-five, that is a loss rate of more than ninety-six percent. Most of it was destroyed during the Boxer Rebellion in nineteen hundred and various fires over the centuries. We talked about this briefly in episode ten thirty-two when we looked at how history survives deletion, but the Yongle Dadian is the ultimate example of a single point of failure. When you have one physical copy owned by one government, a single fire can erase centuries of thought.

It is wild to think that the greatest collection of knowledge for six centuries is mostly gone. It was top-down, royal knowledge. But then we get to the eighteenth century, and the whole philosophy of what is worth knowing starts to shift. We move from the Emperor's palace to the coffee houses of Paris.

That brings us to Denis Diderot and Jean le Rond d'Alembert. In seventeen fifty-one, they started the Encyclopédie in France, and this was a radical departure. Before this, scholarly works focused on the divine, the royal, and the abstract. Diderot decided that the mechanical arts (the trades, the tools, the actual manual labor of the middle class) were just as important as philosophy or theology.

This was basically a middle finger to the Catholic Church and the French monarchy at the time. Including a diagram of how a printing press works or how a blacksmith hammers iron was seen as revolutionary. Why did that cause such a political firestorm? It seems so mundane to us now.

Because it democratized authority. In the eighteenth century, knowledge was a secret guarded by guilds and the church. If you give people a technical manual for the world, you are telling them they do not need a priest or a king to explain how things work. It prioritized human reason over divine revelation. Diderot famously said that the goal of an encyclopedia is to change the way people think. King Louis the fifteenth actually banned it in seventeen fifty-nine. They had to continue printing it in secret for years, smuggling volumes out like contraband.

So we went from the Emperor's private archive to a subversive underground project for the people. That feels like the first step toward the decentralized mess we have today. Diderot was the original disruptor.

He really was. Diderot was the first to grapple with the concept of notability in a way that felt modern. He wanted to include everything that was useful. But as we see in the current crisis with Wikipedia and AI, the definition of useful is where all the fighting happens. Diderot’s struggle was against the church; today’s struggle is against the algorithm.

Well, let's jump to the current day because Wikipedia is the elephant in the room. As of March twenty twenty-six, English Wikipedia has over seven point one million articles. It is the fifth most visited site on earth. But Daniel's prompt points out that we are seeing a shift in the era of Wikipedia. There is this growing tension between the democratic "anyone can edit" model and the need for rigorous, depoliticized information.

The Wikipedia model is facing an epistemic crisis. We went deep on this in episode twelve ninety-eight, looking at the internal wars over digital truth. But since then, the pressure from AI has changed the landscape entirely. When you have a model like ChatGPT that can synthesize information instantly, the value of a Wikipedia article changes. People are not just reading the source anymore; they are reading the AI summary of the source.

And that is exactly what the Britannica lawsuit is about, right? They are saying that OpenAI is basically wearing Britannica's skin. They take the verified, expert-written data, mash it through a transformer, and spit out a summary that looks authoritative but lacks the institutional accountability. Britannica is essentially arguing that their seventy-four dollar and ninety-five cent annual subscription is being bypassed by a machine that does not pay for the research.

Britannica is in a tough spot. They ended their print edition in twenty twelve and pivoted to a subscription model. They charge seventy-four dollars and ninety-five cents a year, mostly targeting schools and libraries. Their argument is that if OpenAI provides a summary of a Britannica article, the user never clicks through to the Britannica site. The AI is literally eating the business model of the experts it relies on. It is a parasitic relationship.

Plus, there is the hallucination problem. If an AI generates a fake fact and attributes it to Britannica, it ruins a two-hundred-and-fifty-year-old brand. I saw a report where an AI claimed Britannica said a specific historical figure was a lizard person. You can see why their lawyers are staying busy. It is not just about lost revenue; it is about the corruption of the record.

What is interesting is how the market is responding to this perceived bias and lack of rigor. We are seeing these major modern projects that Daniel asked about—things that aim to fix the problems Wikipedia created. One of the biggest and most controversial ones is Grokipedia.

Oh, I have been watching the Grokipedia expansion. It reached version zero point two recently, didn't it?

It did, right at the start of twenty twenty-six. It has over six million articles now. Elon Musk is marketing it as a non-woke alternative to Wikipedia. The technical claim is that it uses artificial intelligence to verify facts more objectively than human editors, who might have political biases. But the reality is a bit messier. Critics have pointed out that Grokipedia often relies on a very specific subset of sources and still makes massive factual errors because, at the end of the day, it is still an LLM-driven project. It is trying to solve human bias by introducing algorithmic bias.

It is the same problem in a different hat. You are replacing a committee of humans with a black box of code. I am more interested in the projects that are trying to bring back the experts. Like Scholarpedia. That feels like a return to the Diderot model but with modern speed.

Scholarpedia is a great example of a hybrid model. It looks like a wiki, but you cannot just jump in and edit it. Articles are written by invited experts—we are talking Nobel laureates and field-leading researchers. Each article undergoes a formal peer review, and once it is published, the author remains the curator. It combines the speed of a digital platform with the rigor of an academic journal. It is the antithesis of the "move fast and break things" AI model.

But it is slow. You cannot have seven million articles in Scholarpedia because there are not enough Nobel laureates to go around. This brings us to the big technical shift Daniel mentioned: the Encyclosphere. This feels like the most ambitious attempt to solve the gatekeeper problem.

This is where it gets really nerdy and exciting. The Knowledge Standards Foundation, which was founded by Larry Sanger—who was a co-founder of Wikipedia before he became its biggest critic—is building something called the Encyclosphere. They just got a hundred-thousand-dollar grant from FUTO in February twenty twenty-six to expand it.

I love the idea of the Encyclosphere because it is not a website. It is a protocol. Explain that distinction for us, Herman.

That is the key. Instead of one big tech gatekeeper like Wikimedia or Google or OpenAI, the Encyclosphere aims to link all global encyclopedias into a single, ownerless network. Think of it like the way email works. No one owns the concept of email; you just have different providers that talk to each other using the same standards. The Encyclosphere wants to do that for knowledge.

So if I write an article on a small, niche encyclopedia for nineteenth-century steam engines, and someone else writes one on a general encyclopedia, the Encyclosphere protocol lets a user search both at once without a central authority deciding which one is more notable.

It removes the gatekeeping. Larry Sanger's whole point is that Wikipedia has become a centralized power structure where a small group of editors decides what the neutral point of view is. By decentralizing it, you allow for multiple perspectives to exist in a federated way. You could have a Britannica entry, a Wikipedia entry, and a Scholarpedia entry all appearing side-by-side for the same topic.

It sounds great in theory, but how do you handle the truth part? If I start the "Corn Is A Genius Encyclopedia" and link it to the network, does that mean my fake facts are now part of the global knowledge base? How do we avoid a total collapse of shared reality?

That is where the Knowledge Standards Foundation is working on metadata and reputation systems. The idea is that you can filter your feed. You could say, "I want to see information from sources that are signed by verified experts," or "I want to see the consensus view from five different independent encyclopedias." It puts the power of filtering back in the hands of the user rather than the platform. It is a move from "The Encyclopedia" to "The Encyclosphere."
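The two filters described here can be sketched in a few lines. This is only an illustration of user-side filtering, not the Knowledge Standards Foundation's design: the `Entry` fields, the `expert_signed` flag, and both function names are assumptions made up for the example, and the sample feed uses a threshold of three sources rather than five to stay short.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Entry:
    source: str          # encyclopedia that published the entry
    topic: str
    claim: str
    expert_signed: bool  # hypothetical metadata flag attached by the source


def expert_signed_only(entries: List[Entry]) -> List[Entry]:
    """Keep entries whose metadata says a verified expert signed them."""
    return [e for e in entries if e.expert_signed]


def consensus_claims(entries: List[Entry], min_sources: int) -> Dict[str, Set[str]]:
    """Map each topic to the claims made by at least `min_sources` distinct sources."""
    sources_by_claim: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for e in entries:
        sources_by_claim[(e.topic, e.claim)].add(e.source)

    agreed: Dict[str, Set[str]] = defaultdict(set)
    for (topic, claim), sources in sources_by_claim.items():
        if len(sources) >= min_sources:
            agreed[topic].add(claim)
    return dict(agreed)


feed = [
    Entry("encyclopedia-a", "Yongle Dadian", "completed in 1408", True),
    Entry("encyclopedia-b", "Yongle Dadian", "completed in 1408", False),
    Entry("encyclopedia-c", "Yongle Dadian", "completed in 1408", False),
    Entry("corn-is-a-genius", "Yongle Dadian", "written by Corn", False),
]

signed = expert_signed_only(feed)
agreed = consensus_claims(feed, min_sources=3)
print([e.source for e in signed])
print(agreed)
```

A claim that only one source makes stays on the network in this sketch; it simply does not pass the reader's chosen filter.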

It feels like we are moving toward a world where we have to be our own editors. But I wonder if the average person actually wants to do that much work. Most people just want to ask a chatbot a question and get an answer while they are making toast.

That is the danger. We are trading accuracy for convenience. Britannica's lawsuit argues that this convenience is built on theft, but more importantly, it is built on a lack of transparency. When an AI gives you an answer, you do not see the edit history. You do not see the debate between scholars. You just see a confident block of text. We are losing the "proof of work" that makes knowledge reliable.

There is also Everipedia to consider. They are trying a different approach using blockchain. They have these IQ tokens to incentivize accuracy.

Everipedia is an interesting experiment in game theory for knowledge. The idea is that you have a financial stake in the accuracy of the information you provide. If you contribute high-quality, verified content, you earn tokens. If you vandalize or provide false information, you lose your stake. It is an attempt to solve the edit war problem using economic incentives. It is a long way from Emperor Yongle's scholars sitting in a room in fourteen hundred and three, but the fundamental question Daniel raised is still there: who decides what is worth remembering?
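The stake-and-reward idea can be shown with a toy ledger. This is a minimal sketch of the general incentive pattern, not Everipedia's actual token mechanics: the `StakeLedger` class, the fixed `reward` value, and the way an edit gets "accepted" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class StakeLedger:
    """Toy ledger: contributors lock tokens behind an edit and win or lose them."""
    reward: int = 5  # hypothetical payout for an accepted edit
    balances: Dict[str, int] = field(default_factory=dict)
    locked: Dict[str, Tuple[str, int]] = field(default_factory=dict)  # edit_id -> (user, stake)

    def propose(self, edit_id: str, user: str, stake: int) -> None:
        """Lock `stake` tokens from the user's balance behind a proposed edit."""
        if stake <= 0 or self.balances.get(user, 0) < stake:
            raise ValueError("stake must be positive and covered by the balance")
        if edit_id in self.locked:
            raise ValueError("edit already has a stake")
        self.balances[user] -= stake
        self.locked[edit_id] = (user, stake)

    def resolve(self, edit_id: str, accepted: bool) -> None:
        """Return stake plus reward if accepted; otherwise the stake is forfeited."""
        user, stake = self.locked.pop(edit_id)
        if accepted:
            self.balances[user] += stake + self.reward


ledger = StakeLedger(balances={"careful_editor": 20, "vandal": 20})
ledger.propose("edit-1", "careful_editor", stake=10)
ledger.propose("edit-2", "vandal", stake=10)
ledger.resolve("edit-1", accepted=True)   # verified content: stake back plus reward
ledger.resolve("edit-2", accepted=False)  # rejected content: stake is lost
print(ledger.balances)
```

The sketch leaves out the hard part, which is who decides that an edit is accepted; the ledger only makes being wrong cost something.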

And that brings us to the concept of the notability filter. In the old days, paper was expensive. You could only fit so much in a book, so you had to be picky. In the digital age, storage is essentially free, so we stopped being picky. But now, attention is the scarce resource. We are drowning in information, so the filter is more important than ever. If the filter is an AI that is optimized for engagement or political leanings, we are in trouble.

I think the most important takeaway for anyone navigating this right now is to look for the proof of work. Whether it is an expert-signed article on Scholarpedia, a peer-reviewed entry on Citizendium, or a well-sourced entry on a specialized wiki, you have to look for the accountability. Citizendium, for example, requires contributors to use their real names. That one simple rule changes the entire tone of the project.

I agree. We should be looking for federated networks rather than single sources. The future of knowledge is not going to be one big site that everyone agrees on. That is a myth that Wikipedia briefly made us believe was possible. The future is likely a network of verified, specialized sources that we can query and verify ourselves. We need to move away from the idea of a "single point of truth."

It is almost like we are going back to the era of Diderot, where you have these different pockets of radical, specialized knowledge, but now they are all connected by fiber optics instead of smuggled printing presses. We are seeing projects like Justapedia, which launched in twenty twenty-five, trying to provide a "neutral" alternative to what they see as Wikipedia's systemic bias. Whether they succeed or not, the fact that they exist shows that the monopoly on truth is breaking.

What I find wild is that even with all this tech, we are still fighting the same battles. The Britannica lawsuit is just a twenty-first-century version of the French monarchy trying to control who gets to define a printing press. They are trying to protect their authority in a world that is moving toward a more decentralized, algorithmic reality.

If knowledge becomes decentralized and ownerless, though, who is responsible when it is wrong? If a decentralized AI agent gives someone medical advice based on a federated network of questionable wikis, who do you sue? You cannot sue a protocol. You cannot put a blockchain in jail.

That is the big unanswered question. We are moving into a world where the responsibility for truth is shifting from the publisher to the consumer. It is a lot to ask of people. We are asking every citizen to be a historian, a scientist, and a fact-checker all at once.

It really is. I think we have covered a lot of ground here, from the Ming Dynasty to the Manhattan federal court. Daniel really knows how to pick a topic that makes me want to go lie down in a dark room and think about the nature of reality.

It is a lot to process, but it is better to be aware of the plumbing of our information than to just drink whatever comes out of the tap. We have moved from the printing press to the server farm, but the struggle for who gets to decide what is true remains the same.

It really does. And I suspect we will be seeing the results of that Britannica lawsuit ripple through the industry for years to come. If Britannica wins, it could change how every AI model is trained. If they lose, it might be the final nail in the coffin for the traditional expert-led encyclopedia model.

For sure. Thanks as always to our producer Hilbert Flumingtop for keeping the gears turning behind the scenes.

And a big thanks to Modal for providing the GPU credits that power this show and allow us to explore these weird prompts every week.

This has been My Weird Prompts. If you are enjoying the show, a quick review on your podcast app really helps us reach new listeners and keeps the algorithm happy. You can also check our website for further reading on the Yongle Dadian and the Encyclosphere project.

We will see you in the next one.

Goodbye.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.