Imagine you spend three years developing a groundbreaking artificial intelligence experiment. You have the prompts, the model weights, the synthetic datasets, and the multimodal outputs all meticulously documented. Then, the startup hosting your code pivots, the cloud storage provider changes its terms of service, or a link simply rots. Suddenly, your contribution to the collective knowledge of humanity just... vanishes. We are currently creating a digital dark age faster than we can build the digital flashlights to see through it.
It is a massive problem, Corn. The average lifespan of a web page is only about a hundred days. Think about that for a second. If you cite a source today, there is a very real chance that by the time someone reads your work next season, that source is a 404 error. When you are talking about complex, high-stakes AI research or even just experimental media projects like this show, a hundred days is a blink of an eye. We need infrastructure that thinks in centuries, not fiscal quarters.
Well, today's prompt from Daniel is about exactly that. He is pointing us toward Zenodo, which is where we have actually been archiving this entire podcast project as a formal research collection. By the way, today's episode is powered by Google Gemini 1.5 Flash. I am Corn, the resident sloth who prefers things to stay where I put them, and this is my brother, Herman Poppleberry.
I am Herman, and I have spent the last few days digging into the literal petabytes of data that CERN manages. For those who do not know, Zenodo is the open-source digital repository platform launched back in two thousand thirteen. It was born out of the European OpenAIRE program and is operated by CERN, the same people smashing particles together in the Large Hadron Collider.
It is a bit of a flex, isn't it? Putting a podcast about weird AI prompts on the same servers that hold the data for the Higgs Boson. It’s like storing your childhood sketches in the basement of the Louvre. But Daniel's point is that if we want this "scaled experiment" of ours to actually mean something in ten or twenty years, we cannot just rely on Spotify's database staying intact or a hosting provider’s monthly subscription being paid. We need a "Library of Alexandria" for the digital age—one that doesn't burn down when a company goes public.
That is a great way to frame it. Zenodo was specifically built for the "long tail" of science. Big labs like NASA or the Max Planck Institute have their own massive institutional archives and IT departments, but what about the individual researcher, the citizen scientist, or the experimental podcaster? Before Zenodo, there was not really a stable, non-commercial home for that kind of data. You either had to be part of a major university or you were essentially shouting into a void that could be deleted at any moment.
So, let's get into the nuts and bolts. Most people hear "database" and their eyes glaze over, but Zenodo is doing something fundamentally different from Google Drive or Dropbox. If I put a file on Google Drive, I can delete it, or Google can decide I’ve violated a policy and lock me out. What is the actual technical architecture that makes Zenodo "permanent"?
The foundation is something called the Invenio framework. It is an open-source software stack for large-scale digital repositories, also developed at CERN. It’s designed to handle massive metadata harvesting and long-term bit preservation. But the real magic, the part that ensures things do not just disappear into a broken URL, is the Digital Object Identifier system, or DOI.
I see those in academic papers all the time. It is like a social security number for a file, right? Or maybe a VIN for a car?
Precisely. Unlike a URL, which points to a specific location on a server—like a street address that might change if the building is torn down or the street is renamed—a DOI is a persistent identifier. It is registered with a central authority, usually DataCite for research data. If Zenodo moves its files from one storage cluster to a brand new one five years from now, the DOI remains the same. It is an abstraction layer that ensures the citation never breaks. When we upload an episode of "My Weird Prompts" to Zenodo, it gets a DOI. That means five years from now, a researcher can cite that specific episode in a paper, and the link will still resolve to our audio and transcript.
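Just to make that concrete, here is roughly what resolution looks like from code: a minimal sketch using the public doi.org resolver, with a made-up DOI standing in for a real record.

```python
import requests

# A DOI resolves through the doi.org proxy, which redirects to wherever
# the record currently lives. This DOI is a placeholder, not a real record.
doi = "10.5281/zenodo.1234567"

resp = requests.get(f"https://doi.org/{doi}", allow_redirects=False)
print(resp.status_code)              # typically a 302 redirect
print(resp.headers.get("Location"))  # the record's current address
```

If Zenodo ever migrates its storage, only that Location header changes. The DOI in your bibliography stays put.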
I love the idea of our banter being "formally citable." Imagine a footnote in a two thousand thirty-five thesis on AI-human collaboration that leads back to you calling yourself "Herman Poppleberry." It gives our nonsense a certain gravitas. But how does it handle the actual storage? If I upload a four-hundred-megabyte FLAC file of our voices, is that just sitting on a single hard drive in Switzerland? What if a magnet gets too close?
Essentially, yes, it is in Switzerland, but on seriously high-performance infrastructure. CERN is used to handling exabyte-scale data from the Large Hadron Collider, with data rates that would melt a standard home router. Zenodo is built on that resilience: your data is stored redundantly, in more than one physical copy, and every file gets a checksum, a digital fingerprint that is re-verified regularly to make sure not a single bit has flipped. And there is a formal preservation policy behind it all. Zenodo's retention horizon tracks CERN's experimental programme, which means these DOIs are guaranteed to persist for at least the next twenty years, and likely much longer.
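Checksum verification is also something listeners can replicate at home. A minimal sketch, assuming the record reports an MD5 checksum for each file, as Zenodo's records historically have:

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Stream the file through MD5 so even huge audio files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholder value; compare against the checksum listed in the Zenodo record.
expected = "md5:0f343b0931126a20f133d67c2b018a3b"
print("intact" if f"md5:{md5_of('episode.flac')}" == expected else "bit flip detected")
```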
Twenty years is an eternity in tech. I can barely find a charging cable from five years ago that still works, let alone a file format I can still open. But let's talk about the "Communities" aspect Daniel mentioned. We have a community at "zenodo dot org slash communities slash my-weird-prompts." How does that organizational layer work? Is it like a subreddit for data?
In a sense, yes. This is where it gets interesting for collaborative projects. A Zenodo Community is a curated space. It is not just a dump of files; it is a managed collection. We can set metadata standards, so every episode has the same descriptive tags—who the hosts are, what AI model generated the script, the date, and the specific prompts used. That metadata is exported in standard formats like JSON-LD and the DataCite schema, which means search engines and academic crawlers can index our podcast as data, not just as a piece of media.
So, if a researcher wants to do a linguistic analysis of how AI-generated scripts have evolved from two thousand twenty-four to two thousand twenty-six, they do not have to manually listen to eighteen hundred episodes. They can use the Zenodo API to scrape the metadata and the transcripts. They could essentially "read" our entire history in a few seconds.
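Exactly, and with the public REST API that is only a few lines of Python. A sketch, assuming the records endpoint and response shape Zenodo documents, using the community slug from the URL Daniel gave us:

```python
import requests

# List every record in the show's Zenodo community via the public API.
resp = requests.get(
    "https://zenodo.org/api/records",
    params={"communities": "my-weird-prompts", "size": 100},
)
resp.raise_for_status()

for hit in resp.json()["hits"]["hits"]:
    meta = hit["metadata"]
    print(hit["doi"], "|", meta["title"], "|", meta.get("publication_date"))
```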
And because we use public domain licensing—CC0—they do not have to ask for permission or navigate a complex legal minefield. It is a "scaled experiment" in the truest sense. We are providing the raw material for future AI training or sociological study. We’re essentially leaving a trail of breadcrumbs that are made of high-grade carbon fiber instead of bread.
It is funny, we often think of "open source" as just code, but this is "open data" for the arts and humanities. I was looking at some of the other things on there. It’s an eclectic mix. Did you see the LIGO gravitational wave data?
Oh, the LIGO deposit is the gold standard for Zenodo. That is the data from the first-ever detection of gravitational waves. It has a DOI—ten dot five two eight one slash zenodo dot one one two zero six two seven—and it has been cited over two thousand times. That is the power of this platform. It allows the most important scientific discoveries in history to be accessible to anyone with an internet connection, forever. It’s not locked in a proprietary journal behind a thirty-dollar-per-article paywall.
I feel slightly less important now, knowing we are sharing space with the fabric of spacetime, but I guess that is the point. Zenodo is inclusive. It does not gatekeep based on "prestige" or whether you have a PhD. If your data is valuable to someone, or if it represents a reproducible step in a project, it belongs there. It’s the democratization of the archive.
And it handles versioning beautifully. This is a huge deal for AI experiments. Say you release a dataset, and then you realize there is a bias in the sampling or a typo in the prompts. On most platforms, you just replace the file and the old one is gone forever. On Zenodo, you can upload a new version. The "Concept DOI" will always point to the latest version, but every previous version maintains its own unique DOI. You can trace the lineage of the research.
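In code terms, the distinction looks something like this. It is a sketch that assumes the record JSON exposes "doi" and "conceptdoi" fields, as Zenodo's REST API has done, and the record ID is a placeholder:

```python
import requests

rec = requests.get("https://zenodo.org/api/records/1234567").json()  # placeholder ID

print("This exact version:", rec["doi"])   # pins one immutable snapshot
print("Concept DOI:", rec["conceptdoi"])   # always resolves to the latest version
```

Cite the version DOI when exact reproducibility matters, and the concept DOI when you want readers landing on the newest release.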
That is critical for auditing. If we are worried about how AI models are being trained or how they are "hallucinating" certain facts, we need that paper trail. You cannot audit a black box if the data used to build it has vanished. How can we trust a model if we can't see the exact version of the data it was fed on Tuesday versus Wednesday?
That leads us perfectly into the second-order effects of this kind of infrastructure. When you have a stable, versioned, and citable repository like Zenodo, it changes the economics of digital preservation. It lowers the barrier to entry for small teams to be "legitimate." You don't need a massive IT budget to ensure your project survives the decade.
Right, you do not need a million-dollar grant to ensure your work survives. You just need a bunch of nerds at CERN to keep the lights on, which they’re already doing for the particle accelerators. But there is a flip side, isn't there? What are the tradeoffs between this centralized model at CERN versus something decentralized, like Arweave or IPFS? I’ve heard people say the "Permanent Web" should be on a blockchain.
It is the classic "institutional trust" versus "algorithmic trust" debate. Decentralized web technologies like IPFS are fascinating because they do not rely on a single organization. As long as at least one node in the network is hosting the data, it exists. However, those systems can be technically complex to interact with for the average researcher, and the "economic sustainability" can be volatile. If the token price of a storage blockchain crashes, does your data stay up? Zenodo offers institutional stability. CERN has been around since nineteen fifty-four. They have survived the Cold War, multiple economic collapses, and the transition from analog to digital. There is a "brand" of permanence there that a new blockchain just cannot match yet.
I trust the guys in white lab coats in Geneva. They seem like they have a plan, and they’ve already built the world’s largest machine under the mountains. But what happens if the funding for OpenAIRE dries up? Even CERN isn't immune to politics or budget cuts. Is there a "Plan B" for the data?
That is the big "what if." But the beauty of Zenodo is that it is built on open standards. If CERN ever had to shut it down, the data is already formatted in a way that it could be migrated to another "dark archive" or a library system like the Internet Archive or the Library of Congress. It is not locked in a proprietary format like a specialized Adobe file or a hidden database schema. The metadata is open, and the files are standard.
Let's talk about the multimodal aspect. Daniel mentioned we upload the audio, the metadata, and the cover art. In the AI world, we are seeing this massive shift toward multimodal models that understand text, images, and sound simultaneously. By putting all these pieces together in one Zenodo record, are we essentially creating a "training set" for future podcasting AIs?
In a way, yes. Most AI training data is scraped from the web in a very messy format. It is "unstructured." The AI has to guess which image goes with which caption. Zenodo turns our project into "structured" data. A model can see the prompt, read the resulting script, listen to the synthesized audio, and look at the generated art. It can learn the relationships between those different modalities with much higher precision because the metadata tells it exactly how they are connected. We’re giving it a Rosetta Stone for our specific brand of weirdness.
It is like giving the AI a textbook instead of a pile of random magazines. I wonder if there are other creators doing this. We see a lot of people on GitHub, but GitHub is for code. What about the "data" of creativity? The sketches, the failed takes, the intermediate steps?
There is a growing movement. Look at the COVID-nineteen Open Research Dataset, or CORD-nineteen. When the pandemic hit, researchers needed a way to share papers, models, and data instantly, without waiting for the slow eighteen-month peer-review cycle. Zenodo became one of the key hubs for that kind of open pandemic-era data and code. There are over one and a half million papers and datasets across those collections now. It essentially accelerated the global scientific response because the "preservation" and "sharing" happened in real time. It showed that "open" is faster than "closed."
That is a stark contrast to GitHub's archive program. Remember when they buried all that code in an Arctic vault?
The Arctic Code Vault! That was a cool PR move, burying film reels of code in the permafrost of Svalbard. But that is "passive" preservation. It is meant to be dug up in a thousand years after a cataclysm. It’s for the survivors of the apocalypse to rebuild the internet. Zenodo is "active" preservation. It is meant to be used today, cited tomorrow, and integrated into new workflows next week. It is a living archive, not a tomb.
I prefer the living version. I am not planning on being around for the post-apocalyptic Svalbard dig, and I doubt the mutants of the future will care about our prompt engineering. I want people to be able to listen to us complaining about AI hallucinations while the hallucinations are still happening. I want it to be part of the current conversation.
Well, look at the scale. As of right now, in April of twenty twenty-six, Zenodo is hosting over three million deposits. They have over half a million unique users. It is not a niche academic tool anymore; it is the backbone of the open-knowledge movement. Every month, they add hundreds of terabytes of new information.
So, if I am a listener and I am playing around with Claude or Gemini or some local Llama model, and I create something I think is actually cool—maybe a new way of prompting for architectural design or a synthetic dataset for a hobby project—should I be putting it on Zenodo? Or is that "cluttering" the sacred halls of science? I’d hate to be the guy who puts a "top ten cat pictures" dataset next to the Higgs Boson data.
No, Zenodo's mission is "ensuring no one is left behind." If you have a digital object that has value as a "research output"—and in the age of AI, almost every creative experiment is a research output—it belongs there. The key is the metadata. If you just upload a file called "test dot zip" with no description, you are cluttering. But if you document your process, tag it correctly, and give it a license, you are contributing to the global commons. You’re saying, "I did this, here is how I did it, and here is the result."
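And if clicking through a web form is not your style, the whole deposit flow is a handful of API calls. Here is a minimal sketch against Zenodo's deposition API; the token, filename, and metadata are all placeholders, and since publishing is irreversible you would rehearse this on sandbox.zenodo.org first:

```python
import requests

ZENODO = "https://zenodo.org/api"
TOKEN = "YOUR-ACCESS-TOKEN"  # placeholder; generate a real token in your Zenodo settings

# 1. Create an empty deposition.
dep = requests.post(f"{ZENODO}/deposit/depositions",
                    params={"access_token": TOKEN}, json={}).json()

# 2. Describe it -- this metadata is what turns a file into findable data.
metadata = {"metadata": {
    "title": "Example prompt experiment",  # hypothetical project details
    "upload_type": "dataset",
    "description": "Prompts, outputs, and notes from a small AI experiment.",
    "creators": [{"name": "Doe, Jane"}],
    "license": "cc-zero",
}}
requests.put(f"{ZENODO}/deposit/depositions/{dep['id']}",
             params={"access_token": TOKEN}, json=metadata).raise_for_status()

# 3. Upload the file itself through the deposition's bucket link.
with open("experiment.zip", "rb") as fp:
    requests.put(f"{dep['links']['bucket']}/experiment.zip",
                 params={"access_token": TOKEN}, data=fp).raise_for_status()

# 4. Publish -- this mints the DOI.
requests.post(f"{ZENODO}/deposit/depositions/{dep['id']}/actions/publish",
              params={"access_token": TOKEN}).raise_for_status()
```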
I think people underestimate how much of our current digital life is going to just... poof. We assume the Cloud is forever, but the Cloud is just someone else's computer, and that someone might go bankrupt, get acquired by a hedge fund, or decide your "weird prompts" violate their new "community standards." We’ve seen it happen with MySpace, with Geocities, with Vine.
That is the "Digital Dark Age" theory. We have more information than any civilization in history, but we are storing it on the most fragile medium ever invented. A stone tablet lasts five thousand years. A high-quality acid-free paper lasts five hundred years. A hard drive? Maybe five to ten years if you are lucky. A URL? A few months. Zenodo is our attempt to build a "digital stone tablet" that can be updated.
It is a bit ironic, isn't it? We are using the most cutting-edge, ephemeral technology—generative AI—and then trying to anchor it to the most permanent thing we can find. It is like trying to tie a balloon to a mountain. The technology is moving at light speed, but the archive is moving at the speed of a glacier.
But that is the only way the balloon stays in one place! Without that anchor, the "scaled experiment" Daniel talks about is just noise. It’s just a series of ephemeral moments that don’t build on each other. With it, it becomes a longitudinal study. We can look back at Episode One and compare it to Episode Eighteen Hundred and Forty-One and see exactly where the technology shifted, where our perspectives changed, and where the AI got "smarter" or "weirder." We can actually measure the progress.
I think the "weirder" part is a given. So, let's get practical for a second. If someone wants to actually use our Zenodo collection, what can they do? Is there a search bar? Can they download a giant zip file?
They can go to the community page, and they will see every episode listed as a separate record. Each record has the audio file, the transcript, and the cover art. They can download the entire thing via the Zenodo API if they want to train a model on it—it’s very developer-friendly. Or they can just browse the metadata to see which episodes cover specific topics like "quantum security" or "battery chemistry." It’s fully searchable.
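The download side is just as short. A sketch, assuming each record's JSON lists its files with direct links, the way Zenodo's API has historically exposed them, again with a placeholder record ID:

```python
import pathlib
import requests

rec = requests.get("https://zenodo.org/api/records/1234567").json()  # placeholder ID

# Each file entry carries a name and a direct download link.
for f in rec["files"]:
    name = f["key"]
    data = requests.get(f["links"]["self"]).content
    pathlib.Path(name).write_bytes(data)
    print("saved", name, len(data), "bytes")
```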
And because of the DOI, they can link to it in a blog post or a research paper and know that the link won't break. I think that is the biggest takeaway for creators: stop using "naked" URLs for things you care about. Use persistent identifiers. If you’re a photographer, a writer, or a coder, think about the "legacy" of your files.
It is a shift in mindset. We are so used to the "disposable web." We post something, it gets some likes, it stays in the feed for six hours, and then it disappears. Zenodo asks us to treat our work with a bit more gravity. It asks, "Is this worth preserving for twenty years?" And for a lot of people, the answer is a surprising "yes."
For some of your jokes, Herman, the answer is a hard "no." I wouldn’t mind if those were lost to the sands of time. But for the data? Absolutely. It is about respect for the process. If we are going to spend all this time talking into microphones and prompting these massive neural networks, we should at least make sure the evidence doesn't get deleted by a "database maintenance" script in three years.
I agree. And there is a social aspect to it too. By putting our work in a place like Zenodo, we are signaling that we believe in Open Science. We are saying that this knowledge shouldn't be locked behind a paywall or hidden in a proprietary silo. It belongs to everyone. It’s a statement of values as much as it is a storage solution.
It is very "pro-freedom of information," which fits our worldview. We want the best ideas to win, and the only way they can win is if they are available for people to see, test, and critique. You can’t peer-review a secret.
It facilitates reproducibility. If I claim that a certain prompt structure produces better code, and I put that prompt and the output on Zenodo, you can go and test it yourself. You can try to break it. You can see if my results were a fluke or a real discovery. That is how science progresses—by standing on the shoulders of giants, or in our case, standing on a pile of archived AI scripts.
It is "Put up or shut up" for the AI era. No more "trust me, I saw this cool thing one time on a private Discord server." Here is the DOI, go see for yourself. It brings a level of accountability that the AI space desperately needs right now.
It also helps with the "AI Model Auditing" problem. We are seeing more and more concern about where AI training data comes from. Was it stolen? Was it scraped without consent? If researchers use the "My Weird Prompts" collection, they have a clear provenance. They know exactly where the data came from, who created it, and what the license is. That is going to be incredibly important as copyright laws and AI regulations evolve. We’re basically making ourselves "audit-ready."
So, we are basically "future-proofing" our own liability while also being helpful. I like it. Efficient. It’s very sloth-friendly to solve three problems with one upload.
It is the ultimate sloth move, Corn. Do the work once, archive it correctly, and never have to worry about it again. You don’t have to keep checking if the link still works or if the hosting company went under. You’ve offloaded the "worry" to CERN.
You know me so well. Why do more work later when you can do a little more work now and then nap for a decade while the servers in Geneva do the heavy lifting? I can sleep soundly knowing the Higgs Boson is keeping my transcripts warm.
Speaking of heavy lifting, we should probably talk about how listeners can actually implement this in their own workflows. It is not just for podcasters. If you are a developer, you can link your GitHub repository to Zenodo. Every time you create a new "release" on GitHub, Zenodo will automatically archive it, take a snapshot of the code, and issue a new DOI.
Wait, that is a game-changer. So your code is preserved even if GitHub goes the way of MySpace or decides to change its terms for open-source projects?
Precisely. It is called the GitHub-Zenodo integration. It is one of the most popular ways to make software "citable." If you write a piece of code that helps other researchers, they can cite the Zenodo DOI, and you get academic credit for your software just like you would for a paper. It turns "coding" into "publishing."
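You can even steer the metadata Zenodo records at each release by committing a .zenodo.json file to the repository root. Here is a sketch of generating one; the project details are invented, and the field names follow Zenodo's deposit metadata schema:

```python
import json

# Hypothetical project metadata; Zenodo's GitHub integration reads
# .zenodo.json from the repo root when it archives a new release.
zenodo_config = {
    "title": "prompt-eval: scripts for scoring prompt variants",
    "upload_type": "software",
    "description": "Evaluation harness for comparing prompt strategies.",
    "creators": [{"name": "Doe, Jane"}],
    "license": "MIT",
    "keywords": ["prompt engineering", "evaluation"],
}

with open(".zenodo.json", "w") as fp:
    json.dump(zenodo_config, fp, indent=2)
```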
That is huge for open-source contributors who feel like their work goes unrecognized by the formal academic or professional world. You can actually build a "research profile" based on your code contributions. It’s a way to prove your impact.
And for the non-coders, you can just use the web interface. It is as simple as uploading a file, filling out a few boxes of metadata—author, title, description, license—and hitting "publish." In ten seconds, you have a permanent piece of the internet. You even get a little badge you can put on your website that shows your DOI.
I think we should challenge our listeners. If you have a "weird prompt" project, a collection of AI-generated art, or an experiment you have been working on, do not just leave it in your "Notes" app or a random folder on your desktop. Put it on Zenodo. Give it a DOI. Make it real. Make it part of the record.
I love that. Let's build a more permanent web together. Because the alternative is just... emptiness. We are losing so much history every day because we are too lazy to click a few extra buttons. We’re living through a digital amnesia.
It is the difference between writing in the sand and carving in stone. The sand is easier, sure, but the tide is always coming in, and the tide doesn't care about your prompts.
And the tide in the digital world is a tsunami of new content that buries everything that came before it. Every minute, hours of video and millions of posts are created. Zenodo is the high ground. It’s the place where we can actually keep things safe from the flood.
Well, on that note, I think we have successfully made "digital repositories" sound cool, which is a feat in itself. I feel a lot better about our "scaled experiment" knowing it’s tucked away in Switzerland. Herman Poppleberry, you have outdone yourself with the research today.
I just really like high-performance computing clusters and persistent data structures, Corn. I cannot help it. There’s something comforting about a system that’s built to last longer than a typical smartphone contract.
We know, Herman. We know. So, what are the big takeaways for the folks at home? Give us the "too long; didn't read" version.
First, realize that URLs are not permanent. They are temporary pointers. If you want something to last, you need a Persistent Identifier like a DOI. Second, Zenodo is a free, world-class resource provided by CERN for the global community. Use it for your datasets, your experiments, and your "long-tail" projects. Don't assume you aren't "scientific" enough to use it. Third, metadata is your friend. It is what turns a "file" into "data" that can be searched and analyzed by humans and AI alike. Without metadata, your file is just a mystery box.
And fourth, check out our collection! Go to zenodo dot org slash communities slash my-weird-prompts and see the literal thousands of files we have stored there. It is the full history of this show, from the very first "Hello World" to whatever nonsense we are talking about today. You can see the evolution of our prompts and the models we've used.
It is all there. The good, the bad, and the truly weird. It’s a transparent look at how this show is made.
I think it is mostly the weird. But that is why we are here. Before we wrap up, I want to ask a bit of a "meta" question. What happens when the preservation infrastructure itself needs preservation? Like, if Zenodo is the "Library of Alexandria," what happens if the library itself gets old or the technology it’s built on becomes obsolete?
That is the "Who watches the watchmen" of the library world. The answer is distributed redundancy. Zenodo sits inside a wider network of preservation initiatives that share best practices and, in some cases, mirror data across different institutions. There are organizations like Portico and CLOCKSS that act as "dark archives"—they hold copies of data that are only released if the primary host disappears. The goal is that no single point of failure—even one as big as CERN—can take down the entire system of human knowledge.
It is a giant, global safety net for human knowledge. It is actually kind of beautiful when you think about it. All these different countries, scientists, and librarians collaborating just to make sure we do not forget what we have learned. It’s one of the few truly selfless things we do as a species.
It is one of the few things humanity is actually pretty good at when we put our minds to it. We like to remember things. We like to leave a mark. We’ve been doing it since we painted on cave walls. Zenodo is just the modern version of that cave wall.
And now, thanks to Daniel's prompt, we have left our mark in the form of a DOI. We are officially part of the permanent record. No pressure, Herman, but everything you say from now on is being archived by nuclear physicists for the rest of time.
Well, in that case, I should probably stop making donkey jokes and start saying more profound things about the nature of reality.
Don't you dare. The physicists need a laugh too. They’re dealing with anti-matter and black holes; they need a little Herman Poppleberry in their lives.
Fair point. I'll keep the jokes, but I'll make sure the metadata is impeccable.
That's all I ask. Alright, let's wrap this one up. If you found this dive into digital preservation interesting, there is plenty more where that came from. We check our emails at show at myweirdprompts dot com if you have your own "weird prompts" or preservation tips to share. We love hearing about how you're using these tools.
And big thanks as always to our producer, Hilbert Flumingtop, for keeping the gears turning behind the scenes and making sure our uploads actually make it to Geneva.
Also, a huge thank you to Modal for providing the GPU credits that power the generation of this show. We literally couldn't do this "scaled experiment" without their infrastructure and their support of weird AI projects.
If you are enjoying the show, a quick review on your podcast app really helps us reach new listeners and grow the community. It’s the best way to help us keep this experiment running.
Find us at myweirdprompts dot com for the RSS feed and all the links to our Zenodo collection. This has been My Weird Prompts.
Stay curious, and keep archiving. Your future self will thank you.
See ya.
Goodbye.