#2301: Inside Podcasting's Simple, Powerful Infrastructure

Explore the elegant simplicity of podcasting’s RSS backbone and how it empowers creators with independence and control.

Episode Details
Episode ID
MWP-2459
Published
Duration
1:00:49
Audio
Direct link
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
Claude Sonnet 4.6

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

Podcasting’s infrastructure is a marvel of simplicity and durability. At its core is the RSS specification, a decades-old standard that continues to power the distribution of audio content worldwide. This episode explores how RSS works, why it’s so effective, and how podcasters can take control of their feeds rather than relying on hosted platforms.

The RSS feed is a plain text XML file that contains all the metadata for a podcast, from episode titles and descriptions to audio file locations. Key elements include the <enclosure> tag, which specifies the audio file’s URL, size, and format, and the <guid> tag, which ensures episodes are uniquely identified. Apple’s iTunes namespace extensions add richer metadata like artwork, content warnings, and categories, while newer extensions from the Podcast Index project support features like transcripts and micropayments.
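As a sketch of what a single episode entry looks like in practice, the following builds a minimal item with Python's standard library; the title, URL, file size, and guid value are placeholders:

```python
import xml.etree.ElementTree as ET

# Hypothetical episode metadata -- every value here is a placeholder.
item = ET.Element("item")
ET.SubElement(item, "title").text = "Inside Podcasting's Infrastructure"

# The enclosure is what makes the item a podcast episode:
# where the audio lives, how big it is, and what format it's in.
ET.SubElement(item, "enclosure", {
    "url": "https://example.com/audio/episode.mp3",
    "length": "73400320",     # file size in bytes
    "type": "audio/mpeg",     # MIME type for MP3
})

# A stable, non-URL identifier: generated once, never changed.
guid = ET.SubElement(item, "guid", {"isPermaLink": "false"})
guid.text = "a3e1f6c2-1111-2222-3333-444455556666"

xml_text = ET.tostring(item, encoding="unicode")
```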

One of the most compelling aspects of RSS is its independence. By owning their feed, podcasters maintain control over their relationship with platforms like Spotify and Apple Podcasts. This contrasts with hosted platforms, where creators are often locked into specific tools and policies.

The episode also delves into the technical stack behind a modern podcast setup. Tools like Vercel handle deployment, while Cloudflare R2 provides scalable audio storage. These components work together to serve RSS feeds and audio files reliably and efficiently.

Analytics present a unique challenge for independent podcasters. Standard hosted platforms provide dashboards, but they often come with privacy trade-offs. Creators must decide what metrics are essential and how to gather them without invasive tracking.

Ultimately, podcasting’s infrastructure is a testament to the power of simplicity. Its enduring relevance highlights the importance of ownership, flexibility, and transparency—principles that resonate deeply with creators and listeners alike.

Downloads

Episode Audio

Download the full episode as an MP3 file

Download MP3
Transcript (TXT)

Plain text transcript file

Transcript (PDF)

Formatted PDF with styling

#2301: Inside Podcasting's Simple, Powerful Infrastructure

Corn
Daniel sent us this one, and it's a bit of a love letter to the infrastructure underneath podcasting itself. He's been an early adopter of the medium, says podcasting actually shaped a lot of his thinking on Judaism and Israel back when he was living in Ireland, niche content finding its audience before algorithms made that frictionless. Now he's pulling back the curtain on how My Weird Prompts actually runs: Vercel for deployment, Cloudflare R2 for audio storage, and a hand-rolled XML feed as the backbone. He wants to dig into the RSS specification for people who want to build this themselves rather than hand it off to some automated pipeline, talk through the serverless setup, and figure out how to get meaningful listener analytics without doing anything invasive. That last part he feels strongly about.
Herman
I'm Herman Poppleberry, and yes, I feel strongly about that last part too. There's a real tension in podcasting between knowing whether anyone is actually listening and not turning your feed into a surveillance apparatus. We'll get into that.
Corn
By the way, today's episode is powered by Claude Sonnet four point six.
Herman
Our friendly AI down the road. Right, so let's think about what podcasting actually is at the infrastructure level, because I think a lot of people who listen to podcasts have no idea how elegantly simple the underlying system is, and also how old it is.
Corn
It's genuinely ancient by internet standards. RSS two point zero as a specification dates back to two thousand two, and the podcast extensions that Apple layered on top came a few years later. And the whole thing has just endured. While every other content distribution format has been deprecated, replaced, or acquired into irrelevance, an XML file sitting on a server is still how audio gets from a creator to someone's earbuds.
Herman
Which is either a testament to how well the spec was designed or a sign that nobody could agree on anything better.
Corn
And what Daniel is pointing at is that this simplicity is a feature you can actually exploit if you're willing to get your hands dirty. You don't need a hosting platform telling you what you can and can't do with your own show.
Herman
Right, and that independence has real consequences. Distribution to Spotify, Apple, Amazon, all of it flows through that XML feed. You own the feed, you own the relationship with those platforms. The platform doesn't own you.
Corn
Which is the thing that a lot of early adopters understood intuitively and that a lot of newer podcasters have completely forgotten because they signed up for a hosted service on day one and never thought about what's underneath it.
Herman
The technical stack Daniel is describing, Vercel handling the web layer, Cloudflare R2 holding the audio files, a custom XML feed stitching it together, that's a modern take on a very old pattern. And I want to spend real time on each of those pieces because the decisions you make at each layer have downstream consequences that aren't obvious until something breaks or you want to change something.
Corn
The analytics problem is interesting because it's not just a privacy question. It's an architectural question. When you step off the beaten path of hosted podcast platforms, you give up the dashboards those platforms provide, and then you have to decide what you actually need to know and how to find it out without doing anything you'd be embarrassed to explain to your listeners.
Herman
Which, for a show where the audience is people who care about this kind of thing, that matters a lot.
Corn
Alright, let's start at the foundation. The XML spec. Because I think even technically literate people sometimes treat it as a black box they copy from a template and never look at again.
Herman
Which is a shame, because once you actually read through an RSS feed, you realize it's almost self-documenting. The structure is right there. So the root element is the RSS tag with a version attribute set to two point zero. Inside that you have a single channel element, and the channel is where almost everything lives. You've got title, link, description, those three are mandatory. Then you've got language, which matters more than people think for international distribution, copyright, managingEditor, webMaster, pubDate, lastBuildDate. And then the items, which are your individual episodes.
Corn
The enclosure element is the one that actually makes podcasting work, right? That's the piece that tells a podcast client there's an audio file here, go fetch it.
Herman
The enclosure has three attributes and only three: url, length, and type. The url is where the audio file lives. Length is the file size in bytes, which is used by clients to estimate download time and storage. And type is the MIME type, so for an MP3 that's audio slash mpeg, for an AAC file it's audio slash x-m4a. That's it. Three attributes and you've told every podcast client on earth what it needs to know to download your episode.
Corn
It's almost offensively simple.
Herman
It really is. And then the podcast namespace extensions, primarily the iTunes namespace that Apple defined and that everyone else subsequently adopted, those add the richer metadata. The itunes colon image tag gives you your artwork URL. The itunes colon explicit tag handles content warnings. The itunes colon duration gives you the episode length in a human-readable format. The itunes colon category tells directories how to classify your show. And itunes colon author, itunes colon summary, those fill in the display information that directories show to potential listeners.
Corn
When someone is browsing Apple Podcasts or Spotify and they see your show's artwork, your category, your description, all of that is being pulled from your XML feed.
Herman
Every single bit of it. And this is where the ownership point becomes concrete. If Spotify is pulling your metadata from your feed, and you control your feed, you can update your artwork, change your description, modify your category, and it propagates to every platform that reads your feed. You're not going into each platform's dashboard and updating things separately.
Corn
Whereas if your hosting platform controls the feed, you're at their mercy for how that data gets presented.
Herman
You're also at their mercy if they go down, or if they change their pricing, or if they decide your content violates some policy you didn't know existed. The feed is your contract with the distribution ecosystem. You want to hold that contract yourself.
Corn
What does a well-formed feed actually look like at the top of the file? Like, what's the declaration?
Herman
You start with the XML declaration: version one point zero, encoding UTF-8. Then your RSS tag with the version two point zero attribute and the namespace declarations. You'll typically declare the iTunes namespace, which is xmlns colon itunes equals http colon slash slash www dot itunes dot com slash dtds slash podcast-1.0.dtd, and the content namespace for situations where you want to embed HTML in your feed. Then your channel element opens, and everything else lives inside it.
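A minimal sketch of that skeleton, with placeholder channel values, parsed back with Python's standard library to confirm it's well-formed:

```python
import xml.etree.ElementTree as ET

# A minimal feed skeleton with the usual namespace declarations.
# Channel values are placeholders.
feed = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<rss version="2.0"\n'
    '     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"\n'
    '     xmlns:content="http://purl.org/rss/1.0/modules/content/">\n'
    '  <channel>\n'
    '    <title>My Weird Prompts</title>\n'
    '    <link>https://example.com</link>\n'
    '    <description>A show about infrastructure.</description>\n'
    '    <language>en</language>\n'
    '  </channel>\n'
    '</rss>\n'
)

root = ET.fromstring(feed)  # parses cleanly, so the skeleton is well-formed
```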
Corn
The items, the episodes, those are direct children of channel?
Herman
Direct children of channel, yes. Each item has its own title, link, description, pubDate, guid, and enclosure. The guid, the globally unique identifier, is important and often mishandled. It's supposed to be a permanent, unique identifier for that episode. A lot of people just use the episode URL, which works until you move your hosting and all your URLs change, at which point every podcast client thinks every episode is new and re-downloads the whole back catalogue.
Corn
Which is the kind of thing that gets you angry emails.
Herman
Extremely angry emails. The right approach is to use a UUID that you generate once when you create the episode and never change, regardless of what happens to your hosting infrastructure. The guid element has an isPermaLink attribute that should be set to false if you're using a non-URL identifier, which tells clients not to try to resolve it as a web address.
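A sketch of the boring-but-correct approach, using only Python's standard library; nothing here is specific to any hosting setup:

```python
import uuid
import xml.etree.ElementTree as ET

# Generate the guid once at publish time, persist it alongside the episode
# metadata, and never regenerate it -- even if the hosting or URLs change.
episode_guid = str(uuid.uuid4())

# isPermaLink="false" tells clients this is an opaque identifier,
# not a web address to resolve.
guid_el = ET.Element("guid", {"isPermaLink": "false"})
guid_el.text = episode_guid
```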
Corn
Generate a UUID, store it, never touch it again.
Herman
Never touch it again. It's one of those things where the correct behavior is the boring behavior, and people get into trouble by being clever.
Corn
What about validation? If you're building this by hand or with a script, how do you know your feed is actually valid before you push it?
Herman
There's a tool called the Feed Validator, the W3C's version has been around for a long time, and there are newer ones like Feedsmith that have a more comprehensive reference for what the spec actually requires versus recommends. The distinction between required, recommended, and optional elements matters. A feed missing a required element will either fail to be ingested by some directories or get ingested with broken metadata. Feedsmith's reference documentation is actually quite good for understanding which elements fall into which category.
Corn
Then there's the Podcast Index namespace, which is the more recent open extension to the spec.
Herman
Right, and this is worth mentioning because the Podcast Index project, which grew out of a push to keep podcasting open and not platform-controlled, has added namespace extensions for things the original iTunes spec didn't anticipate. Transcript support, chapter markers, value for value micropayments via Lightning Network, locked feeds that can't be imported without permission, person tags for crediting hosts and guests. These are all in the podcast colon namespace and they're increasingly supported by modern podcast clients.
Corn
The transcript one is interesting because that's something a lot of shows are doing now, and the question is whether you embed it or link to it.
Herman
The spec supports both. You can link to an external transcript file, which is probably the right approach for long-form shows because embedding a full transcript in the XML feed makes the feed very large and slow to parse. The element is podcast colon transcript with a url attribute pointing to your transcript file and a type attribute specifying the format, so text slash html or text slash plain or application slash json for the structured JSON format that some clients use for synchronized highlighting.
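A sketch of that element as it might be built with Python's ElementTree; the transcript URL is a placeholder:

```python
import xml.etree.ElementTree as ET

# The Podcast Index namespace URI, as published by the project.
PODCAST_NS = "https://podcastindex.org/namespace/1.0"

# An external transcript link: the feed stays small, clients fetch on demand.
transcript = ET.Element(f"{{{PODCAST_NS}}}transcript", {
    "url": "https://example.com/transcripts/episode.json",  # placeholder
    "type": "application/json",  # or text/plain, text/html
})
```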
Corn
All of this is in a plain text file that any text editor can open.
Herman
That's the thing. The entire metadata layer of your podcast is human-readable, version-controllable, and deployable from a git repository. There's something elegant about that. Your show's identity, its history, its metadata, all of it lives in a file you can read with cat in a terminal.
Corn
Or with your eyes, if you're not showing off.
Herman
Or with your eyes, yes. And this connects directly to the deployment question, because if your feed is a file that lives in a git repository, the question becomes how do you serve it, how do you update it, and how do you make sure the audio files are accessible reliably at scale.
Corn
Which is where Vercel and Cloudflare R2 come in. And I want to be clear about what each of those is doing in Daniel's setup, because they're solving different problems.
Herman
Very different problems. Vercel is a serverless deployment platform, primarily aimed at web applications but perfectly capable of serving static files and handling serverless functions. When you push to your git repository, Vercel builds and deploys automatically. For a podcast, the feed XML might be generated by a serverless function that reads episode metadata from some data source and constructs the XML on request, or it might be a static file that you regenerate and commit whenever you publish a new episode.
Corn
The static file approach is simpler and has better performance characteristics.
Herman
Much better performance characteristics. A static XML file served from a CDN edge node is going to have latency in the single-digit milliseconds. A serverless function that runs on every request is going to have cold start times that can be anywhere from fifty milliseconds to several hundred milliseconds depending on the runtime and the platform's current load. For a podcast feed that might be polled by clients every hour, that latency difference matters.
Corn
Cloudflare R2 is where the actual audio lives.
Herman
R2 is Cloudflare's object storage product, and the key differentiator from something like Amazon S3 is the egress fee structure. S3 charges you for data transfer out of the bucket. For a podcast with any meaningful listenership, those egress costs can become substantial very quickly. A one-hour episode at a typical bitrate might be around sixty to ninety megabytes. If ten thousand people download that episode, you're looking at somewhere between six hundred gigabytes and nine hundred gigabytes of egress. At S3's standard pricing, that's real money. R2 has zero egress fees. You pay for storage and for API operations, but not for the data transfer itself.
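The arithmetic is worth making concrete. A rough sketch using a mid-range file size from the figures above and a hypothetical nine-cents-per-gigabyte egress rate (actual S3 pricing varies by region and tier):

```python
# Back-of-envelope egress cost for one episode.
episode_mb = 75            # a one-hour MP3, mid-range estimate
downloads = 10_000
egress_gb = episode_mb * downloads / 1000

# Illustrative S3-style egress rate vs. R2's zero egress fee.
s3_egress_rate = 0.09      # dollars per GB, hypothetical
s3_cost = egress_gb * s3_egress_rate
r2_cost = 0.0              # R2 charges nothing for egress
```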
Corn
Which for a podcast specifically is the cost that scales with success. The more listeners you have, the more you'd be paying on S3.
Herman
On R2 you're not. The storage cost scales, but slowly, and the API operation cost is minimal. It's a fundamentally more favorable pricing model for the use case. And you can put Cloudflare's CDN in front of R2 with essentially no configuration, because R2 is already inside Cloudflare's network. Your audio files get served from edge nodes close to your listeners, which means faster starts and fewer buffering events.
Corn
What does the actual setup look like? If someone wanted to replicate this from scratch?
Herman
The basic architecture is: you have a domain, let's say myweirdprompts dot com. You have a Vercel project connected to a git repository that serves your website and your RSS feed. And you have an R2 bucket connected to a custom subdomain, something like audio dot myweirdprompts dot com, where your MP3 files live. When you publish a new episode, you upload the audio to R2, you update your episode metadata in whatever data store you're using, you regenerate the feed XML, and either commit it as a static file or let the serverless function pick up the new data automatically.
Corn
The enclosure URL in the feed points to the R2-backed subdomain.
Herman
The enclosure URL is audio dot myweirdprompts dot com slash episode-two-thousand-two-hundred-and-twenty-one dot mp3 or whatever your naming convention is. Podcast clients fetch that URL, Cloudflare serves it from the nearest edge node, R2 provides the bytes. The whole thing is stateless and scales horizontally without any configuration.
Corn
What about the argument that serverless is unsuitable for high-traffic podcasts? Because I've seen that claim.
Herman
It's mostly wrong, and it confuses two different things. The concern is usually about cold starts for serverless functions, which is a real issue for latency-sensitive applications. But for audio file delivery, you're not running a serverless function. You're serving a static file from object storage via a CDN. That's not serverless in the function-as-a-service sense, that's just a CDN, and CDNs handle massive traffic trivially. The RSS feed itself, if it's a static file, has the same characteristics. If you're generating the feed dynamically on every request with a serverless function, then yes, you need to think about caching headers and cold start behavior. But that's a design choice, not an inherent limitation of the platform.
Corn
The misconception is treating the entire stack as if it has the properties of its least performant component.
Herman
Which is a common error in infrastructure reasoning. The audio delivery and the feed serving have very different traffic patterns and very different performance requirements. Optimize them separately.
Corn
There's also a tooling question here. If you're building this yourself rather than using a hosted platform, you need scripts to manage the operational pieces. Uploading audio, updating metadata, regenerating the feed.
Herman
This is where the DIY approach requires some investment upfront that pays dividends over time. A basic deployment script for a podcast on this stack might do a few things: validate that the audio file is in the expected format and within expected size bounds, upload it to R2 using the Cloudflare API or the S3-compatible API that R2 exposes, generate a UUID for the episode guid, update the episode metadata in a JSON file or a database, regenerate the feed XML, and commit or deploy the updated feed. That whole pipeline can be a shell script or a short Python or Node script. It doesn't need to be complicated.
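As a rough sketch of the metadata-update step alone, under the assumption that episode metadata lives in a JSON file; the function name and fields here are hypothetical, and the upload and feed-generation steps would happen around it:

```python
import json
import uuid
from pathlib import Path

def register_episode(metadata_path: Path, title: str,
                     audio_url: str, size_bytes: int) -> dict:
    """Append one episode record to a JSON metadata file (hypothetical layout)."""
    # Load the existing episode list, or start a new one.
    episodes = json.loads(metadata_path.read_text()) if metadata_path.exists() else []
    entry = {
        "guid": str(uuid.uuid4()),  # generated once at publish time, never changed
        "title": title,
        "url": audio_url,
        "length": size_bytes,
        "type": "audio/mpeg",
    }
    episodes.append(entry)
    metadata_path.write_text(json.dumps(episodes, indent=2))
    return entry
```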
Corn
The R2 exposing an S3-compatible API is important because it means you can use any S3 client library without modification.
Herman
Every language has a mature S3 client. Python has boto3. Node has the AWS SDK. Go has its own SDK. Because R2 speaks the S3 API, all of those work with R2 with a trivial configuration change: you point the endpoint URL at your R2 account endpoint instead of an AWS regional endpoint, and everything else behaves identically. That's a meaningful reduction in the tooling investment required to switch.
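A sketch of what that configuration change looks like; the account ID and keys are placeholders, and in practice you would pass this dict straight to boto3's client constructor:

```python
# R2's S3-compatible endpoint is per-account; the account ID is a placeholder.
# With boto3 this would be: boto3.client("s3", **r2_config)
account_id = "YOUR_ACCOUNT_ID"
r2_config = {
    "endpoint_url": f"https://{account_id}.r2.cloudflarestorage.com",
    "aws_access_key_id": "R2_ACCESS_KEY",       # issued in the Cloudflare dashboard
    "aws_secret_access_key": "R2_SECRET_KEY",
    "region_name": "auto",                      # R2 uses "auto", not AWS regions
}
```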
Corn
There's also microfeed, which is worth mentioning. It's an open source project that essentially wraps this whole pattern into a lightweight CMS specifically for podcasts and other content, built on Cloudflare's infrastructure.
Herman
Right, I came across that in the context of people looking for self-hosted alternatives to podcast CMS platforms. It's not as bare-metal as what Daniel is describing, but it sits in an interesting middle ground. You get more structure than hand-rolling everything, but you're still running on your own Cloudflare account rather than paying a hosted platform. Worth knowing about if you want to move faster than building from scratch but don't want to hand over control.
Corn
For people who are comfortable with code, the from-scratch approach gives you more flexibility about how the feed is structured, what metadata you include, how you handle things like seasons or trailers or bonus content.
Herman
Which the spec actually has elements for. The itunes colon season and itunes colon episode tags let you organize content into seasons. The itunes colon episodeType tag distinguishes between full episodes, trailers, and bonus content. These are all things that hosted platforms handle for you through their UI, but when you control the feed directly, you're setting them explicitly, which means you understand exactly what you're telling directory platforms about your content.
Corn
This is the part where the independence of the setup creates a genuine gap that needs filling.
Herman
It's a gap that I think is worth being precise about, because the concern isn't just practical, it's ethical. The way a lot of podcast analytics work is through a redirect: the enclosure URL in your feed doesn't point directly to your audio file, it points to your analytics provider's server, which logs the request, then redirects to the actual audio file. That redirect gives you data: IP address, user agent, timestamp, referring client. And from that data you can infer geography, device type, podcast client, and approximate listener counts.
Corn
The IP address piece is the invasive part.
Herman
The IP address is personally identifying in many jurisdictions and is treated as personal data under privacy regulations like the GDPR. Storing it without explicit consent is legally problematic in Europe and ethically questionable everywhere. And the thing is, for most podcast creators, the IP address isn't actually what they want to know. They want to know: how many people listened? Where are they broadly? What clients are they using? You can get useful approximations of all of that without storing individual IP addresses.
Corn
What are the approaches?
Herman
The first is leaning on the platform dashboards that the distribution platforms themselves provide. Apple Podcasts Connect gives you download counts, device breakdowns, and geographic data aggregated at the country level. Spotify for Podcasters gives you streams, listeners, follower counts, and demographic data that Spotify infers from its user accounts. Amazon Music has its own dashboard. These are non-invasive by design because the platforms are using their own first-party data about their users, not tracking your listeners independently.
Corn
The limitation is that you only see the listeners who found you through those specific platforms.
Herman
Right, if someone is using a client like Pocket Casts or Overcast or a self-hosted client, they're not going through Spotify's servers, so Spotify can't tell you about them. For a show with significant listenership through independent clients, the platform dashboards will systematically undercount. But for most shows, the majority of listening happens through the major platforms, so the undercounting is bounded.
Corn
Then there's the approach of doing your own lightweight logging without storing identifying information.
Herman
This is technically interesting. If you're serving the audio through Cloudflare, Cloudflare Workers can intercept requests to your audio files and log aggregate data without ever persisting the IP address. You get a timestamp, the user agent string, the country code that Cloudflare infers from the IP at the network level, so you never need to store the IP yourself, and the bytes transferred. From that you can build a picture of: how many requests did this episode get, from what countries, from what podcast clients, over what time period. All without a single IP address touching your database.
Corn
The user agent string is interesting because podcast clients have fairly distinctive user agents.
Herman
Overcast announces itself as Overcast. Pocket Casts announces itself as Pocket Casts. Apple's Podcasts app has a characteristic user agent. You can parse the user agent to get client distribution data that's actually useful for understanding your audience's technical preferences. And none of that requires knowing who any individual listener is.
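A sketch of that kind of user-agent bucketing; the marker strings are illustrative, since real user agents vary across client versions:

```python
from collections import Counter

# Hypothetical substring markers -- a real deployment would keep a fuller list.
CLIENT_MARKERS = {
    "Overcast": "Overcast",
    "Pocket Casts": "PocketCasts",
    "AppleCoreMedia": "Apple Podcasts",
    "Spotify": "Spotify",
}

def classify(user_agent: str) -> str:
    """Map a user-agent string to a podcast client name, or 'Other'."""
    for marker, client in CLIENT_MARKERS.items():
        if marker in user_agent:
            return client
    return "Other"

# Example log lines (placeholders) aggregated into a client distribution.
counts = Counter(classify(ua) for ua in [
    "Overcast/3.0 (+http://overcast.fm/; iOS podcast app)",
    "Pocket Casts/7.0",
    "curl/8.0",
])
```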
Corn
There's also Podscan.fm, which takes a different approach entirely. It aggregates publicly available data, chart rankings, review counts, that sort of thing, without any tracking of individual listeners.
Herman
Right, Podscan is more about competitive intelligence and discoverability than about your own listener analytics. It can tell you things like where your show ranks in various categories, how that's changed over time, what your review sentiment looks like. That's useful for different questions than "how many people listened to episode two thousand two hundred and twenty-one." But it's worth knowing about as part of the analytics picture, because it gives you signal from the public-facing layer of the podcast ecosystem without requiring any instrumentation on your end.
Corn
For people who want more granular data and are willing to accept some tradeoffs, there's the approach of hashing the IP address before logging it.
Herman
This is a common technique in privacy-preserving analytics. You take the IP address, combine it with a daily salt, hash the result with something like SHA-256, and store the hash. You can use the hashes to deduplicate requests within a day, so you're not counting someone who presses play, pauses, and resumes as three separate downloads. But you can't reverse the hash to get the original IP, and because the salt rotates daily, you can't track a listener across days. It's a reasonable middle ground for people who need more accuracy than raw request counts but don't want to build a surveillance apparatus.
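A minimal sketch of the technique in Python; in practice the daily salt should be a random secret rotated each day, not something guessable like the date used here for illustration:

```python
import hashlib
from datetime import date

def listener_key(ip: str, user_agent: str, salt: str) -> str:
    """Salted SHA-256 of the request identity: dedupes within a day,
    unlinkable across days once the salt rotates."""
    raw = f"{salt}:{ip}:{user_agent}".encode()
    return hashlib.sha256(raw).hexdigest()

# Placeholder salt -- use a random per-day secret in a real deployment.
salt = date.today().isoformat()

# Two requests from the same address collapse to one unique listener key.
seen = set()
for ip in ["203.0.113.5", "203.0.113.5", "198.51.100.7"]:
    seen.add(listener_key(ip, "Overcast/3.0", salt))
unique_requests = len(seen)
```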
Corn
The IAB, the Interactive Advertising Bureau, has a certification standard for podcast measurement that addresses some of this. It defines what counts as a valid download and how deduplication should work.
Herman
The IAB Podcast Measurement Technical Guidelines, yes. The current version defines a download as a request that transfers at least one minute's worth of the audio file, a threshold designed to distinguish genuine listens from feed polling by clients that prefetch a few seconds to check for new content. The standard also specifies deduplication windows and how to handle bots. If you're building your own analytics, the IAB guidelines are worth reading because they define the conventions that the industry uses to make download counts comparable across platforms.
Corn
Someone building this from scratch has a few options: platform dashboards for the listeners who come through major platforms, lightweight Cloudflare Worker logging with IP hashing for the direct traffic, and tools like Podscan for the public-facing layer. None of it requires a redirect chain through a third-party analytics service.
Herman
The combination is actually pretty good coverage for most shows. You're not going to have the granular per-listener data that a platform like Spotify has because Spotify knows who its users are. But you're going to have enough to understand whether your show is growing, which episodes are resonating, and where your audience broadly is. For a show that's not monetizing through dynamically inserted ads, which is the use case where you need IAB-certified numbers, that's probably sufficient.
Corn
The ad insertion case is interesting because that's where the redirect approach becomes almost unavoidable if you want to work with advertising networks.
Herman
It is, and that's a real constraint. If you want to use a programmatic ad insertion network, they need to be in the request chain to serve the right ad to the right listener at the right time. That requires the redirect. But for a show that's not doing dynamic ad insertion, and plenty of successful shows aren't, you can stay out of that redirect chain entirely and keep the direct relationship between your server and your listener's client.
Corn
Which is also better for reliability. Every hop in the chain is a potential point of failure. If your analytics provider has an outage, and your audio is behind their redirect, your listeners can't download your episode.
Herman
This has happened. More than once. There have been incidents where analytics providers had outages that effectively took down the audio delivery for shows that had their enclosure URLs pointing through the analytics redirect. When you're serving audio directly from R2 via Cloudflare, the only things that can take you down are Cloudflare itself or R2 itself, both of which have extremely high availability SLAs and distributed infrastructure.
Corn
Cloudflare's network spans something like three hundred cities at this point. It's not going down because of a localized infrastructure event.
Herman
R2 is built on the same infrastructure. So the reliability argument for the direct approach is actually quite strong. You're trading some analytics granularity for meaningfully better uptime characteristics, and for most shows that's the right trade.
Corn
Let's talk about what someone should actually do if they want to build this. What are the concrete steps?
Herman
Starting from zero, the first decision is your feed generation approach. Are you writing the XML by hand and committing it to git, generating it with a script on every publish, or using something like microfeed to manage the structure? For a solo creator comfortable with code, the script approach is probably the right one. You write a script that takes your episode metadata as input and produces a valid RSS XML file as output. You run it every time you publish. The script is simple enough that you can read and understand it entirely, which means you can debug it when something goes wrong.
Corn
You're not dependent on a third party's interpretation of the spec.
Herman
The second decision is your storage structure in R2. I'd recommend a simple flat structure: all your audio files in a single bucket, named consistently, something like episode-GUID dot mp3. The GUID is the same UUID you put in the feed, which gives you a permanent, stable URL that survives any metadata changes. You configure a custom domain on the bucket through Cloudflare's DNS, you set appropriate cache headers on the audio files, probably cache for a long time since audio files don't change after they're published, and you're done.
Corn
What cache duration makes sense for audio?
Herman
For audio files, very long. A year is not unreasonable. The file is what it is, it's not going to change. The RSS feed is different: you want clients to check for updates reasonably often, so a shorter cache time, maybe an hour or a few hours. The feed has a ttl element, time to live, that you can set in minutes to hint to clients how often they should check for updates. Sixty is a common value. Some clients respect it, some don't, but it's worth setting.
Corn
Then for Vercel, the feed XML is either a static file you push with each episode, or a serverless function.
Herman
For simplicity, I'd start with the static file approach. Your deployment workflow is: record episode, edit, export MP3, upload to R2, run your feed generation script, commit the updated feed XML, push to git, Vercel deploys automatically. The whole thing is five commands after the audio is ready. As you add complexity, maybe you want to add show notes in a structured way, or you want to support transcript linking, or you want to add chapter markers, your script grows with your needs but the fundamental pattern stays the same.
Corn
The validation step in your script catches malformed XML before it goes out.
Herman
Yes, and this is worth being explicit about: always validate your feed before deploying. The W3C validator or a tool like Feedsmith's validator will catch things like missing required elements, malformed dates, invalid MIME types in enclosure tags. A malformed feed can cause directories to stop ingesting your updates, and the failure is often silent: the directory just stops showing new episodes, and you don't find out until a listener emails you asking why there are no new episodes.
Corn
Which is the kind of thing that is very embarrassing to explain.
Herman
The date format in pubDate and lastBuildDate is a common source of errors. It has to be in RFC 2822 format, which looks like Tue, 14 Apr 2026 12:00:00 +0000. It's a specific format, not ISO 8601, and parsers are sometimes strict about it. Use a library to generate it rather than formatting it by hand.
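Python's standard library does this correctly, including the day-of-week name and the numeric UTC offset, both of which are easy to get wrong by hand:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Generate the RFC 2822 pubDate from a timezone-aware datetime rather
# than hand-formatting it; the library handles the day name, the
# zero-padding, and the numeric offset.
published = datetime(2026, 4, 14, 12, 0, 0, tzinfo=timezone.utc)
pub_date = format_datetime(published)

# Round-trip check: the string parses back to the same instant.
assert parsedate_to_datetime(pub_date) == published
```

JavaScript has equivalent helpers in feed libraries; the point is the same either way: treat the date as data, not as a string you type.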
Corn
What about the iTunes category hierarchy? That's another place where people get it wrong.
Herman
The category structure has two levels: a top-level category and optional subcategories. The element is itunes:category with a text attribute. If you want a subcategory, you nest another itunes:category element inside the first one. The valid category values are defined by Apple's directory, and using an invalid category string means your show won't appear in that category in Apple Podcasts. Worth looking up the current list and making sure you're using exact strings.
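With ElementTree the nesting looks like this; the category strings below are drawn from Apple's published list, which is worth double-checking against the current version before relying on them:

```python
import xml.etree.ElementTree as ET

ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"
ET.register_namespace("itunes", ITUNES_NS)

channel = ET.Element("channel")
# A top-level category with no subcategory:
ET.SubElement(channel, f"{{{ITUNES_NS}}}category", text="Technology")
# A subcategory is a second itunes:category nested inside the first;
# the ampersand in the text attribute is escaped automatically when
# the tree is serialized, which Apple's parser expects.
parent = ET.SubElement(channel, f"{{{ITUNES_NS}}}category",
                       text="Society & Culture")
ET.SubElement(parent, f"{{{ITUNES_NS}}}category", text="Philosophy")

category_xml = ET.tostring(channel, encoding="unicode")
```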
Corn
Spotify has its own category system that's somewhat different.
Herman
Spotify maps from the iTunes categories in most cases, but there are some differences. For most shows, getting the iTunes categories right is sufficient and Spotify will figure it out. If you want to be precise about Spotify categorization, you can look at what Spotify shows for shows in your space and reverse-engineer what feed categories map to what Spotify categories. It's not documented as cleanly as the iTunes spec.
Corn
One thing we haven't touched on: what about shows with video? Is the same XML structure used?
Herman
The same RSS structure can carry video enclosures, yes. The MIME type in the enclosure tag changes to video/mp4 or video/webm or whatever format you're using. YouTube has its own distribution system that operates separately from RSS for most creators, but if you're distributing video podcasts through podcast clients rather than YouTube, the RSS enclosure approach works. The podcast namespace has elements for alternate audio-only versions of video episodes, which is useful if you want to offer both formats from a single feed.
Corn
This whole conversation is a good argument for the spec being more capable than people realize.
Herman
It really is. People think of RSS as this legacy format that's barely holding on, and the reality is that the combination of RSS two point zero, the iTunes namespace, and the Podcast Index extensions gives you a remarkably complete metadata and distribution system. It handles transcripts, chapters, person credits, funding links, micropayments, content ratings, season structure, episode types, trailer support, and alternate enclosures. That's not a legacy format. That's a mature, extensible standard that's been carefully evolved without breaking backward compatibility.
Corn
Which is more than you can say for most things on the internet.
Herman
The web platform has broken backward compatibility more times than I can count. RSS two point zero feeds from two thousand five still parse correctly in current clients. That's a remarkable achievement in specification stability.
Corn
Alright, so the picture we're painting is: own your feed, serve your audio from R2, deploy your web layer on Vercel, instrument your analytics carefully and minimally, and understand the spec well enough that you're not dependent on a tool to do things you could do yourself. Is there anything in that picture that you'd push back on?
Herman
The one thing I'd add nuance to is the "do it yourself" framing. There's a real cost to building and maintaining this infrastructure, even though the operational cost is low. If you're spending time debugging your feed generation script, that's time you're not spending making the show. The right answer depends on how technical you are and how much the independence matters to you. For Daniel, who is clearly comfortable at the infrastructure level and cares about the ownership question, this setup makes complete sense. For someone who just wants to make a show and doesn't care about the underlying stack, a hosted platform is probably the right call. The important thing is making the choice consciously rather than defaulting to a hosted platform without understanding what you're giving up.
Corn
Which is exactly the kind of thing this episode is designed to let people make an informed decision about.
Herman
That's the goal. You should know what the feed looks like, you should know what the hosting options are, and you should know what the analytics tradeoffs are. Then you can decide where on the spectrum from fully managed to fully owned you actually want to sit.
Corn
The history of that spectrum is interesting though. Podcasting started out almost entirely DIY. Dave Winer and Adam Curry in two thousand three, RSS two point zero, enclosure tags, the whole thing was built by people who wanted to distribute audio without asking permission from a gatekeeper. The infrastructure was simple because it had to be. You had a web server, you had an RSS file, you were done.
Herman
That simplicity is what made it resilient. Nobody could take podcasting away from you because there was nothing to take. The spec was published, it was open, anyone could implement it. The directories, Apple Podcasts, Spotify, all of that came later and layered on top of a foundation that didn't require them.
Corn
Which is why owning the feed still matters in the same way it mattered in two thousand three. The underlying logic hasn't changed.
Herman
Serverless changes the cost calculation significantly though. In two thousand three, running your own infrastructure meant a physical server or a VPS you were paying for and maintaining regardless of whether anyone was listening. The fixed cost was real. With R2 and Vercel, you're paying for actual consumption. R2 charges per gigabyte stored and per gigabyte served, with zero egress fees because Cloudflare made that call deliberately to undercut the AWS model. For a small to mid-sized show, the monthly bill is often just a few dollars.
Corn
Which removes the main practical argument for handing your hosting over to a platform.
Herman
The argument used to be that running your own infrastructure is expensive and complicated. The expense part is largely gone. The complexity is real but manageable, which is what this whole conversation is about, especially understanding the tools involved, starting with the RSS feed itself.
Corn
Right, and the spec itself is where we should probably spend some time, because I think people who haven't looked at a raw RSS feed assume it's more complicated than it is. And people who have looked at one assume it's simpler than it is.
Herman
Both are true simultaneously, which is one of my favorite things about it. The basic structure is minimal. You have an XML document, you have an RSS root element with a version attribute set to two point zero, inside that you have a channel element, and the channel contains your show-level metadata and then a series of item elements, one per episode. That's the whole shape.
Corn
Walk through the channel-level fields that actually matter.
Herman
At the channel level, the required elements from the RSS two point zero spec are title, link, and description. Title is your show name. Link is your show's website URL. Description is a human-readable summary of the show. Those three get you a technically valid RSS feed. But a podcast feed also needs the itunes:image element for artwork, the itunes:author element, the itunes:category element we mentioned earlier, and the language element, which is a two-letter code like en for English. And critically, you need the itunes:explicit element, which is either true or false, because Apple Podcasts will not list your show without it.
Corn
That last one catches people who don't read the Apple spec carefully.
Herman
And the explicit element is interesting because it's an iTunes namespace extension, not part of core RSS two point zero, but in practice it's required for distribution. That's the pattern throughout podcast feeds: the RSS spec gives you the skeleton, and the iTunes namespace extensions give you the flesh. The Podcast Index namespace, which is the more recent open-source effort to extend the spec, adds things like transcripts, chapters, and funding links on top of that.
Corn
What does an item element look like? Because that's where each episode lives.
Herman
An item at minimum needs title, description, and an enclosure element. The enclosure is what makes it a podcast episode rather than a blog post. It has three attributes: url, which is the direct link to your audio file; length, which is the file size in bytes; and type, which is the MIME type, almost always audio/mpeg for MP3 files. Then you have guid, which is a globally unique identifier for the episode, and pubDate, which is the publication date in that specific RFC 2822 format we talked about. Then the iTunes extensions add itunes:duration, itunes:episode, itunes:season, itunes:episodeType, and itunes:image if you want a per-episode image rather than just the show image.
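The enclosure's length and type should come from the file itself rather than being typed by hand; a small helper sketch (the function name and URL are illustrative):

```python
import mimetypes
import os
import tempfile

def enclosure_attrs(audio_path, public_url):
    """Derive the three enclosure attributes from the file on disk."""
    mime, _ = mimetypes.guess_type(audio_path)  # "audio/mpeg" for .mp3
    return {
        "url": public_url,
        "length": str(os.path.getsize(audio_path)),  # size in bytes, as a string
        "type": mime or "audio/mpeg",
    }

# Demo on a stand-in file; a real publish script would point at the
# exported MP3 before uploading it to R2.
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as f:
    f.write(b"\x00" * 2048)
    demo_path = f.name
attrs = enclosure_attrs(demo_path, "https://audio.example.com/episode-x.mp3")
os.unlink(demo_path)
```

Deriving length from the file is worth the two extra lines: a wrong byte count in the enclosure is a classic cause of clients truncating or refusing downloads.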
Corn
The GUID is worth dwelling on for a second. Because people treat it as an afterthought and it's actually load-bearing.
Herman
Very load-bearing. The GUID is how podcast clients and directories track episodes. If you change your GUID, the client sees it as a brand new episode. If you delete and recreate an episode with the same GUID, clients that already downloaded it won't re-download it. If you migrate your feed to a new host and your GUIDs change, every episode looks new to every subscriber. That is a catastrophic migration failure. The convention is to use a UUID, generate it once, store it, never change it. Some people use the episode URL as the GUID, which works as long as your URL structure never changes, but using a proper UUID decouples your episode identity from your URL structure, which is cleaner.
Corn
What about the isPermaLink attribute on the GUID element?
Herman
By default, RSS parsers assume the GUID is a URL they can visit. If your GUID is a UUID rather than a URL, you need to set isPermaLink equals false on the element, otherwise parsers may try to fetch it as a URL and log errors. It's a small thing but it keeps your feed clean.
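In code, the whole discipline is a couple of lines; a sketch using the standard library:

```python
import uuid
import xml.etree.ElementTree as ET

# Generate once at publish time, write it into the metadata store,
# and never regenerate it for this episode.
episode_guid = str(uuid.uuid4())

item = ET.Element("item")
# The GUID is a UUID, not a URL, so mark it as not a permalink
# to keep parsers from trying to fetch it.
guid_el = ET.SubElement(item, "guid", isPermaLink="false")
guid_el.text = episode_guid
```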
Corn
How do you generate and validate one of these feeds without doing it entirely by hand?
Herman
A few approaches. The most robust for a custom setup like Daniel's is a script, probably Python or JavaScript, that reads episode metadata from a structured source, maybe a JSON file or a directory of markdown files with front matter, and outputs the XML. The Python standard library has xml.ElementTree for building XML programmatically, which is fine, though a lot of people reach for the feedgen library, which gives you a podcast-aware abstraction so you're not manually constructing every element. In the JavaScript world, there's a package called podcast that does similar work. The key is that you're generating the XML from data, not editing XML by hand, because hand-editing XML is how you introduce subtle formatting errors that validators catch and humans miss.
Corn
Validation happens before you push?
Herman
Ideally it's part of your deployment script. You generate the feed, you run it through a validator, and if the validator returns errors, the deploy fails. The W3C Feed Validation Service has a public API you can hit programmatically. Feedsmith has a reference implementation and validator. Apple runs its own validation when you submit through Podcasts Connect, and it's worth re-checking there manually any time you make structural changes to your feed, because Apple's parser has its own opinions about what it accepts beyond what the spec technically requires.
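The local half of that gate is easy to script; a sketch covering a few of the failure modes discussed here (the element list is illustrative, not exhaustive):

```python
import xml.etree.ElementTree as ET

ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"

def check_feed(xml_text):
    """Return a list of problems; an empty list means the basics passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    channel = root.find("channel")
    if channel is None:
        return ["missing <channel>"]
    problems = []
    for tag in ("title", "link", "description"):
        if channel.find(tag) is None:
            problems.append(f"channel missing <{tag}>")
    if channel.find(f"{{{ITUNES_NS}}}explicit") is None:
        problems.append("missing itunes:explicit (Apple requires it)")
    for i, item in enumerate(channel.findall("item")):
        enclosure = item.find("enclosure")
        if enclosure is None:
            problems.append(f"item {i}: missing <enclosure>")
        elif not enclosure.get("url", "").startswith("https://"):
            problems.append(f"item {i}: enclosure URL is not HTTPS")
    return problems
```

In a deploy script, a non-empty result would abort before anything is pushed; the remote validators then catch the subtler issues this sketch doesn't.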
Corn
What does Apple reject that the spec technically allows?
Herman
A few things. Apple is strict about the artwork dimensions: three thousand by three thousand pixels maximum, fourteen hundred by fourteen hundred minimum, JPEG or PNG only. They're strict about the type attribute on the enclosure element matching the actual file format. They have opinions about description length. And they will reject feeds where the audio files aren't accessible over HTTPS. That last one is not unique to Apple but it's worth stating explicitly: HTTP-only audio URLs will cause problems across multiple directories now.
Corn
HTTPS everywhere, even for audio files that have been sitting on a server since two thousand twelve.
Herman
If you're migrating an old show to a new host, make sure your audio URLs are updated to HTTPS. The redirect from HTTP to HTTPS doesn't always work cleanly for enclosure URLs because some clients don't follow redirects on enclosures.
Corn
What's the Podcast Index namespace adding that the iTunes namespace doesn't cover?
Herman
The big ones are transcripts, chapters, and the podcast:person element. Transcripts let you link to a transcript file directly from the feed, with a type attribute specifying the format, so SRT, WebVTT, or plain text. Chapters let you link to a JSON chapter file that clients can use to render a chapter navigation interface. The person element lets you credit contributors with roles and links, so you can say this episode features this person as a guest with a link to their website. There's also podcast:soundbite for short clips, podcast:value for micropayment information in the Lightning network sense, and podcast:locked, which is a flag that tells directories you don't want your feed imported to a competing hosting platform without your permission.
Corn
That last one is interesting. How does locked work in practice?
Herman
You set the element to yes and include your contact email. If a directory tries to import your feed, they're supposed to check for the locked element and not proceed if it's set to yes. Compliance is voluntary, it's not technically enforceable, but the major directories that have adopted the Podcast Index namespace do respect it. It's more of a statement of intent than a technical lock.
Corn
Which is fine. Most of podcast distribution runs on good-faith adherence to conventions rather than cryptographic enforcement.
Herman
And it's worked for twenty-plus years. The directories have incentive to respect the spec because their value comes from having comprehensive, accurate listings. A directory that ignores feed metadata or imports locked feeds would lose the trust of podcasters and ultimately lose content.
Corn
The whole system is a coordination problem that solved itself through aligned incentives.
Herman
A well-designed spec that made the right tradeoffs. RSS two point zero is not a perfect format. The date handling is annoying, the namespace situation is messy, there are ambiguities in the spec that different parsers resolve differently. But it was good enough, it was open, and it shipped. That combination beats perfect-but-proprietary every time.
Corn
That's the thing about specs like RSS: they're only part of the story. Where Daniel actually runs all of this, the infrastructure side, is where a lot of DIY podcasters hit a wall.
Herman
The wall used to be real. If you wanted to self-host audio files five years ago, the calculus was painful. You're paying for a VPS, you're managing storage, and the thing that kills you is egress. Audio files are large. Every download is a transfer cost. A show with ten thousand downloads per episode at fifty megabytes per file is five hundred gigabytes of outbound transfer. On a traditional VPS or even something like S3, that egress bill adds up fast.
Corn
R2 eliminates the egress cost entirely.
Herman
Zero egress fees, which is the headline feature. Cloudflare R2 is object storage, same API surface as S3, so any tooling that talks to S3 will talk to R2 without modification. You store your audio files there, you serve them from there, and the transfer cost is zero regardless of how many times those files get downloaded. The storage cost itself is low, around one and a half cents per gigabyte per month. For a podcast back catalogue of a few hundred episodes, you're talking a few dollars a month in storage.
Corn
Vercel sits in front of that doing what exactly?
Herman
Vercel handles the dynamic layer. Your RSS feed needs to be generated and served, you might have a show website, you might have episode pages. Vercel runs that as serverless functions, so you're not paying for a server that's idle ninety-nine percent of the time. A request comes in, the function spins up, generates your feed XML from your episode data, returns it. The audio files themselves are served directly from R2, not proxied through Vercel, because routing large binary files through a serverless function would be slow and expensive.
Corn
The architecture is: episode metadata lives somewhere, a Vercel function reads it and generates the feed, and the audio URLs in that feed point directly to R2.
Herman
The metadata could be a JSON file in a git repository, which is what a lot of people do. You push a new episode, the deploy hook fires, Vercel rebuilds, your feed is updated within seconds. The audio file you uploaded to R2 separately. It's two operations: upload audio to R2, update your episode metadata in the repository. That's the whole publish workflow.
Corn
Compare that to a traditional VPS setup. Because I think the instinct for technically-minded people is that a VPS gives you more control.
Herman
You have more control in the sense that you can install arbitrary software. But for a podcast specifically, you don't need arbitrary software. You need file storage and a feed generator. The VPS gives you things you don't need and makes you responsible for things that are annoying: OS updates, SSL certificate renewal, nginx configuration, disk space monitoring. The serverless setup outsources all of that. You never think about the server because there isn't one.
Corn
The failure modes are also different. A VPS goes down and your feed is unavailable. R2 and Vercel have Cloudflare's and Vercel's reliability guarantees behind them, which are substantially better than what you'd achieve running your own box.
Herman
The scaling argument, which used to be the one place VPS defenders had a point, is also gone. The misconception is that serverless can't handle a traffic spike. In reality it handles spikes better than a fixed-size VPS, because a VPS has a ceiling determined by the machine you're paying for, and when you exceed it, things fall over. Serverless scales horizontally by design. If your episode goes viral and you get a hundred thousand downloads in an hour, R2 serves the files, Vercel serves the feed, and neither of them breaks a sweat.
Corn
The place where the setup has a gap is analytics. Because you've offloaded the server, you've also offloaded the access logs.
Herman
That's the honest tradeoff. A traditional server gives you access logs as a byproduct. Every request hits your nginx instance, nginx writes a log line, you parse the log lines and you have download counts, rough geography, client applications. When audio is served from R2 directly, Cloudflare sees the requests, but surfacing that data in a usable form requires either Cloudflare's paid analytics products or a workaround.
Corn
Daniel is specifically opposed to invasive tracking, which rules out a bunch of the standard approaches.
Herman
The invasive approaches are things like embedding a tracking pixel in your show notes, using a redirect service that logs requests before forwarding to the audio file, or dropping JavaScript analytics on your website that fingerprints listeners. Those approaches work, but they're collecting data about individuals, often without clear disclosure, and they're exactly the kind of thing that erodes listener trust.
Corn
What are the non-invasive options?
Herman
The first is platform dashboards. Apple Podcasts Connect and Spotify for Podcasters both give you download and listener metrics for their own platforms, and those are aggregated, non-invasive from your side because you're just reading what the platform already collected. You're not doing the tracking, they are, and they have their own privacy policies. That's not ideal but it's clean on your end.
Corn
It also means you only see the slice of your audience that uses those platforms.
Herman
Right, it's partial by definition. The second option is a redirect proxy for your audio URLs. Instead of pointing your feed directly at R2, you point it at a thin redirect service you control, the redirect service logs the request at the IP and user-agent level, then forwards to R2. You're only logging what a server would have logged anyway, no cookies, no fingerprinting, no persistent identifiers. The podcast stats specification from the IAB, the Interactive Advertising Bureau, defines how to count unique downloads from this kind of log data, deduplicated by IP and user-agent within a twenty-four hour window. That's the industry standard for what counts as a download.
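The counting rule itself is simple enough to sketch; this is a simplified version of the IAB-style deduplication over server-level log data, not a certified implementation:

```python
from datetime import datetime, timedelta

def count_downloads(requests, window=timedelta(hours=24)):
    """Count unique downloads: the same (IP, user-agent) pair
    counts once within the deduplication window."""
    last_counted = {}
    count = 0
    # Each request is (timestamp, ip, user_agent), as a server log
    # would have recorded anyway; no cookies or identifiers involved.
    for ts, ip, ua in sorted(requests, key=lambda r: r[0]):
        key = (ip, ua)
        prev = last_counted.get(key)
        if prev is None or ts - prev >= window:
            count += 1
            last_counted[key] = ts
    return count
```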
Corn
Is there a lightweight tool for running that redirect layer without standing up another server?
Herman
Podscan.fm is worth mentioning here. It aggregates public data, chart rankings, review counts, listing data across directories, without doing any invasive tracking on your end. It's not a replacement for download counts, but it gives you reach and discoverability signals. For the redirect layer specifically, there are open source options you can deploy as a Cloudflare Worker, which keeps you in the same infrastructure stack. A Worker runs at the edge, logs the request metadata to Cloudflare's analytics engine, and redirects to R2. You're adding one hop but it's a Cloudflare-to-Cloudflare hop so latency is negligible.
Corn
The full privacy-respecting analytics stack is: platform dashboards for Spotify and Apple, a Cloudflare Worker redirect layer for aggregate download counts, and something like Podscan for external signals.
Herman
That gets you meaningful data without compromising anyone. You know roughly how many people downloaded each episode, which platforms they're using, rough geographic distribution from IP geolocation if you want it, and how your show ranks publicly. What you don't know is individual listener behavior across episodes, which is the invasive part anyway. And honestly, for most independent podcasters, knowing that an episode got four thousand downloads in the first week is sufficient. You don't need a session replay of how long each person listened before dropping off.
Corn
The shows that need that granular retention data are the ones with advertising deals where the advertiser wants proof of completion rates. If you're not in that business, you don't need that data.
Herman
If you are in that business, you've probably moved to a hosting platform that handles all of this for you and takes a cut. That's a different product for a different use case. The DIY serverless setup is optimized for ownership, cost, and control, not for advertising analytics dashboards.
Corn
Which is exactly the use case Daniel is describing. Distribute to Spotify without depending on Spotify. Host audio without depending on a podcast host. Keep the data you collect proportionate to what you actually need.
Herman
The microfeed project is worth mentioning in that context. It's an open source, lightweight CMS built on Cloudflare infrastructure specifically for this kind of self-hosted podcast setup. Same stack, different packaging. If someone wants the R2 and Cloudflare Worker approach but doesn't want to wire it together themselves from scratch, microfeed gives you a starting point to fork and adapt.
Corn
The ecosystem around this approach is small but it exists, which means you're not solving entirely novel problems when you go this route.
Herman
The patterns are established. The tooling is available. The costs are low. The main thing you're trading away is the hand-holding that a managed hosting platform provides. And for someone who's comfortable with a git repository and a deploy hook, that's not much of a trade. But if you're ready to dive in, the next question is: how do you actually put it all together?
Corn
What does the build order look like? Because there are a few moving parts, and the sequence matters.
Herman
Start with the feed structure before you touch any infrastructure. Get your XML right on your local machine first. You need a valid RSS 2.0 document with the iTunes namespace declared, a channel block with title, link, description, language, and at least one item with a properly formed enclosure. Run it through the Podcast Index validator and Apple's feed validator before you deploy anything. Catching a malformed GUID or a missing explicit tag at that stage costs you nothing. Catching it after you've submitted to Apple and gotten rejected costs you days.
Corn
The feed is your source of truth and you validate it in isolation.
Herman
Then you set up R2. Create a bucket, configure it for public read access on your audio paths, upload a test file, confirm you can hit the URL. That step takes maybe twenty minutes and it tells you whether your bucket permissions are correct before anything else depends on them. Then build the Vercel function that reads your episode metadata and generates the feed XML. Point the enclosure URLs at R2. Subscribe to your own feed in a podcast app and confirm the episode plays.
Corn
The Cloudflare Worker redirect layer comes after that, once the basic plumbing is working.
Herman
Don't add the analytics redirect until the core publish workflow is solid. When you do add it, update the enclosure URLs in your feed to point at the Worker endpoint rather than R2 directly. The Worker receives the request, logs the relevant fields to Cloudflare's analytics engine, and issues a three-oh-two redirect to the R2 URL. Keep the Worker logic minimal. The more logic you add there, the more there is to break on every download.
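An actual Worker is JavaScript, but the request flow is small enough to sketch in Python; everything here, names included, illustrates the logic rather than any Workers API:

```python
def handle_download(path, ip, user_agent, log_fn,
                    r2_base="https://audio.example.com"):
    """Log the fields a web server would have logged anyway,
    then send the client on to the real file in R2."""
    # No cookies, no fingerprinting: just aggregate-countable fields.
    log_fn({"path": path, "ip": ip, "ua": user_agent})
    # A 302 keeps clients re-requesting through the logger on
    # future downloads rather than caching the R2 URL permanently.
    return 302, {"Location": r2_base + path}
```

Keeping the handler this small matters: it runs on every single download, so each added branch is another thing that can break between a listener and the audio.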
Corn
For the metadata store, the git repository approach is the simplest option if you're already comfortable with that workflow.
Herman
A JSON file with an array of episode objects. Each object has title, description, publication date, GUID, duration, file size, and the R2 path for the audio. Your Vercel function reads that file and renders the XML. The whole feed generator can be under a hundred lines of code. You don't need a database, you don't need a CMS, you don't need anything that can go wrong in the middle of the night.
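A plausible shape for that metadata file, with a loader that fails loudly on missing fields; the field names are one convention, not a standard:

```python
import json

# Fields every episode record must carry before the feed is generated.
REQUIRED = {"title", "description", "pub_date", "guid", "bytes", "audio_path"}

def load_episodes(text):
    """Parse episodes.json and refuse to continue on incomplete records."""
    episodes = json.loads(text)["episodes"]
    for i, ep in enumerate(episodes):
        missing = REQUIRED - ep.keys()
        if missing:
            raise ValueError(f"episode {i} missing {sorted(missing)}")
    return episodes

# Illustrative record in the shape the loader expects.
SAMPLE = json.dumps({"episodes": [{
    "title": "Episode 1",
    "description": "The first one.",
    "pub_date": "Tue, 14 Apr 2026 12:00:00 +0000",
    "guid": "0b0e7b9c-7c1d-4e1a-9b2f-1a2b3c4d5e6f",
    "bytes": 58315776,
    "audio_path": "episode-0b0e7b9c-7c1d-4e1a-9b2f-1a2b3c4d5e6f.mp3",
}]})
```

Failing at load time is the cheap place to fail: a record that slips through with no GUID or no byte count becomes a malformed feed that directories reject silently.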
Corn
The GUID discipline is worth repeating here because it's the thing people get wrong once and then regret.
Herman
Generate a UUID for every episode before you publish it. Write it into your metadata file. Never change it. That identifier is how every podcast app in the world tracks whether it has seen that episode before. If you change it, apps re-download the episode and mark it as new. Your listeners get duplicate notifications. Some apps will re-download your entire back catalogue. It's a bad day for everyone.
Corn
On the analytics side, the practical starting point is just turning on Apple Podcasts Connect and Spotify for Podcasters dashboards. You probably have those accounts already if you're distributing to those platforms.
Herman
Read them with appropriate skepticism. Apple's numbers represent Apple's measurement methodology, Spotify's represent Spotify's. They don't agree with each other and neither of them counts the listeners who use a third-party app to subscribe to your RSS feed directly. But they're real signal. If an episode significantly outperforms your average on both platforms simultaneously, something happened with that episode. That's useful to know.
Corn
The direct RSS subscribers are arguably your most engaged listeners and they're also the ones you have the least visibility into without the redirect layer.
Herman
Which is why the Worker approach is worth building eventually. Even if you don't look at the analytics every week, having the data accumulate means you can answer questions later. How did downloads trend over the first year? Which episodes had unusual spikes? You can't reconstruct that retroactively if you never collected it.
Corn
Collect proportionate data consistently, don't try to instrument everything on day one.
Herman
That's the mindset that scales. You're not building a surveillance apparatus, you're building a record you can reason from later.
Corn
Where does this go from here, technically? Because the stack we've described today is already pretty capable, but podcasting infrastructure keeps moving.
Herman
The thing I'm watching is the Podcast Index namespace gaining more traction. The value tag, which lets listeners send micropayments via the Lightning Network directly to a podcast's wallet, is live and working in apps like Fountain and Breez. That's a monetization model that requires zero advertising infrastructure and zero intermediary. If you own your feed and you've implemented the value block correctly, you get paid directly. The analytics question becomes less urgent when your revenue signal is cryptographically verified payment events rather than estimated download counts.
Corn
Decentralized distribution, decentralized payment, no platform taking a cut. That's a coherent vision.
Herman
The chapters element is also underused. Apps that support it, Overcast, Pocket Casts, Podcast Addict, render chapter markers with titles and images inside the player. For a show like this one, where we move through distinct segments, chapters are useful for listeners. And they're just another element in the feed. No third-party tool required.
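The chapter file the feed links to is plain JSON with start times in seconds; this sketch follows the podcast:chapters JSON format as commonly documented, and the version string and field names should be verified against the current Podcast Index spec before shipping:

```python
import json

# Sketch of a Podcast Namespace chapters file; startTime is in seconds,
# and entries may optionally add "img" and "url" keys per chapter.
chapters = {
    "version": "1.2.0",
    "chapters": [
        {"startTime": 0, "title": "Feed structure"},
        {"startTime": 912, "title": "Hosting on R2 and Vercel"},
        {"startTime": 2105, "title": "Analytics without surveillance"},
    ],
}
chapters_json = json.dumps(chapters, indent=2)
```

The feed then points at this file from a podcast:chapters element, and supporting apps render the markers in the player.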
Corn
The barrier to implementing any of this is lower than people assume. That's probably the thing worth leaving listeners with. If you've been thinking about running your own podcast infrastructure and assuming it requires serious devops experience, the actual work is closer to an afternoon with a text editor and a Cloudflare account.
Herman
The XML spec is twenty years old and stable. R2 is cheap and reliable. Vercel's free tier handles the feed generation for a show of almost any size. The hard part is deciding you want to own it, not the technical execution.
Corn
Thanks to Hilbert Flumingtop for producing this one. And Modal, our serverless GPU sponsor, keeps the pipeline running so we can keep the episodes coming.
Herman
If you've got thoughts on your own podcast setup, or questions about anything we covered today, find us at myweirdprompts. And if the show has been useful to you, a review on Spotify goes a long way.
Corn
This has been My Weird Prompts. We'll see you next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.