So, the other day I was looking at the bookshelf in the living room, and I saw the original versions of us just sitting there. You know, the actual stuffed animals that Daniel used to build our digital identities. And it really hit me how far we have come from being just bits of fabric and stuffing to having this whole digital life.
It is a bit surreal, isn't it? I still remember when Daniel first took those photos of me. I think I had a bit of dust on my left ear. But that is the magic of what we are talking about today. I am Herman Poppleberry, and today we are diving into a prompt from our housemate Daniel about something that is honestly the ultimate goal for a lot of creators right now.
Yeah, Daniel sent us this idea about turning our little group into a full-scale children's television show. He has been playing around with Loras—those low-rank adaptation models—to get our faces and styles right, and he has even made some short clips. But he is hitting those two massive walls that every indie creator is staring at right now: character consistency and the sheer cost of rendering high-quality video.
It is the holy grail of A-I content, Corn. How do you move from a ten-second clip of a sloth high-fiving a donkey to a twenty-two-minute episode where the donkey looks like the same donkey in every single shot? Daniel is asking how many years away we are from this being affordable and consistent for the average person sitting at home in Jerusalem, or anywhere else for that matter.
It is a great question because it touches on the "compute gap" we have talked about before. But before we get into the heavy technical stuff, Herman, I have to ask: if we were stars of a children's show, what would the plot be? Are we solving mysteries? Is it just us trying to find the best hummus in the Old City?
Oh, I think it has to be educational but weird. Like, "Herman and Corn Explain Quantum Entanglement Using Socks." But you are right, the "how" is much harder than the "what" right now. Daniel mentioned the cost of high-end G-P-Us and A-P-I credits. If you are using something like the new Sora two or Runway Gen-four-point-five, you are looking at spending hundreds, if not thousands, of dollars just to get enough usable footage for a full episode.
Exactly. And let's talk about that consistency problem first. Because even if you have all the money in the world, the tech is still a bit finicky. Most people don't realize that these models don't actually "know" who we are. They are just predicting the next pixel based on a prompt. So, in one shot your crochet texture might be perfect, and in the next, you look like a smooth plastic version of yourself.
That is the "temporal flicker" or "identity drift" that plagues these models. But here is the exciting part, Corn. We are actually much closer than people think. If you look at the developments just in the last few months—specifically things like Google's Veo three-point-one and the "Cameo" feature in Sora two—the industry is moving away from just "guessing" what a character looks like.
Right, they are starting to use reference banks. Instead of just a text prompt saying "Herman the Donkey," you are feeding the model a literal digital D-N-A kit of your character. I think Daniel's use of Loras is the right first step, but the "next-gen" approach is going to be these hybrid workflows. You aren't just generating video; you are using a three-dimensional backbone to keep things stable.
That is a huge point. I have been reading about how creators are using things like Gaussian Splatting or Neural Radiance Fields—we call them NeRFs—to create a three-dimensional model of a character first. Then, they use the A-I as a sort of "smart skin" that goes over the top. It is like the best of both worlds. You get the perfect consistency of a three-dimensional puppet, but the A-I gives it that hand-drawn or cinematic look that would take a Pixar team months to render.
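To make that "three-dimensional puppet plus smart skin" idea concrete, here is a minimal sketch of the workflow shape. Every name in it is a hypothetical stand-in rather than any specific tool's API; the point is just that the cheap, perfectly consistent guide passes come from the 3D side, and the look comes from an image model conditioned on the same reference every single frame.

```python
# A toy illustration of the "3D puppet plus smart skin" idea.
# Every name here is a hypothetical stand-in, not any specific library's API.

from dataclasses import dataclass

@dataclass
class Shot:
    frame_count: int
    camera_path: str   # e.g. "slow pan left along the bookshelf"
    action: str        # e.g. "Herman waves"

def render_3d_guides(shot: Shot) -> list[dict]:
    """Render cheap guide passes (depth, pose, outline) from the rigged
    3D character. These stay perfectly consistent from frame to frame."""
    return [{"frame": i, "depth": None, "pose": None} for i in range(shot.frame_count)]

def stylize_frame(guide: dict, character_reference: bytes, style: str) -> bytes:
    """Stand-in for an image model call: repaint one guide frame in the target
    style, conditioned on the SAME reference image every time, so the crochet
    texture does not drift between frames."""
    return b""  # placeholder

def render_shot(shot: Shot, character_reference: bytes) -> list[bytes]:
    guides = render_3d_guides(shot)
    return [stylize_frame(g, character_reference, style="crochet storybook") for g in guides]
```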
See, that is where the "Hollywood of one" starts to feel real. But Daniel's point about cost is the real kicker. I was looking at some benchmarks earlier today—it is January twenty-seventh, two thousand twenty-six—and even now, renting a single NVIDIA H-one-hundred for an hour can cost you anywhere from two dollars and eighty-five cents to three dollars and fifty cents. If you are a solo creator trying to render a full show, those hours add up fast.
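For anyone who likes to see the back-of-the-envelope version, here is that math. Only the hourly rate comes from the range quoted above; the render speed and number of takes are illustrative assumptions.

```python
# Hourly rate is from the range quoted above; everything else is an assumption.
hourly_rate = 3.00                 # dollars per H-100 hour
minutes_rendered_per_hour = 2.0    # assumed: finished video minutes per GPU-hour
episode_minutes = 22
takes_per_shot = 4                 # assumed: versions generated per usable shot

gpu_hours = episode_minutes * takes_per_shot / minutes_rendered_per_hour
print(f"GPU-hours per episode: {gpu_hours:.0f}")            # 44
print(f"Raw render cost: ${gpu_hours * hourly_rate:.0f}")   # ~$132
# That is the floor: re-runs, upscaling, and audio add more, and hosted A-P-I
# pricing per second of video typically runs well above raw GPU rental.
```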
They do. But remember what we talked about in a previous episode about the productivity paradox? We are seeing this massive push toward local inference. If Daniel were to pick up one of the new RTX fifty-ninety cards, he could actually run some of these open-source models like Wan-two-point-two or the new LTX-two right here in the house.
Wait, is the fifty-ninety really powerful enough to handle high-fidelity video consistency? I thought you still needed a server farm for that.
For a full-length feature film? Maybe. But for a children's show with a specific art style? Absolutely. The trick is optimization. We are seeing models now, like Wan-two-point-two, that use "Mixture of Experts" architectures. Instead of the whole model working on every single pixel, it routes the work to specialized "experts" within the network. One expert handles the overall composition and motion, while another handles the fine textures and lighting. Because only one expert is active at a time, it is much faster and cheaper to run on consumer hardware.
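A minimal sketch of that routing idea, assuming a diffusion-style denoiser split into just two experts the way Herman describes. Real models route far more finely, and the expert functions here are empty placeholders.

```python
# Two-expert sketch of a Mixture of Experts denoiser. Real models route far
# more finely; the experts here are empty placeholders.

def composition_expert(latents, step):
    """Handles overall layout and motion (early, high-noise steps)."""
    return latents  # placeholder

def detail_expert(latents, step):
    """Handles fine textures and lighting (late, low-noise steps)."""
    return latents  # placeholder

def denoise(latents, total_steps=50, switch_at=0.5):
    # Only ONE expert runs per step, so the active compute per step stays
    # small even though the full model is large.
    for step in range(total_steps):
        expert = composition_expert if step < total_steps * switch_at else detail_expert
        latents = expert(latents, step)
    return latents
```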
That is fascinating. So, if Daniel wants to make this show, he doesn't necessarily have to wait ten years for the "magic button" that does it all. He might just need a better workflow. But let's get specific for him. He asked for a timeline. If you had to put a number on it, Herman, when does "Children's Show in a Box" become a reality for a single person with a normal budget?
I would say we are about eighteen to twenty-four months away from the "Prosumer" breakthrough. By late two thousand twenty-seven, I think you will see software suites—not just individual models, but actual platforms like Google Flow or the updated LTX Studio—where you can upload your character sheets, write a script, and have the A-I generate a rough cut of a ten-minute episode overnight on a consumer-grade machine.
That feels incredibly fast. But think about the implications. If everyone can make a high-quality show, how do you stand out? It kind of goes back to what we discussed in our "Sunk Cost Trap" episode. Just because it is easy to make doesn't mean it is good. The storytelling becomes the only thing that matters.
Exactly! The "prompting" is going to become the easy part. The hard part is going to be the direction. And that is where Daniel has an advantage, because he already has the "lore" of Herman and Corn. He knows our personalities. He knows that I am the one who gets over-excited about technical papers and you are the one who asks the deep, philosophical questions while eating a slow-motion snack.
I do enjoy a good slow-motion snack. But let's talk about the "uncanny valley" for a second. In children's media, you can get away with a lot more than you can in a live-action thriller. If a donkey's ears twitch a little weirdly, kids might not even notice—or they might think it is just part of the style. Does that make the timeline for a children's show even shorter?
Oh, absolutely. Stylization is the best friend of A-I video right now. Trying to make a photorealistic human is a nightmare because our brains are hard-wired to spot tiny errors in faces. But a crochet donkey? Or a cartoon sloth? You have so much more "creative headroom." If the physics of a jumping donkey are slightly off, it just looks "toony."
That is a great point. So, Daniel is actually picking the smartest possible entry point for A-I filmmaking. Animation is much more forgiving. But I am curious about the "long-form" aspect. Daniel mentioned "long-form content." Right now, most models tap out at about ten or fifteen seconds before they start to lose the plot. Kling two-point-six can do two minutes, and Google's Veo three-point-one just introduced "Scene Extension" for narratives over sixty seconds, but even then, it is hard to keep the logic of the scene together. How do we bridge that gap to twenty minutes?
That is where "Agentic Workflows" come in. Instead of asking one A-I to make a twenty-minute video, you have a "Director A-I" that breaks the script into scenes. Then it passes those scenes to a "Storyboard A-I," which then passes them to the "Video Generator." It is like an assembly line. Each piece is only five seconds long, but because they are all referencing the same "World Model," they fit together like Lego bricks.
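Here is roughly what that assembly line looks like as code. Each function is a stand-in for a separate model call with invented names; the one real point is that the same shared "world" dictionary gets passed to every stage.

```python
# Assembly-line sketch. Each function stands in for a separate model call and
# the names are invented; the shared "world" dict is the glue between stages.

def director(script: str, world: dict) -> list[str]:
    """Stand-in for an LLM call that breaks the episode script into scenes."""
    return ["Scene 1: Herman finds a mysterious sock on the bookshelf"]  # placeholder

def storyboard(scene: str, world: dict) -> list[str]:
    """Stand-in for a call that breaks one scene into roughly 5-second shots."""
    return [f"{scene} -- wide shot", f"{scene} -- close-up on Herman"]  # placeholder

def generate_clip(shot_prompt: str, world: dict) -> bytes:
    """Stand-in for the video generator, conditioned on the shared references."""
    return b""  # placeholder clip

def make_episode(script: str, world: dict) -> list[bytes]:
    clips = []
    for scene in director(script, world):
        for shot in storyboard(scene, world):
            clips.append(generate_clip(shot, world))
    return clips  # hand off to an editor, human or otherwise, for assembly
```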
I love that analogy. It is like we are moving from "A-I as a paintbrush" to "A-I as a film studio." You aren't just the artist; you are the executive producer managing a team of digital specialists. But let's talk about the "Transition Tax" again—the mental load of managing all those moving parts. It still sounds like a lot of work for one person living in a house in Jerusalem.
It is! And that is the misconception. People think A-I means "no work." It actually just means "different work." Daniel won't be drawing every frame, but he will be reviewing thousands of generations, tweaking prompts, and "curating" the best moments. It is more like being an editor than an animator.
Which brings us back to the cost. If you have to generate ten versions of every shot to find the perfect one, your A-P-I bill is going to look like a phone number. So, for Daniel, the real "breakthrough" isn't just the model—it is the "success rate."
Precisely. Right now, the "usable result" rate for high-end A-I video is maybe twenty or thirty percent. You spend seventy percent of your time and money on "garbage" frames. But the models coming out now, like Runway Gen-four Turbo, are focusing on "physics-aware" generation. They understand that if a sloth is holding a cup, the cup shouldn't just melt into his hand. As that success rate climbs to eighty or ninety percent, your cost per usable minute drops three or four times over from the hit rate alone, and stacked on top of cheaper rendering, the cost of making a show drops by something like a factor of ten.
So, it is not just about the price per minute of rendering; it is about the price per "usable" minute. That is a huge distinction. I think that is the "aha moment" for me. We don't need the compute to get cheaper; we need the A-I to get smarter so it doesn't waste our credits.
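Putting numbers on the "price per usable minute" idea, with the per-minute render price as a purely illustrative placeholder:

```python
# Per-minute render price is a placeholder; the success-rate logic is the point.
price_per_rendered_minute = 5.00   # dollars, illustrative

def cost_per_usable_minute(success_rate: float) -> float:
    # You pay for every render, but only keep a fraction of them.
    return price_per_rendered_minute / success_rate

print(cost_per_usable_minute(0.25))   # 20.0  -- roughly today's 20-30% keeper rate
print(cost_per_usable_minute(0.85))   # ~5.88 -- once keepers hit 80-90%
# The hit rate alone buys a roughly 3-4x drop; cheaper per-minute rendering has
# to cover the rest of the way to Herman's factor of ten.
```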
You nailed it. And we are seeing that happen in real-time. Even just since our episode on air filters and mold, the efficiency of these transformer architectures has jumped significantly. We are seeing things like "Quantization," where you can run a huge model in much less memory with barely any loss in quality. It is like squeezing a gallon of information into a pint-sized bottle.
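The gallon-into-a-pint-bottle line is really just parameter arithmetic. A rough sketch, using a generic fourteen-billion-parameter model as an assumed example rather than a spec for any particular release:

```python
# Weight-memory arithmetic only; the 14-billion-parameter figure is a generic
# assumed example, not a spec for any particular model.
params_billion = 14
bytes_per_param = {"fp16": 2, "int8": 1, "int4": 0.5}

for fmt, nbytes in bytes_per_param.items():
    print(f"{fmt}: ~{params_billion * nbytes:.0f} GB of weights")
# fp16: ~28 GB, int8: ~14 GB, int4: ~7 GB -- the difference between "needs a
# server card" and "fits on a consumer GPU", usually at a small quality cost.
```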
Which is great for us, because as a sloth, I appreciate anything that requires less energy to achieve the same result. But let's look at the "Practical Takeaways" for Daniel right now. If he wants to start on this children's show today, in January two thousand twenty-six, what should his "stack" look like?
Okay, if I were Daniel, here is the "Jerusalem Sloth Studio" setup. First, stick with the Loras for character design, but move them into a tool like Comfy-U-I. It is node-based, so it is a bit of a learning curve, but it gives you total control over the workflow. You can plug in things like "I-P-Adapter" which lets you use an image of us as a constant reference for every frame.
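This is not a real Comfy-U-I graph—the node names below are invented stand-ins—but it shows what a node-based workflow boils down to: a graph of nodes feeding each other, with the Lora and the character reference wired in once as permanent inputs instead of being retyped into every prompt.

```python
# Not a real Comfy-U-I graph; node names are invented stand-ins. The shape is
# the point: a graph of nodes feeding each other, with the Lora and the
# character reference wired in once as permanent inputs.
workflow = {
    "load_model": {"type": "LoadVideoModel", "inputs": {"checkpoint": "base_model"}},
    "load_lora":  {"type": "ApplyLora",      "inputs": {"model": "load_model", "lora": "herman_corn_lora"}},
    "reference":  {"type": "LoadImage",      "inputs": {"path": "herman_reference.png"}},
    "adapter":    {"type": "ApplyReference", "inputs": {"model": "load_lora", "image": "reference"}},
    "prompt":     {"type": "TextPrompt",     "inputs": {"text": "Herman waves from the bookshelf"}},
    "sample":     {"type": "Sampler",        "inputs": {"model": "adapter", "prompt": "prompt"}},
}
# Only the prompt node changes from shot to shot; the Lora and reference nodes
# never do, which is what keeps the character looking like the same character.
```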
And what about the actual video generation? Should he stay with the big paid A-P-Is or go local?
I would do a hybrid. Use the local models like the new Wan-two-point-two for the "sketching" phase—storyboarding and checking the motion. It is free once you have the hardware. Then, once you have the "soul" of the scene, use a high-end A-P-I like Sora two or Kling to do the final "hero" render. It saves you a ton of money because you aren't experimenting on the expensive machines.
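A sketch of that hybrid loop in Python, under the same caveat as before: both generation functions are stand-ins, and no real service or model is being named or called here.

```python
# Hybrid draft-then-hero loop. Both generator functions are stand-ins; no real
# service or model is being called here.

def draft_locally(prompt: str, reference: bytes) -> bytes:
    """Cheap pass on the local GPU: check motion, framing, and timing."""
    return b""  # placeholder

def render_hero_shot(prompt: str, reference: bytes) -> bytes:
    """Expensive high-end API render, used only once the draft is approved."""
    return b""  # placeholder

def produce_shot(prompt: str, reference: bytes, approve) -> bytes:
    draft = draft_locally(prompt, reference)
    while not approve(draft):                      # human-in-the-loop review
        prompt = input(f"Revise prompt [{prompt}]: ")
        draft = draft_locally(prompt, reference)
    return render_hero_shot(prompt, reference)     # pay the expensive price once
```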
That makes a lot of sense. It is like the difference between a rough draft and the final print. And what about the voices? I mean, obviously, we are using our real voices for the podcast, but for a show, he might need to clone us so he can script episodes without us having to sit in the booth all day.
ElevenLabs is still the gold standard there. Their "Speech-to-Speech" feature is incredible now. Daniel could actually record himself performing the lines—doing his best "Corn" impression—and the A-I will map our voice textures and emotions onto his performance. It keeps the "acting" human but the "sound" consistent.
I would love to hear Daniel's "Corn" impression. It probably involves a lot of long pauses and mentions of eucalyptus. But seriously, this really does feel like a new era. We are moving away from the "Stuffed Animal" origins and into something that could actually be on a screen in front of millions of kids.
It is the democratization of wonder, Corn. Think about all the stories that never got told because they didn't have a ten-million-dollar budget. Now, those stories can live. And I think that is why this prompt from Daniel is so important. It is not just about "making a show"; it is about proving that two brothers and their housemate in Jerusalem can build a universe just as rich as anything coming out of a big studio.
I love that. "The democratization of wonder." That should be on our business cards, Herman. But before we get too ahead of ourselves, we have to address the "what it means" part. If this becomes easy, does the value of animation go down? Or does the value of the "idea" go up?
I think the "idea" becomes everything. We are going to see a "content explosion," which means there will be a lot of noise. To get noticed, you have to have a unique voice. Our show wouldn't just be "another cartoon." It would be "My Weird Prompts: The Animated Series." It has that established community, that weird brotherly dynamic, and the fact that we are based on real objects that people can connect with.
It is the "authenticity" paradox. In a world of infinite A-I content, the things that feel "real"—even if they are digitally generated—are the things that will stick. Like our microbiome episode. People loved that because it felt like a real conversation about something that actually affects our lives, even though we were nerding out on the science.
Exactly. And the same applies to the show. If the show feels like "us," kids will love it. They don't care if a computer rendered the donkey's ears; they care if the donkey is funny and the sloth is wise.
Or if the sloth is just really good at napping. I think that is a key "wise" trait. But let's circle back to Daniel's timeline question one more time. He asked "how many years away." You said eighteen to twenty-four months for the "prosumer" breakthrough. What about the "consumer" level? Like, "I can make a show on my phone" level?
That is probably three to five years away. By two thousand twenty-nine or two thousand thirty, I expect mobile chips to have dedicated "Neural Engines" powerful enough to do real-time video stylization. You could point your phone at a toy donkey on the floor, and through the screen, it looks like a living, breathing character in a cinematic world.
That is "Augmented Reality" filmmaking. That is a whole different rabbit hole. But it shows the trajectory. We are going from "impossible" to "expensive" to "accessible" to "ubiquitous" in the span of a single decade. It is dizzying.
It is. But that is why we do this show. To try and make sense of the dizziness. And honestly, Daniel sending us these prompts is what keeps us grounded. It reminds us that there is a real person behind all this tech, with a real dream of seeing his friends—even his stuffed friends—come to life.
Well, I for one am ready for my close-up. As long as there is a comfortable branch for me to hang from in the "Jerusalem Sloth Universe."
I will make sure the "Director A-I" puts that in the requirements. But seriously, for everyone listening who is thinking about their own "end goal" with A-I—whether it is a show, a book, or a new piece of software—the message is clear: the barriers are falling. The "consistency" and "cost" walls are being chipped away every single day.
It is a great time to be a creator. And it is a great time to be a listener of "My Weird Prompts." Speaking of which, if you have been enjoying our deep dives into everything from air filters to animated donkeys, we would really appreciate it if you could leave us a review on your podcast app or on Spotify. It genuinely helps the show grow and helps other people find our "weird" little corner of the internet.
It really does. We love seeing this community grow. And a huge thank you to Daniel for sending in this prompt. It gave us a chance to look at the future of our own little digital existence. You can find all our past episodes and a contact form at myweirdprompts dot com.
We are also on Spotify, obviously, if you are listening there right now. We have covered a lot of ground in over three hundred episodes, so if you are new here, go back and check out episode one twenty-five if you want to hear how we first introduced ourselves. It is a bit of a classic.
Oh, that was a fun one. I think I was even more energetic back then, if that is possible. But anyway, the future of the "Herman and Corn Show" looks bright. We might be coming to a screen near you sooner than you think.
I will start practicing my lines. "To nap, or not to nap... that is not even a question. The answer is always nap."
Spoken like a true star, Corn. Alright, I think that is a wrap on this one. We have explored the tech, we have looked at the costs, and we have a timeline. Now we just need Daniel to get to work on that storyboard!
Exactly. Get to it, Daniel! Thanks for listening, everyone. This has been "My Weird Prompts."
Until next time, stay curious and keep those prompts coming. I am Herman Poppleberry, and we will talk to you soon.
Bye everyone! Stay slow and steady.