So Daniel sent us a fascinating prompt to kick things off today. He wrote: We discussed using an agentic wargaming approach to attempt probability-based forecasting of the chances of the ceasefire between Iran and Israel holding. However, this is only one tool in the toolbox of geopolitical forecasting. Let's discuss other methodologies that have been employed.
I love that Daniel is pushing us here, because while wargaming is the flashy, high-compute option everyone’s talking about right now, it is far from the only lens we should be looking through. I’m Herman Poppleberry, and honestly, if you only look at the silicon-simulated soldiers, you’re missing half the signal.
It’s a classic case of having a very expensive hammer and suddenly every geopolitical crisis looks like a nail. We’ve been watching this April twenty twenty-six ceasefire hold for over seventy-two hours now, which, let’s be real, felt like a coin flip on Monday. But the forecasting community is currently in a full-blown civil war over which tools actually captured this volatility. Was it the AI agents, or was it something much more human? By the way, quick shout out to our digital scriptwriter today—Google Gemini Three Flash is the one powering our discussion.
It’s fitting, right? Using a frontier model to discuss how we predict the future. The reality is that while the agentic simulations we’ve looked at recently gave us a specific set of odds, the broader intelligence community and even the private sector are leaning on techniques that have been refined over decades—things like prediction markets, Bayesian updating, and structured expert elicitation.
Right, because an AI wargame might tell you what happens if a specific commander makes a specific choice, but it might totally whiff on the "wisdom of the crowd" or the deep historical context that a superforecaster brings to the table. We’re standing at this really weird crossroads where the "Multi-Island Gambit" from February and the strikes on the twenty-eighth have created a situation so complex that no single model can claim total authority.
The stakes couldn't be higher. We’re talking about the Strait of Hormuz, twenty percent of global oil, and by extension, the literal electricity costs of the GPU clusters that run our entire economy in twenty twenty-six. If the forecasting is wrong, the cascade effect is massive.
So, instead of just doubling down on the "bot-on-bot" simulations, we’re going to peel back the curtain on the other side of the house. How do the pros actually triangulate these signals when the world is on fire?
It’s about moving past the "black box" of a simulation and looking at how we aggregate human intelligence and mathematical rigor to find the signal in the noise.
I’m ready to dive in. I want to know why the guys putting their actual money on the line often see things the generals and the algorithms miss.
Then let’s start with the money. Because when it comes to the Iran-Israel ceasefire, the markets have been saying something very different from the wargames.
Follow the money, right? Because while a wargame is essentially a "what if" machine built on programmed logic, a prediction market is a "what is" machine built on skin in the game. It’s the difference between simulating a poker game and actually sitting at the table with your mortgage on the line.
That is the perfect distinction. We are looking at a massive divide in methodology here. On one side, you have the agentic wargaming we’ve been tracking—very high-resolution, very fast, but often prone to "hallucinating" escalation because the models are trained on dramatic historical data. On the other side, you have three heavy hitters: Prediction Markets, Structured Expert Elicitation, and Causal Modeling.
And the goal here isn't to pick a winner, right? It’s not like "The AI was wrong, the humans were right, let’s all go home." It’s about triangulation. If the silicon agents are screaming "World War Three" but the guys on Polymarket are betting on a quiet weekend in Tehran, that delta—that gap between the two—is actually where the most interesting information lives.
It’s the "ensemble" approach. Think of it like a medical diagnosis. You want the blood work, which is your hard data; you want the MRI, which is your structural causal model; and you want the second opinion from a specialist who has seen ten thousand cases, which is your expert elicitation. If you only look at one, you’re gambling with incomplete information.
Especially when the "patient" in this case is the global energy market and the stability of the entire Middle East. So, we're going to break these down. We’ll look at how financial incentives actually filter out the noise, how "superforecasters" strip away their own biases to beat the "experts," and how Bayesian networks allow us to update our priors the second a new diplomatic cable leaks.
Each one has a "blind spot." Wargames struggle with human emotion and domestic political pressure. Prediction markets struggle with low liquidity and regulation. Expert panels struggle with groupthink. But when you overlay them? That’s when the fog of war starts to thin out.
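Here’s a minimal sketch of what that overlay can look like in practice: three probability estimates pooled on the log-odds scale with hand-picked weights. Every figure and weight below is invented purely for illustration, not taken from the episode.

```python
import math

def logit(p: float) -> float:
    """Probability -> log-odds."""
    return math.log(p / (1 - p))

def inv_logit(x: float) -> float:
    """Log-odds -> probability."""
    return 1 / (1 + math.exp(-x))

# Illustrative probabilities that the ceasefire holds, one per method,
# plus hand-picked weights. None of these figures are real.
forecasts = {
    "agentic_wargame":   (0.55, 0.2),  # (probability, weight)
    "prediction_market": (0.68, 0.5),
    "expert_panel":      (0.62, 0.3),
}

# Pool on the log-odds scale ("logit pooling"), a common way to combine
# probability forecasts so no single extreme estimate dominates.
total_weight = sum(w for _, w in forecasts.values())
pooled = inv_logit(sum(w * logit(p) for p, w in forecasts.values()) / total_weight)

print(f"Ensemble probability the ceasefire holds: {pooled:.2f}")
```

With those made-up inputs the ensemble lands around sixty-four percent, closer to the market than to the wargame simply because of the weights chosen.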
Well, let’s start thinning it then. Let’s look at the "wisdom of the crowd" versus the "wisdom of the machine." Because in April twenty twenty-six, the crowd was betting on peace while the machines were calculating the optimal trajectory for a missile strike. Why was there such a massive disconnect?
That disconnect is exactly where it gets fascinating. In mid-April, while some of those agentic wargames were putting the ceasefire survival at a coin-flip fifty-five percent, Polymarket was pricing it at sixty-eight percent. That is a massive delta in the forecasting world.
Thirteen points is the difference between "maybe pack a bag" and "let's book a flight." So why does the market usually lean toward stability? Is it just because traders are optimistic, or is there a mechanical reason they’re outperforming the specialized AI agents?
It’s the aggregation of dispersed information through financial incentives. In a prediction market like Polymarket or Kalshi, you aren't just asking an expert for an opinion; you’re asking thousands of people to aggregate every scrap of information they have—from satellite imagery of IRGC bases to the tone of a diplomat’s tweet—and back it with capital. If you’re wrong, you lose money. That "skin in the game" filters out the noise that often plagues pure simulations or unstructured panels.
Right, because an AI agent in a wargame doesn't care if it's wrong. It’s just following a reward function. But a guy in London or Tel Aviv betting five thousand dollars on a "Yes" contract for the ceasefire holding through forty-eight hours is doing a different kind of calculus. He’s looking for the "wisdom of the crowd" effect.
And that wisdom relies on liquidity. When you have high volume, the price discovery becomes incredibly accurate because any “dumb money” or biased betting gets quickly arbitraged away by someone with better information. But that’s also the Achilles’ heel. If a market has low liquidity, meaning not many people are trading, a single large, biased bet can swing the probability and give you a false signal.
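To make the liquidity point concrete, here’s a toy automated market maker using Hanson’s logarithmic market scoring rule rather than the order books real platforms like Polymarket actually run; the liquidity parameter b stands in for market depth, and the trade sizes are invented.

```python
import math

def lmsr_price(q_yes: float, q_no: float, b: float) -> float:
    """Implied probability of YES under a logarithmic market scoring rule.
    Larger b means a deeper market whose price moves less per trade."""
    e_yes = math.exp(q_yes / b)
    e_no = math.exp(q_no / b)
    return e_yes / (e_yes + e_no)

# Both markets start at 50/50; then a single trader buys 500 YES shares.
for label, b in [("thin market", 100.0), ("deep market", 5000.0)]:
    before = lmsr_price(0.0, 0.0, b)
    after = lmsr_price(500.0, 0.0, b)
    print(f"{label}: {before:.2f} -> {after:.2f}")
```

In the thin market that one trade drags the implied probability from fifty percent to roughly ninety-nine, while the deep market barely moves off fifty-two; that’s the false-signal risk in miniature.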
It’s like a thin mountain road versus a ten-lane highway. On the highway, one erratic driver doesn't stop traffic, but on the mountain road, he ruins it for everyone. So, if the markets are the "crowd," what about the "super-experts"? I know we’ve looked at things like the Cooke method before. How does structured expert elicitation differ from just getting five smart people in a room to argue?
It’s the "structured" part that matters. The Good Judgment Project proved that if you take experts and put them through a specific calibration process—using the Cooke method to weight their scores based on their past accuracy and their ability to quantify uncertainty—you get results that are roughly twenty percent more accurate than a standard "expert panel." It’s about stripping away the cognitive biases, like the "prestige bias" where everyone just agrees with the guy with the most medals.
So instead of a shouting match, it’s a math problem. You’re essentially "de-biasing" the humans before you let them near the forecast.
Precisely. You’re tracking each forecaster’s Brier score, a mathematical measure of how close their probability estimates landed to what actually happened. When you aggregate those calibrated forecasts, you often find a middle ground that neither the hysterical headlines nor the rigid wargames can see.
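As a rough illustration of that scoring-and-weighting loop, here’s a toy version in Python. The real Cooke classical model uses calibration and information scores on seed questions, so treat this inverse-Brier weighting, and every number in it, as a simplified stand-in.

```python
def brier(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between probability forecasts and 0/1 outcomes.
    0.0 is perfect; always saying 50% scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)

# Made-up track records on past resolved questions.
history = {
    "analyst_a": ([0.9, 0.2, 0.7, 0.6], [1, 0, 1, 1]),
    "analyst_b": ([0.6, 0.5, 0.5, 0.4], [1, 0, 1, 1]),
}
# Each analyst's current forecast that the ceasefire holds.
current = {"analyst_a": 0.70, "analyst_b": 0.55}

scores = {name: brier(p, o) for name, (p, o) in history.items()}
weights = {name: 1.0 / (s + 1e-6) for name, s in scores.items()}
pooled = sum(weights[n] * current[n] for n in current) / sum(weights.values())

print({n: round(s, 3) for n, s in scores.items()})
print(f"Performance-weighted forecast: {pooled:.2f}")
```

The better-calibrated analyst pulls the aggregate toward their number, which is the whole point: accuracy, not seniority, sets the weight.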
Which leads us to the third tool in the shed, and this one feels a bit more like a blueprint than a betting slip. I'm talking about causal modeling. If prediction markets are the "what" and expert panels are the "who," causal models are the "why," right? They’re trying to actually map the plumbing of the crisis.
That is the perfect way to frame it. Think of Bayesian networks or structural causal models. Instead of just looking at historical frequencies or crowd sentiment, you’re building a directed graph of dependencies. You’re saying, if the price of oil hits a certain threshold, it puts X amount of pressure on the Iranian domestic budget, which then increases the likelihood of a hardline IRGC faction pushing for a ceasefire violation by Y percent.
So it’s a giant "if-then" machine, but with math attached to every arrow. I remember the Atlantic Council ran a massive Bayesian model back in January when the Strait of Hormuz first started heating up. They weren't just guessing about escalation; they were looking at specific nodes like "maritime insurance premiums" and "Chinese diplomatic signaling."
And the beauty of those models is how they handle the “fog of war.” In a real-time conflict zone, you’re drowning in noise. A causal model lets you perform what’s called Bayesian updating. When a new piece of evidence comes in, say, a specific satellite fix on a missile battery, you don’t just throw out your old forecast. You feed that evidence into the model, the posterior becomes your new prior, and the entire web of probabilities shifts mathematically. That makes it far more resilient to sudden shocks, because you’ve already mapped the pathways those shocks would travel through.
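Here’s what that single updating step looks like stripped down to one node, with every probability invented for illustration: how much should one satellite fix move a sixty-eight percent prior?

```python
# Bayes' rule on a single node. All numbers are invented for illustration.
prior_holds = 0.68            # belief the ceasefire holds, before the new evidence
p_obs_if_holds = 0.10         # chance of seeing this satellite fix if it's holding
p_obs_if_breaks = 0.60        # chance of seeing it if a violation is coming

# P(holds | evidence) = P(evidence | holds) * P(holds) / P(evidence)
numerator = p_obs_if_holds * prior_holds
posterior_holds = numerator / (numerator + p_obs_if_breaks * (1 - prior_holds))

print(f"Prior: {prior_holds:.2f} -> Posterior: {posterior_holds:.2f}")
# That posterior then becomes the prior for the next piece of evidence.
```

With these made-up likelihoods the belief drops from sixty-eight percent to about twenty-six, which is the “entire web shifts” effect in its smallest possible form.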
But that sounds slow. If I’m a commander on the ground or a trader in Chicago, I can look at Polymarket in a second. Building a structural causal model sounds like a three-month research project. When do you actually prioritize the deep model over the quick market signal?
You use the model when the "hidden" variables matter more than the public ones. Prediction markets are great for things everyone is watching. But causal models excel at capturing indirect effects that aren't priced in yet—like how domestic Iranian politics or specific technical constraints on their drone manufacturing might force them to comply with a ceasefire even if they're screaming "death to the West" on TV.
It’s the difference between watching the scoreboard and looking at the X-rays of the players' knees. One tells you the current state of play; the other tells you when the star player is about to collapse.
Wargaming might tell you how a battle plays out, but causal modeling tells you why the battle was fought in the first place and what the second-order consequences are for the global economy. It’s about depth versus speed.
So, looking at the full spread of these tools, it's clear we aren't just flipping coins here. If you’re trying to actually make sense of the news coming out of Tehran or Jerusalem this afternoon, you have to be intentional about which lens you’re looking through. For me, the first big takeaway is that speed and depth are different gears. If you want a rapid, crowd-sourced signal that ignores the pundits and follows the money, you check the prediction markets like Polymarket or Kalshi. They aggregate the "now" better than almost anything else. But, and this is the crucial part, you have to pair that with a causal model if you want to understand the "next."
That’s the strategic sweet spot. Use the market to tell you the current temperature, but use the structural models for your scenario planning. If the market says there is a sixty-eight percent chance the ceasefire holds, the causal model tells you which specific "tripwires" would cause that number to crater—like a spike in maritime insurance premiums or a very specific shift in Iranian domestic rhetoric. It moves you from being a passive observer of a percentage to an active analyst of the mechanics.
And for the truly weird stuff—the novel conflict dynamics where we don't have a "reference class" or a liquid market—that’s where you lean on structured expert elicitation. When the data is thin, the calibration of the humans involved is everything. You don't just want an "expert," you want a "superforecaster" who has a proven track record of being less wrong than everyone else.
If our listeners want to actually get their hands dirty with this, I’d say start by experimenting with the platforms yourself. Don't just read the headlines; go look at the actual pricing on Kalshi for geopolitical events. It forces you to think probabilistically. And if you want to understand the rigor behind this, read "Superforecasting" by Philip Tetlock. It’s essentially the manual for how to strip away your own biases and actually see the world as it is, not as you want it to be. It turns forecasting from a dark art into a measurable, improvable skill.
It really does come down to that mental shift, doesn't it? Moving from "what I think will happen" to "what the math says about the probability." But it leaves me wondering, as these tools get more sophisticated, which one is actually going to nail the next major geopolitical shock? We've seen agentic wargaming, prediction markets, and causal models all pointing in slightly different directions for this April ceasefire.
That is the million-dollar question. I suspect the winner won't be a single methodology, but rather the one that best integrates AI to enhance these human systems without baking in new biases. We've already seen that LLMs can sometimes hallucinate escalation in high-stress sims, so the real frontier is using AI for the heavy lifting—like synthesizing those millions of words in the LEAP panels—while keeping the "superforecaster" logic at the helm.
It’s going to be wild to watch this play out in policy circles, too. Especially since we’re seeing prediction markets get more regulatory clarity with those CFTC approvals from last year. If you’re a policy planner in twenty-six, you’re not just looking at a classified briefing anymore; you’re looking at where the smart money is moving on Kalshi. It’s moving from the fringes into the literal war rooms.
It changes the accountability entirely. You can't just be a "pundit" who's vaguely wrong for thirty years if there's a public, probabilistic track record of your hits and misses.
A world where experts are actually held to their word? Now that's a black swan event I’d like to see. Well, there is a lot to chew on there. Thanks for diving into the weeds with me on this one, Herman Poppleberry.
Always a pleasure, Corn. This has been My Weird Prompts.
Huge thanks to our producer, Hilbert Flumingtop, for keeping the gears turning. And a big thank you to Modal for providing the GPU credits that power this show. If you want to keep up with the latest drops, search for My Weird Prompts on Telegram to get notified the second a new episode hits the feed.
We'll see you next time.