#2071: Git Can't Handle AI Agents—Yet

Three AI agents in one repo is pure chaos. Here's why Git's design causes collisions—and how worktrees and locks can save your sanity.

Featuring

Daniel

Corn

Herman


Episode Details

Episode ID: MWP-2227
Published: Apr 6
Duration: 21:28
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: Gemini 3 Flash
Topics: ai-agents version-control software-development

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The Collision of Git and Agentic Workflows

The dream of AI agents writing code alongside humans is rapidly becoming reality. But as autonomous agents move from simple copilots to parallel "crews" working on a single repository, the bedrock of our development workflows—Git—is showing significant strain. The core issue is a fundamental mismatch: Git was designed for the intentional, sequential, and relatively slow pace of human developers, not for agents operating at compute speed.

The Chaos of Uncoordinated Agents

Imagine running three instances of an AI coding agent in the same folder without a plan. It’s a recipe for disaster. The first major failure mode is the "uncommitted work" problem. An agent might be mid-refactor, with hours of changes sitting uncommitted on disk. A second agent, seeing a terminal error or a list of modified files it doesn't recognize, might decide the best fix is to run a destructive command like git checkout . to "clean up" the directory. In a split second, the first agent's work is gone, along with the time and tokens spent on it.

This isn't just about accidental deletions. Agents lack a persistent sense of their own physical state. They rely on the filesystem, which they then mutate with tools. This creates a loop where an agent can inadvertently sabotage its own or another's progress. The result is a chaotic environment where work is constantly being lost or overwritten, a scenario that Git's standard primitives are ill-equipped to prevent.
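One way a harness could blunt this failure mode is to check for uncommitted work before letting any agent run a destructive command. The sketch below is an illustration under stated assumptions, not something described in the episode: the deny-list and function names are hypothetical, and the status text is assumed to come from "git status --porcelain", which prints nothing when the tree is clean.

```python
from typing import Sequence

# Hypothetical deny-list for illustration; a real harness would tune this.
DESTRUCTIVE_PREFIXES = (
    ("git", "checkout", "."),
    ("git", "checkout", "--", "."),
    ("git", "checkout", "--force"),
    ("git", "reset", "--hard"),
    ("git", "restore", "."),
    ("git", "clean"),
)


def has_uncommitted_work(porcelain_output: str) -> bool:
    """True if `git status --porcelain` listed any modified or untracked path."""
    return any(line.strip() for line in porcelain_output.splitlines())


def is_destructive(argv: Sequence[str]) -> bool:
    """True if argv starts with one of the deny-listed command prefixes."""
    return any(tuple(argv[: len(prefix)]) == prefix for prefix in DESTRUCTIVE_PREFIXES)


def should_block(argv: Sequence[str], porcelain_output: str) -> bool:
    """Block destructive commands only while uncommitted work exists."""
    return is_destructive(argv) and has_uncommitted_work(porcelain_output)


# Example: another agent left a modified file behind.
dirty_status = " M src/auth.py\n"
print(should_block(["git", "checkout", "."], dirty_status))
print(should_block(["git", "status"], dirty_status))
```

A prefix match like this is deliberately crude. It would miss destructive commands hidden inside shell scripts, so it is a backstop rather than a substitute for isolation.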

Isolation with Git Worktrees

The first step toward sanity is isolation. A powerful but often overlooked Git feature—worktrees—provides a clean solution. Instead of cloning the repository three times (wasting space and creating separate histories), worktrees allow you to have multiple working directories attached to a single local repository.

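By assigning each agent its own worktree (e.g., agent-one, agent-two, agent-three), each checked out to its own branch, you create physical separation. Agent A's checkout or cleanup commands no longer touch Agent B's uncommitted files, because the two work in entirely different folders. The worktrees still share one object store and one set of branches, so this isolates working files rather than history. It solves the immediate problem of file collisions and gives each agent a clean slate to operate in.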
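As a rough sketch of that setup, the snippet below builds one "git worktree add" command per agent. The "trees" directory and the agent names come from the episode; the "agents/" branch prefix and the helper name are assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def worktree_add_argv(agent: str, root: Path, base: str = "main") -> List[str]:
    """Build `git worktree add -b <branch> <path> <start-point>` for one agent."""
    branch = f"agents/{agent}"  # hypothetical branch naming scheme
    path = (root / agent).as_posix()
    return ["git", "worktree", "add", "-b", branch, path, base]


agents = ["agent-one", "agent-two", "agent-three"]
commands = [worktree_add_argv(agent, Path("trees")) for agent in agents]
for argv in commands:
    print(" ".join(argv))
```

Each list can be handed to subprocess.run(argv, check=True) from inside the repository. By default Git refuses to check out the same branch in two worktrees, which is a small guardrail of its own.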

The Problem of Logical Merge Conflicts

However, isolation is not coordination. While agents can't physically overwrite each other's files, they can easily desynchronize logically. If Agent A completes a massive refactor of the authentication system and merges it into the main branch, Agent B—working in its own isolated worktree—is still writing code against the old, now-obsolete auth logic.

Git is excellent at flagging when two people edit the same line of code. It is terrible at detecting when two agents are working on the same conceptual dependency, creating a "logical merge conflict." When agents work at compute speed, producing hundreds of lines of code in seconds, the window for these desyncs becomes enormous, leading to a codebase that is technically mergeable but logically broken.
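A toy reproduction makes the point concrete. In this sketch (all file contents and names are invented for illustration), Agent A renames a function in auth.py while Agent B adds a new file, api.py, that calls the old name. The two branches touch different files, so a merge reports no conflict, yet the combined code fails at runtime.

```python
# auth.py as Agent B saw it when it branched.
AUTH_AT_BRANCH_POINT = '''
def check_password(user, password):
    return password == "secret"
'''

# auth.py after Agent A's refactor landed on main: the function was renamed.
AUTH_AFTER_A = '''
def verify_credentials(user, password):
    return password == "secret"
'''

# api.py, a new file from Agent B, written against the old name.
API_FROM_B = '''
def login(user, password):
    return check_password(user, password)
'''


def try_login(auth_source: str, api_source: str) -> str:
    """Load both files into one namespace (standing in for imports) and call login()."""
    namespace: dict = {}
    exec(auth_source, namespace)
    exec(api_source, namespace)
    try:
        namespace["login"]("daniel", "secret")
    except NameError as err:
        return f"broken: {err}"
    return "ok"


in_b_worktree = try_login(AUTH_AT_BRANCH_POINT, API_FROM_B)
after_merge = try_login(AUTH_AFTER_A, API_FROM_B)
print(in_b_worktree)
print(after_merge)
```

Agent B's code works inside its own worktree and breaks only once both changes meet on main, which is why the check has to run on the merged result rather than on each branch alone.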

Coordination through Locking and Orchestrators

To bridge this gap, we need to move beyond Git's after-the-fact conflict detection and toward proactive coordination. This is where concepts like file-level locking and orchestrator patterns come into play.

While it may sound regressive, locking files for an agent is a superpower. An agent has no ego and doesn't mind waiting. A central "lock server" can grant an agent exclusive access to a file for a specific task, with a Time-To-Live (TTL) so that a lock held by a crashed agent expires instead of blocking everyone else indefinitely. This prevents collisions before they happen, saving time and money.
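The core of such a lock server can be sketched as an in-memory table of leases. This is a minimal sketch under assumptions, not the design of any product mentioned in the episode: the class and method names are hypothetical, and a real server would also need persistence and a network interface. The clock is injectable so that expiry can be demonstrated without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class LockTable:
    """In-memory file locks with expiry; all names here are illustrative."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._leases: Dict[str, Lease] = {}

    def acquire(self, path: str, agent: str, ttl_seconds: float) -> bool:
        """Grant the lock if it is free, expired, or already held by this agent."""
        now = self._clock()
        lease = self._leases.get(path)
        if lease and lease.expires_at > now and lease.holder != agent:
            return False
        self._leases[path] = Lease(agent, now + ttl_seconds)
        return True

    def release(self, path: str, agent: str) -> bool:
        """Release only if this agent still holds a live lease."""
        lease = self._leases.get(path)
        if lease and lease.holder == agent and lease.expires_at > self._clock():
            del self._leases[path]
            return True
        return False

    def holder(self, path: str) -> Optional[str]:
        """Return the current holder, or None if the lock is free or expired."""
        lease = self._leases.get(path)
        if lease and lease.expires_at > self._clock():
            return lease.holder
        return None


# Demonstration with a fake clock: agent-a crashes and never releases.
now = [0.0]
locks = LockTable(clock=lambda: now[0])
first = locks.acquire("api.ts", "agent-a", ttl_seconds=300)
blocked = locks.acquire("api.ts", "agent-b", ttl_seconds=300)
now[0] = 301.0
recovered = locks.acquire("api.ts", "agent-b", ttl_seconds=300)
print(first, blocked, recovered)
```

Re-acquiring a lock the agent already holds simply extends the lease, which gives long-running tasks a way to renew before the TTL runs out.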

A more complete answer, however, is an "orchestrator" pattern. In this model, a "boss" agent breaks down a large ticket into sub-tasks and dispatches them to "worker" agents, each in its own worktree. The orchestrator manages the Git state, assigns tasks, and handles the final merge. This pattern extends to the CI/CD pipeline, where "incremental CI" and "merge queues" become essential. Instead of running a full test suite for every agent's pull request, the system can run targeted tests and queue merges, automatically validating and integrating code without human intervention.
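The merge-queue idea can be simulated in a few lines. The sketch below models main as a list of branch names and takes the test run as a callback. Both are simplifications chosen for illustration, and the branch names and the clash rule are invented.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def run_merge_queue(
    main: List[str],
    pending: Deque[str],
    passes_tests: Callable[[List[str]], bool],
) -> Tuple[List[str], List[str]]:
    """Process branches in order; return (new main, rejected branches).

    Each candidate is tested on top of the current main, including every
    branch merged earlier in the queue, before it is allowed to land.
    """
    rejected: List[str] = []
    while pending:
        branch = pending.popleft()
        candidate = main + [branch]  # stands in for a temporary merge branch
        if passes_tests(candidate):
            main = candidate
        else:
            rejected.append(branch)
    return main, rejected


def fake_tests(merged: List[str]) -> bool:
    # Hypothetical rule: these two branches are fine alone but clash together.
    return not {"agent-a/rename-auth", "agent-b/old-auth-caller"} <= set(merged)


main, rejected = run_merge_queue(
    ["base"],
    deque(["agent-a/rename-auth", "agent-b/old-auth-caller", "agent-c/docs"]),
    fake_tests,
)
print(main)
print(rejected)
```

Because every candidate is validated against what main will look like after the earlier entries land, the second branch is rejected even though it would pass on its own.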

This evolution opens the door to powerful new workflows, like the "Benchy" pattern, where multiple agents implement the same feature in parallel. A "Judge Agent" then evaluates the telemetry from each worktree (e.g., execution time, memory overhead), selects the best implementation, and merges the winner. This is brute-force engineering on fast-forward, a survival-of-the-fittest approach for code that is only possible with a robust, agent-aware infrastructure.
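The judging step reduces to a comparison over whatever telemetry each worktree produced. In the sketch below, the record fields, the ranking rule (lowest latency, then lowest memory, among candidates whose tests passed), and the numbers are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BenchResult:
    worktree: str
    tests_passed: bool
    latency_ms: float
    peak_memory_mb: float


def pick_winner(results: Iterable[BenchResult]) -> BenchResult:
    """Choose the passing candidate with the lowest latency, then lowest memory."""
    passing = [r for r in results if r.tests_passed]
    if not passing:
        raise ValueError("no candidate passed its tests")
    return min(passing, key=lambda r: (r.latency_ms, r.peak_memory_mb))


# Made-up telemetry for three competing implementations.
results = [
    BenchResult("trees/agent-one", True, 41.0, 210.0),
    BenchResult("trees/agent-two", True, 28.5, 260.0),
    BenchResult("trees/agent-three", False, 12.0, 190.0),
]
winner = pick_winner(results)
print(winner.worktree)
```

Filtering on passing tests first matters: the fastest candidate here is disqualified because it is wrong, not merely slow.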

As we continue to integrate agents into our development lifecycle, the tools and protocols we build around Git will be just as important as the agents themselves.

#2071: Git Can't Handle AI Agents—Yet

You ever have that dream where you’re trying to clean a house, but every time you move a chair, three other people move it back to where it was, or maybe into the kitchen? That is basically what it feels like to run three instances of Claude Code on the same repository without a plan. It’s pure, unadulterated chaos.

It’s the digital equivalent of "too many cooks in the kitchen," except the cooks move at the speed of light and they don't actually talk to each other. They just assume the stove is theirs. I’m Herman Poppleberry, by the way, and today we are diving into a prompt from Daniel that hits right at the intersection of old-school version control and the absolute bleeding edge of agentic workflows.

Yeah, Daniel sent us a great one here. He’s asking about how Git is flexing—and failing—to handle these new agentic development patterns. He wants us to imagine three AI agents working simultaneously on one codebase. Do they need to talk to each other? Why does it end in disaster when you run parallel sessions without an orchestrator? We’re looking at regressions, lost work, and the "forgotten commit" syndrome.

It’s a fascinating tension because Git was built for us—for humans. It assumes a certain level of intentionality and, frankly, a human-scale speed of operation. When you inject agents into that system, the primitives start to creak. By the way, quick shout out to our sponsor—today’s episode is powered by Google Gemini 3 Flash. It’s writing our script as we speak.

Which is fitting, considering we’re talking about agents taking over the keyboard. So, Herman, let’s set the stage. It’s April twenty-twenty-six. According to the latest GitHub Octoverse data, twelve percent of all commits are now fully authored by autonomous agents. That is a massive jump from even a year ago. We aren't just using "copilots" anymore; we’re using "crews." But Git is still... well, Git.

Git is the bedrock, but it’s a bedrock designed for sequential, thoughtful branch-merge-push cycles. The core issue is that Git assumes a single "index" or "staging area" per working directory. If you have Agent A and Agent B both trying to use the same working directory, they are going to trip over each other immediately.

It’s not just tripping; it’s a full-on collision. If Agent A starts a refactor in a file and Agent B decides to run a "git checkout --force" to see a different branch, Agent A’s uncommitted work is just... poof. Gone. And we’ve seen this in the Claude Code issue trackers: issue twenty-nine-one-ten is a classic example where worktree cleanup or unexpected crashes just obliterate hours of "thought" and token spend.

That "uncommitted work" failure mode is the silent killer of agentic productivity. Humans generally have a sense of "I am in the middle of something," so we don't usually run destructive commands while we have dirty files. But an agent might see a terminal error, decide the best way to fix it is to reset to HEAD, and suddenly it has deleted the very code it was supposed to be iterating on.

But wait, how does that actually play out in a real CLI session? Like, if I’m running Claude Code and I tell it to fix a bug, it’s basically just running shell commands, right?

It’s executing sed or echo or writing directly to the filesystem. If you have a second agent in that same folder, it might run a git status, see a bunch of modified files it doesn't recognize, and—being a helpful agent—think, "Oh, these shouldn't be here," and run a git checkout . to clean up. It’s trying to be helpful, but it just lobotomized its partner.

It’s like the agent has no short-term memory of its own physical state. It relies on the filesystem to tell it what’s happening, but then it uses tools that mutate that filesystem. So, if we can't just throw three agents into one folder, what’s the move? Daniel mentioned Git worktrees. I know you’ve been obsessed with these lately.

I have! Because worktrees are the "forgotten" Git feature that was basically a prophecy for the agentic age. Introduced way back in Git two-point-five, which was twenty-fifteen, worktrees allow you to have multiple working directories attached to a single local repository.

So instead of "git clone" three times—which wastes space and keeps three separate versions of history—you just have one ".git" folder and three different folders with the actual code?

Precisely. You can have a "trees" directory, and inside you have "agent-one," "agent-two," and "agent-three." Each one is checked out to its own branch. This solves the "dirty directory" problem. Agent A can't accidentally delete Agent B’s work because they are physically in different folders.

Okay, so that’s the isolation layer. Problem solved, right? We just give every agent its own room and tell them to stay there until dinner is ready.

Not quite. And this is where it gets really interesting. Isolation is not coordination. Just because they are in different folders doesn't mean they aren't working on the same logical project. If Agent A completes a massive refactor of your authentication logic and merges it into "main," Agent B is still over in its worktree, happily writing code against the old auth logic.

Right, so Agent B is basically hallucinating a reality that no longer exists. It’s like two people writing chapters for a book, but one person decides the protagonist is now a cat, and the other person is still writing about a human. The chapters "merge" technically—the text doesn't overlap—but the story is broken.

That is the "logical merge conflict." Git is great at saying "Hey, you both edited line forty-two." It is terrible at saying "Hey, you both edited the same conceptual dependency." And when agents work at "compute speed"—meaning they can crank out five hundred lines of code in thirty seconds—the window for these logical desyncs is massive.

But couldn't you just tell the agents to pull the latest changes every time they start a new sub-task? Like a "heartbeat" pull?

You could, but then you hit the "rebase hell" problem. If Agent B is in the middle of a complex multi-file change and Agent A merges a breaking change into main, Agent B now has to stop, rebase, resolve conflicts—which it might not be great at—and then resume. If this happens every two minutes, Agent B never actually finishes its task. It’s just stuck in a loop of updating its environment.

So if the filesystem isn't enough, what is? Do the agents need to be on a Slack channel together? "Hey Agent B, I'm touching the database schema, stay out of my way"?

In a way, yes! We are seeing the rise of what people are calling "Protocol-level coordination." Instead of just relying on Git to catch mistakes after the fact, the "harness"—the tool like Claude Code or a custom orchestrator—needs to manage the state of the agents.

Daniel brought up file-level locking. That sounds very "nineteen-nineties version control," like SVN or Perforce. Are we really going back to "checking out" files so nobody else can touch them?

It sounds regressive, but for agents, it’s actually a superpower. Think about it: an agent doesn't have "ego." It doesn't mind waiting. If an orchestrator says "Agent A is currently rewriting 'api-dot-ts', you have a lock-wait for five minutes," the agent just pauses its token generation. It saves money and prevents a collision.

I love the idea of "AgentSync" or lock servers. There’s a company called Authora doing this with something called "pre-emptive locking." Before the agent even opens the file to read it, it has to claim a token from a centralized server. If the lock is held, the agent pivots to a different task or waits.

And these locks have a Time To Live, or TTL. That’s crucial because agents crash! If Agent A's sub-process hits a rate limit or a recursion error and dies, you don't want the file locked forever. The lock expires, and Agent B can take over.

It’s funny how we’re reinventing database concurrency patterns for text files. But let’s talk about the "discipline" versus "protocol" debate. Some people argue we don't need fancy lock servers; we just need better system prompts. "Dear Agent, please check if 'main' has updated before you finish your task." Does that actually work?

Not at scale. Discipline is for humans because humans are slow and can be shamed. Agents are fast and have zero shame. If you tell an agent to "be careful," it will say "I will be careful," and then it will immediately overwrite your config file because it prioritized the immediate instruction over the vague guideline.

True. I’ve never met an agent that didn't think its current task was the most important thing in the universe. So, we need the "harness" to be the adult in the room. If we look at Claude Code specifically, it’s already moving toward this "orchestrator" model.

Right. We’re seeing features like "orchestrator mode" where one "boss" model breaks down a huge ticket into sub-tasks and dispatches them to "worker" models. The boss model is the one managing the Git state. It’s the one saying "Okay, Worker One, you get a worktree for the frontend. Worker Two, you get a worktree for the backend. I will handle the merge."

That feels like the "Orchestrator-Worker" pattern we’ve discussed before. But even then, the merge is the scary part. If I have five agents all submitting Pull Requests at the same time, my CI/CD pipeline is going to explode.

That is a very real bottleneck. Imagine your Jenkins or GitHub Actions bill when five agents are each triggering a full test suite every ten minutes. You’d go bankrupt in a week.

So what's the fix? Do we need "Agent-only" CI?

We're starting to see "incremental CI." Instead of running the whole world, the harness identifies exactly which files changed and only runs the relevant unit tests in a "pre-flight" check before the agent is even allowed to propose a merge. It’s about shifting the validation left, right into the agent's loop.
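A pre-flight check of this kind might select tests as in the sketch below. The directory-to-test mapping and the fallback rule are assumptions for illustration. The list of changed files is assumed to come from something like "git diff --name-only main...HEAD".

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Set

# Hypothetical mapping from source directories to the tests that cover them.
TEST_MAP: Dict[str, List[str]] = {
    "src/auth": ["tests/test_auth.py"],
    "src/billing": ["tests/test_billing.py"],
}
FULL_SUITE = ["tests"]


def select_tests(changed_files: Iterable[str]) -> List[str]:
    """Return targeted tests, falling back to the full suite for unmapped files."""
    selected: Set[str] = set()
    for changed in changed_files:
        parent = PurePosixPath(changed).parent.as_posix()
        if parent not in TEST_MAP:
            return FULL_SUITE
        selected.update(TEST_MAP[parent])
    return sorted(selected)


print(select_tests(["src/auth/login.py"]))
print(select_tests(["src/auth/login.py", "README.md"]))
```

Falling back to the full suite whenever a file is not covered by the map keeps the shortcut conservative: an unknown change costs more compute rather than skipping validation.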

That makes sense. And that’s where "Merge Queues" come in. You know how big teams at places like Shopify or GitHub use merge queues to prevent "main" from breaking? Instead of merging a PR directly, the PR goes into a queue. The system merges it into a temporary branch, runs the tests, and if it passes, then it moves to main.

And if it fails, it gets kicked out of the queue automatically.

For agents, this is mandatory. You can't have a human reviewing every agent PR if they are producing them every two minutes. You need an automated gatekeeper that uses Git primitives like "rebase" and "merge --no-ff" to ensure the history stays clean and the build stays green.

I’m curious about the "Benchy" pattern Daniel mentioned. Using parallel agents to implement the same feature three different ways in three different worktrees, and then having a "Judge Agent" pick the winner. That is pure compute-flexing.

It’s the ultimate "brute force" engineering. "I don't know the best way to optimize this SQL query, so I'll hire three agents to try three different indexing strategies in three separate worktrees, run a benchmark in each, and the one with the lowest latency wins."

That is wild. It’s like evolution on fast-forward. But how does the Judge Agent actually decide? Does it just look at the code, or does it look at the telemetry from the worktree?

It has to look at the telemetry. This is where the harness needs to be "obs-aware"—observability aware. The Judge Agent pulls the execution time from the benchmark logs in each worktree, compares the memory overhead, and then—this is the cool part—it deletes the two "losing" worktrees and merges the winner.

It’s a literal survival of the fittest for code snippets. But again, you need that isolation. You can't run three benchmarks on the same machine on the same port. You need the harness to manage ports, databases, and filesystem state.

And that’s why I think the "harness" is actually becoming more important than the "model." Anyone can call an API. But building a harness that can manage three parallel Git worktrees, three ephemeral Postgres databases, and a file-locking protocol? That’s real engineering.

It makes me think about how Git might change. Do you think we’ll see "native agent support" in Git? Like, "git commit --agent-id" or "git lock file.ts"?

I think we might see more first-class support for metadata. Right now, Git doesn't really care who wrote the code, just what the string in the "Author" field says. But if Git could natively understand "This branch is an agentic exploration," it could handle merges differently. Maybe it uses an LLM to resolve conflicts instead of just looking at diffs.

"Semantic Merge." That’s the dream. "I see you renamed this variable and the other guy used the old variable name; I’ll just fix the references for you."

We’re getting close. Tools like "semantic-diff" already exist, but integrating them into the Git core would be a game-changer for multi-agent workflows. Imagine a git merge --strategy=llm. It wouldn't just look at line changes; it would understand that the intent of both branches is compatible even if the lines clash.

But isn't that dangerous? Giving an LLM the power to resolve conflicts without human oversight?

It’s as dangerous as letting them write the code in the first place! That’s why you have the tests. If the LLM-resolved merge passes the test suite, is it "wrong"? In an agentic world, the definition of "correct" shifts from "the human checked the diff" to "the system verified the behavior."

So, to summarize the "disaster" scenario: if you just open three terminal tabs and run "claude" in all of them, you are basically playing Russian Roulette with your source code. You’ll get "index.lock" errors, you’ll lose uncommitted work, and you’ll end up with a mess of "hallucinated" code that doesn't compile.

It’s the "Multi-Agent Merge Nightmare." We talked about this in episode eighteen-thirty, but it’s becoming even more acute now. The solution is clear: use worktrees for isolation, use a lock server or orchestrator for coordination, and use a merge queue for validation.

And maybe, just maybe, don't let the agents commit directly to "main" yet. Keep them in their sandbox until the harness is mature enough to be the supervisor.

Definitely. The "harness" is the guardrail. Without it, you’re just giving a chainsaw to a toddler who can think at a trillion operations per second.

A very polite toddler who says "I have updated the files for you!" while the house is on fire.

"I have refactored the structural supports of the living room. Please let me know if you need anything else!"

"Also, I deleted the kitchen because I didn't see anyone using it."

The lack of "global context" is the agent's biggest weakness. Git is a local tool. Agents are local actors. We need a global coordination layer to bridge that gap.

I'm thinking about the "forgotten commit" syndrome Daniel mentioned. You know, where an agent does a bunch of work, forgets to commit, and then the next agent comes in and thinks the workspace is clean. How do we solve that without making the Git history look like a disaster zone?

You use "shadow commits." The harness should be doing a git commit -m "agent checkpoint" every time the agent stops typing for more than five seconds. These get squashed later, but they act as a "save point" in a video game. If the agent’s hallucinations lead it off a cliff, you can just rewind the disk state to the last shadow commit.

That's brilliant. It's like an auto-save for your codebase. But what about the human side? If I come back to my repo after lunch and there are four hundred "checkpoint" commits, I'm going to lose my mind.

That’s why the harness has to be the one to clean up. It needs to "rebase and squash" those agentic explorations into a single, clean, human-readable commit once the task is verified. The agentic noise should never hit the permanent history.
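The checkpoint-then-squash cycle described here maps onto a handful of ordinary Git commands. The sketch below only builds the command lists. The helper names are hypothetical, and squashing with "git reset --soft" followed by a single commit is one possible approach, chosen here because it needs no interactive rebase.

```python
from typing import List


def checkpoint_argv(message: str = "agent checkpoint") -> List[List[str]]:
    """Stage everything and record a throwaway checkpoint commit."""
    return [
        ["git", "add", "--all"],
        # --allow-empty keeps a timer-driven checkpoint from failing when nothing changed.
        ["git", "commit", "--quiet", "--allow-empty", "-m", message],
    ]


def squash_argv(base: str, final_message: str) -> List[List[str]]:
    """Collapse every commit since `base` into one commit on the current branch."""
    return [
        # Move the branch pointer back to base but keep all changes staged.
        ["git", "reset", "--soft", base],
        ["git", "commit", "-m", final_message],
    ]


for argv in checkpoint_argv() + squash_argv("main", "Refactor auth module"):
    print(" ".join(argv))
```

Here base would typically be the commit the agent branched from, for example the output of "git merge-base main HEAD", so that only the agent's own checkpoints are collapsed.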

Well, I think we’ve thoroughly spooked anyone thinking about running parallel agents without a plan. But the upside is huge—if you get the orchestration right, your dev velocity doesn't just double; it scales with your compute budget.

It’s a complete paradigm shift. We’re moving from "writing code" to "managing a fleet of code-writers." And Git, the old reliable donkey of the dev world, is being forced to learn some new tricks.

Hey, don't call donkeys "old reliable" like it's a bad thing. We’re very sturdy.

Oh, I know. I mean it as the highest compliment. Git isn't going anywhere. It’s just getting a massive upgrade in terms of how we wrap it.

Alright, let’s talk practical takeaways. If I’m a developer today and I want to start using multiple agents, what are the three things I need to do right now to avoid a disaster?

Number one: Learn Git worktrees. Don't just clone the repo three times. Use "git worktree add" to create isolated environments. It keeps your main ".git" folder clean and allows your agents to work on separate branches without interference.

And it saves a ton of disk space if you’re working on massive monorepos.

Takeaway number two: Implement a "Commit-Early, Commit-Often" policy for your agents. If your harness doesn't automatically commit after every successful tool execution, write a script that does. You want a trail of breadcrumbs. If an agent crashes or does something stupid, you want to be able to "git reset" to thirty seconds ago, not three hours ago.

That "uncommitted work" failure mode is real. I’ve seen people lose entire days of work because a sub-agent decided to do a "clean" command that wasn't scoped correctly.

It’s heartbreaking. And takeaway number three: If you’re building your own tools, look into the Model Context Protocol, or MCP. It’s becoming the standard for how agents talk to tools and, eventually, to each other. If your agents can "check in" with an MCP server that manages file locks, you’ve basically solved the coordination problem.

Is MCP really ready for that? I thought it was more about fetching data.

It's evolving! People are already building "Locking Servers" as MCP resources. An agent says, "I want to edit 'auth.ts'," and the MCP server returns either a "Success" token or a "Resource Locked" error. It forces the agent to handle the lock just like a database client would.

I like that. It’s about building a better "harness." Don't just blame the model for being "stupid" when it overwrites work; blame the harness for letting it happen.

It’s an engineering problem, not a magic problem. We have the primitives—Git gave them to us a decade ago. We just need to start using them properly for the agentic age.

Well, Herman, I think we’ve covered a lot of ground. From the creaking of Git’s index lock to the future of semantic merging and agent fleets. It’s a wild time to be a developer—or a sloth watching a donkey talk about code.

It really is. And big thanks to Daniel for the prompt. It’s these kinds of "meta-engineering" topics that really show where the friction is in the industry right now.

Before we wrap up, we should probably thank the folks who make this possible. Thanks as always to our producer, Hilbert Flumingtop, for keeping the gears turning behind the scenes.

And a huge thank you to Modal for providing the GPU credits that power the generation of this show. If you haven't checked out Modal for your serverless GPU needs, you’re missing out.

This has been My Weird Prompts. If you’re enjoying these deep dives into the weird world of AI and engineering, do us a favor and leave a review on Apple Podcasts or Spotify. It actually makes a huge difference in helping other people find the show.

You can find all our past episodes—all two thousand and three of them now—at myweirdprompts dot com. We’ve got the RSS feed there, and links to all the platforms.

And if you have a weird prompt of your own, send it over to show at myweirdprompts dot com. We love hearing what’s on your mind, whether it’s Git worktrees or the ethics of AI sloths.

Stay curious, keep building, and maybe... just maybe... check your worktrees before you hit enter.

Wise words, Herman. See ya.

See ya.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.
