Imagine it is 2023. You have just downloaded Ollama. You are feeling like a digital god because you have a large language model running locally on your laptop. You open up your terminal, you type in a prompt to help you refactor a React component, and then... nothing. Or rather, a wall of text that you have to manually copy-paste, fix the indentation on, and pray that the model actually saw the other three files it needs to understand the context. It was a nightmare. Today's prompt from Daniel is about that specific gap—the distance between those raw instructional models we had a few years ago and the modern agentic CLIs like Claude Code that have basically taken over the terminal.
It is a fascinating historical reversal, Corn. Usually, in computing, the CLI comes first, then the GUI makes it accessible. But with AI, we went from raw models to web chat interfaces, then to integrated development environment sidebars, and only recently have we circled back to the terminal as the ultimate seat of power. And by the way, speaking of modern models, today's episode is actually powered by Google Gemini 3 Flash. It is helping us bridge that gap between the 2023 struggles and the 2026 reality.
I remember those 2023 struggles vividly. I’d spend forty minutes trying to get a local model to understand a file structure, only for it to hallucinate a library that didn't exist in my package dot json. Herman Poppleberry, you have been digging into the "harness" that makes the difference here. Why did we have to wait so long for the terminal to become the "cool" place for AI again?
The short answer is that a raw model is just a brain in a vat. In 2023, when Ollama was released, it gave us the brain—the ability to run Llama 2 or similar models locally—but it didn't give the brain any hands. It didn't give it eyes to see the file system, and it certainly didn't give it the authority to actually do anything. If you tried to develop a repository using just a raw instructional model back then, you were basically acting as a manual courier, carrying messages back and forth between the model and your disk.
I was the intern for my own AI. I was the one doing the "agentic" work of moving the text. So, if we look at tools like Claude Code today, what is actually being "bolted on" to that brain? Because the models themselves, while better, aren't fundamentally doing something different than predicting the next token. It’s the stuff around the model that changed, right?
Exactly—well, not exactly, but you're hitting on the core architectural shift. There are three or four massive components that modern agentic CLIs add to a raw model. The first is what I call "Autonomous Contextual Awareness." In 2023, the context window for a model was tiny. If you were lucky, you had eight thousand tokens. That is barely enough to hold a couple of complex source files and the prompt instructions. You couldn't just say "look at the whole repo."
Right, and even if you could fit it, the model would get "lost in the middle." You’d give it twenty files and it would forget the first ten by the time it started writing the code.
The modern harness solves this through indexing. When you fire up a tool like Claude Code or a similar agentic CLI, it doesn't just wait for you to type. It runs a background process to create a semantic index of your entire repository. It uses things like tree-sitter to actually understand the abstract syntax tree of your code. It knows that "Function A" in "File X" is called by "Component B" in "File Y." So when you ask a question, the harness—not the model—is doing the heavy lifting of finding the relevant snippets and feeding them into the prompt.
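A rough sketch of the indexing idea Herman describes, for anyone following along at home. Real harnesses use tree-sitter to parse many languages; as a stand-in, this uses Python's built-in `ast` module to map which function calls which. The example code being indexed is purely illustrative.

```python
import ast
from collections import defaultdict

def build_call_index(source: str) -> dict:
    """Map each function name to the functions it calls.

    A toy stand-in for the richer AST indexing (tree-sitter) that
    agentic CLIs run over a whole repository in the background.
    """
    tree = ast.parse(source)
    index = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            for child in ast.walk(node):
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    index[node.name].add(child.func.id)
    return {name: sorted(calls) for name, calls in index.items()}

code = """
def load(path):
    return open(path).read()

def summarize(path):
    text = load(path)
    return text[:100]
"""
print(build_call_index(code))  # {'load': ['open'], 'summarize': ['load']}
```

With an index like this, the harness can answer "who calls `load`?" without sending a single file to the model until it knows which snippets are relevant.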
So it’s like giving the model a library card and a map instead of just whispering a story to it in a dark room. But that’s just the "seeing" part. What about the "doing" part? Because that's where the 2023 workflow really fell apart. I’d get the code, I’d paste it in, and the thing wouldn't even compile.
That is the second major bolt-on: the Tool-Use Orchestration layer. This is the "agentic" part of the agent. In 2023, a model could write a bash command, but it couldn't run it. Today, the CLI acts as a privileged intermediary. The model says, "I would like to run 'npm test' to see if my changes broke anything," and the CLI says, "Okay, I'll do that for you and feed the error logs back into your next thought process."
It’s the feedback loop. That’s the "aha" moment for me. In the old days, if the AI made a mistake, I had to be the one to find the error, copy the error, and tell the AI, "Hey, you messed up." Now, the CLI sees the exit code of the compiler, sees the stack trace, and just goes, "Oh, I forgot a semicolon," and fixes it before I even realize it happened.
And that changes the nature of the conversation. It moves from "Instruction following" to "Goal seeking." If you give a raw model a repository and say "fix this bug," it will give you a suggestion. If you give an agentic CLI a repository and say "fix this bug," it will attempt a fix, run the tests, see they failed, try a different approach, verify the fix with a linter, and then present you with a finished product. It is the difference between a consultant who gives you a slide deck and a contractor who actually fixes the leak in your roof.
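The goal-seeking loop Herman and Corn are describing can be boiled down to a few lines. This is a hedged sketch, not any vendor's actual control flow: `run_tests` and `propose_fix` are hypothetical stand-ins for the harness's real subprocess calls and model calls.

```python
def agentic_fix_loop(run_tests, propose_fix, max_attempts=3):
    """Run the checks, feed failures back to the model, retry.

    run_tests() -> (exit_code, log); propose_fix(log) edits the code.
    Both are stubs standing in for real subprocess and model calls.
    """
    for attempt in range(1, max_attempts + 1):
        exit_code, log = run_tests()
        if exit_code == 0:
            return f"passed on attempt {attempt}"
        propose_fix(log)  # the model sees the error log, not the human
    return "gave up; escalating to the human"

# Stubbed environment: fails once, then the "fix" makes it pass.
state = {"fixed": False}
def run_tests():
    return (0, "") if state["fixed"] else (1, "SyntaxError: missing semicolon")
def propose_fix(log):
    state["fixed"] = True

print(agentic_fix_loop(run_tests, propose_fix))  # passed on attempt 2
```

The human only enters the loop when the agent exhausts its attempts, which is exactly the reversal of the 2023 "courier" workflow.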
I think people underestimate how much friction there was in that "courier" role. If you're building a CRUD app—create, read, update, delete—you're touching a database schema, a backend controller, a frontend service, and a UI component. That’s at least four files. In 2023, using a raw model, you’re looking at a forty-seven-prompt conversation. You’re saying, "Here is the SQL," then "Okay, now write the Node code based on that SQL," then "Wait, the Node code needs this specific library," then "Okay, now the React part." By the time you get to the React part, the model has forgotten the SQL schema.
And because the model is stateless, every single one of those forty-seven prompts has to carry the "debt" of the previous ones. You end up with these massive, bloated prompts where you're trying to re-explain the whole project every time. Modern CLIs use persistent state management. They keep a running log of what has been changed, what the current "plan" is, and what the state of the file system looks like.
This brings up an interesting point about the "terminal" versus the "GUI." For a few years, we all thought the future was Cursor or VS Code extensions. And those are great! I love a good sidebar. But why are we seeing this massive swing back to the command line specifically? Is it just because developers are nostalgic for black screens and green text?
I don't think so. I think it is about the "Scope of Authority." Inside an IDE like VS Code, the AI is generally restricted to the files you have open or the specific project folder. But the terminal is the "universal interface" of the operating system. From the terminal, an AI can talk to Docker, it can talk to your AWS CLI, it can run git commands, it can grep through logs, it can even trigger a CI/CD pipeline.
It’s the difference between being a passenger in a car—the IDE—and having the keys to the whole garage. If I’m in the terminal, I can tell the AI, "Hey, the staging server is acting weird, go find the logs, figure out which commit caused the spike in latency, and suggest a rollback." You can't really do that from a text editor sidebar without a massive amount of custom integration.
The terminal is effectively the API for human-computer interaction. By putting the AI there, we are giving it access to every tool we've built for the last forty years. It’s why the "agentic harness" is so transformative. It isn't just "bolting on" a few features; it’s plugging the model into the central nervous system of the machine.
Let's talk about the specific technical hurdles that stopped this from happening in 2023. You mentioned context windows, but what about latency and "tool-calling" capabilities? Because back then, models weren't very good at outputting structured data. You’d ask for JSON and get a conversational "Sure! Here is your JSON..." which would then break the parser.
That was a huge bottleneck. To have a reliable agentic CLI, you need the model to be "native" in tool use. In 2023, we were using "function calling" hacks—regex-ing the output to find things that looked like commands. It was brittle. If the model added an extra space or a bit of polite chatter, the whole "harness" would snap.
"I've updated the file for you! Here is the command: cat > file.txt..." and then the harness just dies because it doesn't know what to do with the "I've updated the file" part.
Right. Now, with models like Claude 3.7 or the latest Gemini models, the "thoughts" are separated from the "actions." The model has a specific mode where it emits a structured call that the CLI can intercept with one hundred percent reliability. This allows for what we call "Chain of Thought" reasoning where the model can say, "I need to check the dependencies first," then it runs the command, sees the output, and then says, "Okay, now I will edit the file." This reliability is what allows us to trust the agent to do repo-wide refactors.
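To make the "thoughts separated from actions" point concrete, here is a minimal sketch of a structured tool call and the harness-side dispatch. The message shape and tool names are illustrative assumptions, not any vendor's actual wire format.

```python
# Hypothetical structured output from a tool-native model: the chatty
# "thought" is carried separately from the machine-readable "tool_call",
# so no regex ever has to fish a command out of conversational text.
model_output = {
    "thought": "I need to check the dependencies before editing.",
    "tool_call": {"name": "run_shell", "arguments": {"command": "npm ls"}},
}

# The harness's registry of tools it is willing to execute.
TOOLS = {
    "run_shell": lambda command: f"(would execute: {command})",
}

def dispatch(msg: dict) -> str:
    """Intercept the structured call; the 'thought' never hits a parser."""
    call = msg["tool_call"]
    return TOOLS[call["name"]](**call["arguments"])

print(dispatch(model_output))  # (would execute: npm ls)
```

Because the call arrives as structured data, an extra space or a polite preamble from the model can no longer snap the harness the way it did in the regex era.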
I saw a demo the other day where someone told a CLI agent to "migrate this entire project from JavaScript to TypeScript." In 2023, that would have been a suicide mission. The raw model would have given you the first three files, gotten bored, and then started repeating itself. But with the harness, the agent actually created a "to-do" list, checked off each file, ran the compiler after each change to ensure it still worked, and fixed the type errors as they popped up.
That "to-do list" is another brilliant bit of the harness: the Planning Layer. Raw models are notoriously "vibes-based." They want to give you an answer immediately. They are like that one friend who starts building the IKEA furniture without looking at the instructions. The agentic harness forces the model to "pause and plan." It often asks the model to generate a multi-step execution plan before it is allowed to touch a single file.
It’s like the CLI is the responsible adult in the room. "No, Herman, you cannot install three hundred new NPM packages until you tell me why you need them."
And it provides a safety rail. One of the most important parts of the modern agentic CLI is the "Permission Prompt." Since the model is now "agentic" and can run commands, it could technically run rm -rf /. The harness intercepts every destructive or outgoing command and asks the human, "Is this okay?" This creates a "Human-in-the-loop" workflow that was impossible with a raw Ollama setup where you were just copy-pasting code you didn't fully understand.
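A permission gate of the kind Herman describes can be sketched in a few lines. This is a deliberately toy deny-list; real harnesses use richer policies, allow-lists, and sandboxing, and the patterns and function names here are assumptions for illustration.

```python
import re

# Toy deny-list of commands that should never run without a human "yes".
DESTRUCTIVE = [r"\brm\s+-rf\b", r"\bgit\s+push\s+--force\b", r"\bdrop\s+table\b"]

def needs_approval(command: str) -> bool:
    """Return True if the command should be shown to the human first."""
    return any(re.search(pat, command, re.IGNORECASE) for pat in DESTRUCTIVE)

def run_with_gate(command, approve, execute):
    """approve() and execute() are stand-ins for the prompt and the shell."""
    if needs_approval(command) and not approve(command):
        return "blocked by human"
    return execute(command)

print(needs_approval("rm -rf /"))   # True
print(needs_approval("npm test"))   # False
```

The key design choice is that the gate sits in the harness, outside the model, so even a confused or compromised model cannot talk its way past it.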
So, to Daniel's point, if we took a modern agentic CLI and stripped away the harness—if we just gave the user the raw model again—how much value is left for a developer? Are we just back to square one?
I would argue you lose eighty to ninety percent of the productivity gain. Without the harness, the model is just a very smart encyclopedia. It can tell you how to write a binary search tree, but it can't tell you why your binary search tree is failing on line forty-two of "utils dot ts." The value is in the integration.
It’s the difference between a brain and a person. A brain in a jar might know everything about surgery, but it can't perform the operation. You need the hands, the eyes, the sterilized environment, and the nurses. The CLI harness provides the "environment" for the model to actually be useful.
And the interesting thing is that we're seeing this "harnessing" happen at the hardware level now too. But staying on the software side, think about "contextual compression." A raw model doesn't know what is important in your repo. It treats a "README" and a "minified library file" with the same level of attention. A good agentic CLI uses the harness to "summarize" the repo. It says, "Okay, I don't need to read the entire node_modules folder, I just need to see the package-lock dot json to understand the versions." That kind of "pre-processing" is what makes the 2026 experience feel like magic compared to 2023.
It’s funny, we spent years trying to make computers easier for humans by adding buttons and icons and windows—the whole GUI revolution. And now, to make computers easier for AI, we're going back to the most basic, text-based interface possible. It turns out the terminal is the "lingua franca" of both the hardcore developer and the advanced AI agent.
It’s because the terminal is "compostable." You can pipe the output of one command into another. You can script it. You can automate it. GUIs are "leaf nodes"—they are the end of the line. You can't easily "pipe" a button click into a text field in another app. But you can pipe a "git diff" into a "llm summarize" command and then pipe that into a "slack post."
"Compostable" is a great word for it, though I think you meant "composable." Unless you're saying my code is trash, Herman Poppleberry, which is a low blow even for a donkey.
Haha! Composable, yes. Though some of the code I've seen AI generate might belong in a compost bin. But that’s actually another point! The harness helps with "code quality" by running linters automatically. In 2023, if the AI gave you bad code, you might not notice until you tried to run it. Now, the CLI harness can be configured to say, "I will not even show this code to the user until it passes a lint check and a basic test."
It’s a self-correcting system. Let’s talk about the "instructional model" versus the "conversational model" distinction Daniel mentioned. He noted that we had instructional models long before we embraced them at the terminal. Why do you think that is? Why didn't we have "Ollama Code" in 2023 that worked as well as Claude Code does now?
I think there were two missing ingredients. One was the "Reliability of Reasoning." Older models were just too "flighty." They would follow instructions for a while and then just drift off. You couldn't trust them to execute a multi-step plan without a human holding their hand every three seconds. The second was the "Tool-Calling Architecture" we talked about. Until the models were specifically trained to "think" in terms of tools, the harness was too hard to build.
It also feels like a "Data" problem. To build a good agentic harness, you need to train the model on how to use a terminal. You need to show it thousands of examples of "I ran this command, it failed with this error, so I tried this other command." In 2023, most of the training data was just "Human asks question, AI gives answer." It wasn't "Human gives goal, AI navigates a file system to achieve it."
That is a huge insight. We've shifted from training "chatbots" to training "operators." And an operator needs a console. That’s why the terminal is the natural habitat for these things. It’s the "cockpit" of the machine.
So, if I'm a developer listening to this, and I've been using the web interface for Claude or ChatGPT, and I haven't made the jump to the CLI yet... what am I actually missing? Is it just speed, or is it a qualitative difference in the kind of work I can do?
It is a qualitative shift in "Cognitive Load." When you use a web interface, you are the "Context Manager." You are the one who has to remember to upload the right files, you are the one who has to copy the code back, you are the one who has to run the tests. You are using your brain power to do "plumbing."
And I'm a sloth, Herman. I don't want to do plumbing. I want to sit in my tree and think big thoughts about the architecture.
The CLI agent takes over the plumbing. It handles the "context management" and the "execution." This frees you up to do the "high-level thinking" that Daniel's sources mentioned. You spend your time reviewing plans, verifying logic, and directing the "intent" of the project, rather than worrying about whether you copied the entire "useEffect" hook correctly.
It’s also about the "exploration" phase. In the terminal, I can say, "Hey, explore this codebase and tell me how the authentication flow works." The agent will then grep through the files, find the middleware, find the database calls, and give me a summary. If I did that in a web UI, I’d have to upload twenty files just to get started.
And you might miss the twenty-first file that actually contains the secret sauce! The "autonomy" of the CLI agent to go looking for information is a massive force multiplier. I've had instances where I asked a CLI tool to fix a bug in a frontend component, and it realized the bug was actually in a shared utility library three folders up that I hadn't even thought to look at. A raw model would never have found that because it only knows what you show it.
It’s the "unknown unknowns." The harness gives the model a way to turn "unknown unknowns" into "known unknowns" by searching, and then into "knowns" by reading.
There is also the "second-order effect" of repository management. Think about things like dependency updates. In the old world, that was a tedious manual chore. Now, you can tell an agentic CLI, "Update all our dependencies, fix any breaking changes in our tests, and give me a summary of what changed." That is a task that spans the entire repo, the terminal, and the internet. It is the ultimate "harness" task.
I wonder, though... does this make us lazier? Or worse, does it make us "lesser" engineers because we aren't "feeling the grain" of the code as much? If the harness is doing all the plumbing, do we eventually forget how the pipes work?
That is the perennial concern with every abstraction, from compilers to high-level languages. But I think it’s the opposite. By removing the "drudgery" of the plumbing, it actually allows you to see the "system" more clearly. You're not focused on the syntax of a "for-loop," you're focused on the data flow of the entire application.
I suppose it’s like the difference between being a stonemason and being an architect. You still need to know how stones work, but you're building a cathedral, not just carving a block. Although, I’ve seen some AI-generated "cathedrals" that look more like a Winchester Mystery House with stairs leading to nowhere.
Which is why the "Harness" needs to include "Verification." One of the things we're seeing in the latest versions of these tools is "Formal Verification" integration. The agent doesn't just write code; it tries to prove the code is correct using automated reasoning tools. This is something that would be impossible to coordinate through a simple chat window.
That is wild. So the CLI isn't just running "npm test," it’s running sophisticated "provers" to make sure the logic is sound. We are getting into some serious "Iron Man" territory here, where the AI is J.A.R.V.I.S. and the terminal is the suit.
And just like the suit, the "harness" is what makes it powerful. Without the suit, Tony Stark is just a guy in a basement. Without the harness, the LLM is just a brain in a server farm.
Let’s look at the "Local vs Cloud" aspect of this. Daniel mentioned Ollama, which is famous for running things locally. Most of the powerful agentic CLIs today, like Claude Code, are using cloud-based models. How does that change the "harness"? Does it make it harder because of the "latency" of sending files back and forth?
It actually makes the "indexing" part even more critical. Since you don't want to send a gigabyte of source code to Anthropic or Google every time you ask a question, the harness has to be very "smart" about what it sends. It uses local embeddings and local search to "prune" the context down to the bare essentials. So the "harness" is actually doing a lot of "Edge Computing"—processing your data locally to make the "Cloud Brain" more efficient.
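As a crude illustration of that local pruning step: real harnesses use local embeddings, but even naive term overlap shows the shape of it. Everything here, from the file names to the scoring, is an invented example rather than any tool's actual algorithm.

```python
import re

def prune_context(question: str, files: dict, top_k: int = 2) -> list:
    """Rank files by naive term overlap with the question, keep top_k.

    A crude stand-in for the local embedding search a harness runs
    before anything is sent to the cloud model.
    """
    q_terms = set(re.findall(r"\w+", question.lower()))
    def score(item):
        return len(q_terms & set(re.findall(r"\w+", item[1].lower())))
    ranked = sorted(files.items(), key=score, reverse=True)
    return [name for name, _ in ranked[:top_k]]

repo = {
    "auth.py": "def login(user, password): check session token",
    "billing.py": "def charge(card): invoice stripe",
    "README.md": "project overview and setup",
}
print(prune_context("why does login fail to set the session token", repo))
```

Only the winners of that local ranking get shipped to the cloud brain, which is the "edge computing" half of the hybrid Herman is describing.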
So it’s a hybrid model. Local "eyes" and "hands," cloud "brain." That seems like a winning combination for privacy too, right? If the harness is smart enough, it can ensure that "secrets" or "API keys" never leave your machine, even if the model is asking for them.
Precisely. A good harness will have "PII filters" and "Secret Detectors" built-in. It will say, "The model is asking to see your dot env file, but I'm going to redact the actual passwords before I send it." You don't get that with a raw API call or a web chat.
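A redaction pass like the one Herman mentions could look something like this. It is a one-regex toy, assuming a simple KEY=value file shape; a production secret detector would use entropy checks and vetted scanning rules rather than a single pattern.

```python
import re

# Toy rule: any variable whose name contains KEY/TOKEN/PASSWORD/SECRET
# keeps its name (useful context for the model) but loses its value.
SECRET_LINE = re.compile(
    r"^(\s*\w*(?:KEY|TOKEN|PASSWORD|SECRET)\w*\s*=\s*).+$",
    re.IGNORECASE | re.MULTILINE,
)

def redact_env(text: str) -> str:
    """Strip secret values before file contents leave the machine."""
    return SECRET_LINE.sub(r"\1[REDACTED]", text)

env = "API_KEY=sk-live-abc123\nDB_PASSWORD=hunter2\nPORT=3000"
print(redact_env(env))
# API_KEY=[REDACTED]
# DB_PASSWORD=[REDACTED]
# PORT=3000
```

The model still learns that an `API_KEY` exists, which is often all it needs, while the actual credential never crosses the wire.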
Okay, let's pivot to the "Practical Takeaways." If I'm a developer or a team lead, and I'm looking at this landscape, what should I be doing differently today?
First, if you are still using a basic "Chat" interface for coding, stop. Or at least, stop using it for anything more than "How do I do X in Y?" For actual repository work, you need to move to an agentic CLI or a heavily agentic IDE. The productivity gap is now too large to ignore.
It’s like trying to dig a hole with a spoon when there's a backhoe sitting right there. Sure, you can do it, but why would you?
Second, for the people building these tools—and there are a lot of them right now—the lesson is "Don't focus on the model, focus on the harness." The "Model Wars" are a commodity race. The real "moat" is in how well your tool understands the developer's environment. How good is your indexing? How reliable is your tool-calling? How "safe" is your execution environment?
"The Harness is the Moat." I like that. It’s the "User Experience" of the AI. It’s not about how many parameters the model has, it’s about how many "friction points" you’ve removed from the developer's day.
And third, for everyone else, realize that the "Terminal" is no longer just for "old-school Linux nerds." It is becoming the "Universal Command Center" for AI. Learning a bit of bash or zsh is going to be more valuable in 2026 than it was in 2016, because that is where the AI "Agents" live.
It’s a great time to be a sloth who knows how to type. I can just sit here, give a high-level command, and watch the "harness" do the heavy lifting. I’ll just be here in the tree, verifying the "architectural intent."
And I'll be here, digging into the next research paper on "Contextual Compression," making sure the donkey-work of the future is as efficient as possible.
Before we wrap up, we should probably mention where the future of this is going. Do you think the "Harness" eventually becomes part of the model itself? Like, will we have "Operating System Models" that don't need a CLI because they are the CLI?
We're already seeing hints of that with "Large Action Models," but I think there will always be a need for a "User-Facing Layer." Even if the model is "native" to the OS, you still want a way to "see" what it’s doing, to "interrupt" it, and to "correct" it. The CLI is the perfect "glass box" for that. It’s transparent in a way that a GUI often isn't.
It’s the "Audit Log" of the AI’s thoughts. I love it. Well, this has been an illuminating look at the "missing links" of AI coding. Thanks for the deep dive, Herman.
Always a pleasure, Corn. And thanks as always to our producer, Hilbert Flumingtop, for keeping the "harness" of this show running smoothly.
And a big thanks to Modal for providing the GPU credits that power this show. They are the "serverless harness" that makes our "AI brain" possible.
If you're enjoying our weird little explorations, a quick review on Apple Podcasts or Spotify helps us reach more curious minds. We really appreciate the support.
This has been My Weird Prompts. You can find us at myweirdprompts dot com for all our episodes and the RSS feed.
See you in the terminal.
Bye.