#1708: Why Your AI Agent Forgets Everything (And How to Fix It)

Learn how Letta's memory-first architecture solves the AI context bottleneck for long-term agents.

Episode Details
Episode ID
MWP-1861
Published
Duration
24:08
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
Gemini 3 Flash

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The "Memory Problem" in AI Agents

The promise of AI agents has shifted from simple chatbots to long-term teammates. However, a major bottleneck persists: memory. Standard LLMs have limited context windows, forcing agents to "forget" earlier instructions or hallucinate when overloaded. This creates a fragile system where the agent cannot maintain a persistent state across thousands of sessions. The core challenge is moving from one-off interactions to a workflow where the AI remembers your preferences, past projects, and specific nuances from months ago.

The Evolution from MemGPT to Letta

To address this, the industry is seeing a shift toward "memory-first" architectures. A prime example is Letta, which evolved from the MemGPT research project out of UC Berkeley. Originally standing for Memory-GPT, the technology has been rebranded and commercialized as Letta. While MemGPT referred to the underlying engine, Letta represents the full production-ready framework. This distinction is crucial for developers navigating the ecosystem.

How Letta Manages Context Like an OS

Letta treats the LLM context window like RAM in a computer. When the "RAM" gets full, the framework swaps data out to archival storage, much like an operating system manages memory. In this analogy, the LLM acts as the CPU, and Letta serves as the memory manager. Unlike traditional RAG systems where developers manually trigger database searches, Letta agents autonomously decide what to remember and when to retrieve it. They utilize function calling to manage three types of memory: Core Memory (always in context), Archival Memory (vector database for long-term storage), and Recall Memory (full event history).
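The OS analogy can be sketched in a few lines of Python. Everything here is illustrative: `ContextManager`, its token accounting, and the keyword-based `retrieve` stand in for Letta's actual context manager and vector search, and are not its real API.

```python
# Illustrative sketch of an OS-style memory hierarchy for an agent.
# Names and token accounting are hypothetical, not Letta's API.

from collections import deque

class ContextManager:
    """Keeps a bounded 'RAM' of recent messages; evicts overflow to archive."""

    def __init__(self, max_context_tokens: int):
        self.max_context_tokens = max_context_tokens
        self.context = deque()   # working context, like RAM
        self.archive = []        # long-term store (Archival Memory)

    def _used(self) -> int:
        return sum(tokens for _, tokens in self.context)

    def add(self, message: str, tokens: int) -> None:
        self.context.append((message, tokens))
        # When "RAM" is full, swap the oldest messages out to archival
        # storage, the way an OS pages memory out to disk.
        while self._used() > self.max_context_tokens:
            evicted_message, _ = self.context.popleft()
            self.archive.append(evicted_message)

    def retrieve(self, keyword: str) -> list[str]:
        # Stand-in for a semantic search over archival memory.
        return [m for m in self.archive if keyword.lower() in m.lower()]


mgr = ContextManager(max_context_tokens=25)
mgr.add("User prefers Python and FastAPI", tokens=10)
mgr.add("Discussed October project budget", tokens=15)
mgr.add("Reviewed shipping-delay complaint", tokens=15)  # overflows the window
print(mgr.retrieve("budget"))  # → ['Discussed October project budget']
```

The key property the sketch captures: the budget discussion fell out of the context window, yet it remains retrievable, so nothing is truly "forgotten."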

Real-World Applications and Efficiency

This autonomy allows for sophisticated use cases. In customer support, an agent can remember a specific frustration from weeks ago without a keyword search. In education, an AI tutor can track a student's progress over a full school year, adjusting teaching styles based on past struggles. However, this architecture is more complex than low-code alternatives. Developers must think about state management and how the agent interacts with its own database.

Efficiency is a key driver. Even with massive context windows like a million tokens, latency and cost remain high. Sending a full history for every simple message is inefficient. Letta allows agents to stay "thin" and fast, pulling in heavy memories only when needed. This makes it scalable for daily-use assistants.
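A back-of-envelope calculation makes the efficiency point concrete. The $3-per-million-token price below is a placeholder assumption, not any provider's actual rate.

```python
# Back-of-envelope: resending full history vs. a "thin" context.
# PRICE_PER_MTOK is a placeholder assumption, not a real provider rate.

PRICE_PER_MTOK = 3.00  # dollars per million input tokens

def monthly_cost(input_tokens: int, messages_per_day: int, days: int) -> float:
    return input_tokens * messages_per_day * days * PRICE_PER_MTOK / 1_000_000

# Agent that ships a 1M-token history with every message:
full = monthly_cost(input_tokens=1_000_000, messages_per_day=50, days=30)
# Agent that keeps ~4K tokens of core memory plus retrieved snippets:
thin = monthly_cost(input_tokens=4_000, messages_per_day=50, days=30)

print(f"full-history: ${full:,.2f}/month")  # $4,500.00
print(f"thin-context: ${thin:,.2f}/month")  # $18.00
```

Even with generous rounding, the "semi-truck for an envelope" pattern is orders of magnitude more expensive before latency is even considered.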

The Competitive Landscape and Modular Stacks

Letta exists alongside orchestration tools like CrewAI and LangGraph. While CrewAI excels at multi-agent coordination and LangGraph offers strict state control for workflows, Letta focuses on the individual agent's brain. It is carving a niche for digital twins and complex personal assistants where memory is the primary feature.

The future likely involves modular stacks. A Letta agent could serve as the persistent "Account Manager" with deep client knowledge, triggering a LangGraph workflow for structured report generation. This combination offers both contextual grounding and procedural reliability. While integration adds complexity, the trade-off is worthwhile for high-value enterprise applications where forgetfulness is a deal-breaker.

#1708: Why Your AI Agent Forgets Everything (And How to Fix It)

Corn
What if an AI agent could remember every single thing you ever told it? I am not talking about the last ten thousand tokens or a messy summary that loses all the nuance. I mean actually remembering your preferences, your past projects, and that specific way you like your code formatted from a conversation three months ago. Today's prompt from Daniel is pushing us right into that territory with a deep dive into Letta, which many of you probably still know as MemGPT.
Herman
It is a massive problem in the space right now. We are moving from these simple, one-off chat interactions to actual agentic workflows where the AI is supposed to function like a long-term teammate. But the bottleneck is almost always memory. If the agent hits its context limit and starts "hallucinating" or forgetting the initial instructions, the whole system collapses. By the way, it is pretty cool to note that our script today is being powered by Google Gemini three Flash, which ironically has a massive context window itself, but even a million tokens eventually runs out.
Corn
Right, and that is the core of the issue. You hit the wall eventually. And before we get into the gears of how Letta fixes this, we should probably clear up the name situation because it is a bit confusing. Is it Letta? Is it MemGPT? Are they two different things fighting for dominance in the same repo?
Herman
It is actually a classic evolution from a research project to a production-ready startup. MemGPT started as a research paper out of UC Berkeley. The acronym stood for Memory-GPT, and the whole hook was "LLMs as Operating Systems." But as the team moved toward building a commercial-grade framework, they rebranded to Letta in late twenty twenty-four. So, while you will still see the underlying technology referred to as MemGPT in academic circles or older tutorials, Letta is the official name of the framework and the company behind it. Think of MemGPT as the engine and Letta as the entire car, including the dashboard and the steering wheel.
Corn
Okay, so Letta is the name on the building now. And looking at the landscape, we have already talked about CrewAI for orchestration and LangGraph for those complex cyclic workflows. Where does Letta sit on the shelf? Is it trying to replace those, or is it solving a different problem entirely?
Herman
It is really a "memory-first" architecture, which is a fundamentally different starting point. If CrewAI is about how a group of people talk to each other to get a job done, and LangGraph is about the flowchart they follow, Letta is about the individual agent's brain and how it manages its own long-term and short-term storage. It treats the LLM context window like RAM in a computer. When the RAM gets full, the operating system swaps data out to the hard drive. Letta does that for AI context.
Corn
I love that operating system analogy. It makes it much easier to visualize. On a standard computer, the CPU doesn't care if a file is on the SSD or in the RAM; the OS just makes sure the data is where it needs to be when the CPU asks for it. So, in Letta's world, the LLM is the CPU, and the framework is acting as the memory manager?
Herman
Precisely. Well, I mean, that is the most accurate way to look at it. In a typical RAG system—Retrieval-Augmented Generation—the developer has to manually decide when to search a database and what to shove into the prompt. In Letta, the agent has "tools" that allow it to autonomously manage its own memory. It can decide, "Hey, this piece of information about the user's favorite programming language is important, I am going to write this to my archival memory." Or, "I need to look up what we discussed in October regarding the project budget," and it performs that retrieval itself without the human having to trigger a search.
Corn
That feels like a huge shift in autonomy. Most agents today are passive recipients of whatever context the developer manages to squeeze into the prompt box. If the agent can decide what is worth remembering, it starts to feel more like an actual entity with a persistent state. But how does it actually decide? Is there a separate logic layer, or is the LLM itself making the call to "Save to Disk"?
Herman
It is the LLM making the call via function calling. Letta provides the agent with a specific set of system tools. There is the "Core Memory," which is always in the prompt—things like the agent's persona and basic facts about the user. Then there is "Archival Memory," which is a massive vector database where the agent can store and retrieve documents or past experiences. And finally, there is "Recall Memory," which is the full history of past events and messages. The agent is prompted to realize when its current context is getting full and it needs to offload or search. It is a very active process. Instead of the developer saying "Search the docs," the agent thinks "I don't know the answer to this, let me check my archives."
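The tool names Herman mentions echo the ones described in the MemGPT paper (`core_memory_append`, `archival_memory_search`, and so on). A simplified view of how they might surface as function-calling schemas follows; the exact schemas in Letta may differ.

```python
# Simplified sketch of memory management exposed as function-calling tools.
# Tool names mirror those described in the MemGPT paper; exact schemas in
# Letta may differ.

MEMORY_TOOLS = [
    {
        "name": "core_memory_append",
        "description": "Append a fact to always-in-context core memory.",
        "parameters": {"section": "str, e.g. 'human' or 'persona'",
                       "content": "str"},
    },
    {
        "name": "archival_memory_insert",
        "description": "Write an event or document to long-term vector storage.",
        "parameters": {"content": "str"},
    },
    {
        "name": "archival_memory_search",
        "description": "Semantic search over archival memory.",
        "parameters": {"query": "str"},
    },
    {
        "name": "conversation_search",
        "description": "Search recall memory (the full message history).",
        "parameters": {"query": "str"},
    },
]

# The agent, not the developer, decides when to call these. A turn where it
# thinks "let me check my archives" might produce a tool call like:
tool_call = {"name": "archival_memory_search",
             "arguments": {"query": "October project budget"}}
assert tool_call["name"] in {t["name"] for t in MEMORY_TOOLS}
```

The inversion of control is the whole point: in RAG the developer decides when to search; here the search is just another tool the model can reach for.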
Corn
So, if I am building a customer support agent with Letta, it could technically "remember" that a customer was frustrated three weeks ago about a specific shipping delay, even if that conversation is long gone from the current session?
Herman
Yes, and it wouldn't just be because you did a keyword search for "shipping." The agent would have stored that interaction as a meaningful event. When the customer comes back and says "Is it fixed yet?", the Letta agent would have the persistent state to know what "it" refers to. In CrewAI, you might have a "Memory" feature, but it is often just a simple RAG implementation or a short-term buffer. In LangGraph, you have "Checkpoints" which are great for resuming a specific workflow, but Letta is built for the "Forever Agent" that lives across thousands of sessions.
Corn
It sounds like Letta is optimized for depth of relationship with a user, whereas LangGraph is optimized for the reliability of a process. I can see why Daniel is interested in this. If you are into automation and high-level tech comms, you want your tools to get smarter the more you use them. But let's talk about the competition. If I am a developer in March of twenty twenty-six, and I am looking at GitHub stars and community momentum, where does Letta stand? Because I have noticed CrewAI and LangGraph seem to have a lot of the oxygen in the room.
Herman
You are not wrong. If we look at the numbers, Letta is definitely in the "rising challenger" category rather than the "dominant incumbent." As of right now, Letta's GitHub repository has about eight thousand five hundred stars. Compare that to CrewAI, which is sitting around fifteen thousand, and LangGraph, which is near twelve thousand. CrewAI really captured the imagination of the "low-code" or "quick-start" crowd. People love how easy it is to just define three agents and set them loose. LangGraph captured the enterprise crowd because they want that strict control over the state machine. Letta is carving out a niche for people building "Digital Twins" or complex personal assistants where memory is the primary feature, not a secondary one.
Corn
Is there a reason it hasn't exploded quite as fast as CrewAI? Is it harder to use? Or is the "memory problem" just something people haven't realized they have yet?
Herman
It is a bit of both. Letta's architecture is more complex to wrap your head around because you have to think about state management and how the agent interacts with its own database. It is not just "Prompt, Response, Done." Also, for a long time, people thought that just having a bigger context window would solve everything. When Gemini announced a million tokens, and then two million, some people thought, "Why do I need a memory framework? I will just put the whole book in the prompt."
Corn
And why is that a mistake? I mean, if I can fit a million tokens in, why bother with the complexity of Letta's "Archival Memory"?
Herman
Two words: Latency and Cost. Even if you can fit a million tokens in, the "Time To First Token" goes up significantly. Plus, you are paying for those tokens every single time you send a message. If you have a persistent assistant that you talk to every day, sending a million-token history for a simple "Hello" is like hiring a semi-truck to deliver a single envelope. It is inefficient. Letta allows the agent to be "thin" and fast most of the time, only pulling in the "heavy" memories when it actually needs them. It is a much more scalable way to build.
Corn
That makes total sense. It is the difference between carrying your entire library on your back versus having a smart librarian who brings you the right book when you ask a question. So, let's look at the actual use cases. Who is actually using Letta in the real world? Are there specific industries where this memory-first approach is beating out the orchestration-first approach of CrewAI?
Herman
We are seeing it a lot in personalized education and health coaching. Imagine an AI tutor that tracks a student's progress over an entire school year. It needs to remember that the student struggled with fractions in November so it can adjust its teaching style in March. A standard RAG search might find the "fractions" lesson, but it won't necessarily capture the "vibe" or the specific mistakes the student made. Letta agents can maintain a "User Persona" block in their core memory that evolves over time. I know of a few startups in the legal research space using it too, where an agent needs to maintain context over a multi-month litigation process involving thousands of documents.
Corn
I can see a "Digital Twin" use case here too. If you want an agent that actually acts like you, it needs to have a deep, structured memory of your opinions and past decisions. A simple vector search of your emails isn't enough; the agent needs to "learn" and update its internal model of who you are.
Herman
That is exactly what the Letta team is pushing for. They recently launched the Letta "Agent Service," which is a managed environment for deploying these persistent agents. It basically gives the agent its own database, its own file storage, and its own identity that persists even if you turn off the server. This is a big differentiator from LangGraph. While LangGraph has "persistence," it is usually tied to a "Thread ID" or a specific task. Letta's persistence is tied to the "Agent ID" itself. The agent becomes a long-lived entity.
Corn
You mentioned earlier that Letta and LangGraph don't necessarily have to be enemies. Could you actually use them together? Like, use Letta for the "brain" and the memory, but use LangGraph to define the specific steps of a complex business process?
Herman
This is where the industry is heading—modular agentic stacks. You could have a Letta agent acting as the "Account Manager" who knows everything about the client, and when it is time to actually generate a report, that Letta agent triggers a LangGraph workflow to do the heavy lifting. The Letta agent provides the "contextual grounding" and the LangGraph workflow provides the "procedural reliability." It is a powerful combination because you are getting the best of both worlds: the deep memory and the structured execution.
Corn
So, why aren't more people doing that? Is it just the "Developer Friction" of learning two different frameworks?
Herman
It is the "Integration Tax." Every time you add another framework, you add more points of failure and more latency. But for high-value applications—think enterprise-level wealth management or high-end executive assistants—the tax is worth paying. The alternative is an agent that feels "forgetful," and in those industries, forgetfulness is a deal-breaker.
Corn
Let's talk about the "Traction" question Daniel asked. If Letta is at eight thousand five hundred stars and CrewAI is at fifteen thousand, does that mean Letta is "losing"? Or is it just that the "Memory-First" use case is a smaller, more specialized market?
Herman
I wouldn't say it is losing. In fact, if you look at the "Quality" of the contributors and the depth of the research, Letta is punching way above its weight. The developers are former Berkeley researchers and people who really understand the underlying LLM architecture. CrewAI had a massive viral moment, and that is great for adoption, but we are starting to see a "Refinement Phase" in twenty twenty-six. People who built quick prototypes in CrewAI last year are now realizing they need more robust state management and better memory handling as they move toward production. That is when they start looking at Letta. It is a more "mature" architectural choice for certain types of persistent applications.
Corn
It is like the difference between a popular pop song and a really well-engineered piece of classical music. One gets the radio play, but the other has the structural integrity to last. Although, to stay with that analogy, I guess we're waiting to see if Letta can get a "Radio Edit" that makes it easier for the average dev to jump in.
Herman
They are working on it! The Letta CLI and the new dashboard they released are much more user-friendly. They are trying to hide the complexity of the vector database and the memory management so you can just focus on the agent's persona. But at its core, you still have to understand the "Memory Hierarchy." You have to decide what goes in the "Core Memory"—which is the permanent, always-on context—and what gets pushed to the "Archival Memory."
Corn
Give me a concrete example of that. If I am building a "Coding Companion" agent using Letta, what lives in the Core Memory versus the Archival Memory?
Herman
Great question. In the Core Memory, you would put the user's preferred tech stack—say, Python and FastAPI—and their coding style, like "prefers functional programming over object-oriented." You might also include the current project's primary goal. This is information the agent needs to have "top of mind" for every single response. In the Archival Memory, you would put the entire documentation for the external libraries they are using, plus all the code snippets from past projects they’ve worked on. The agent doesn't need to "know" the entire library by heart, but it needs to know how to go find the specific API call when the user asks for it.
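Herman's split can be written down as data. The block names and prompt layout below are illustrative, not Letta's exact schema.

```python
# Hypothetical memory layout for the "Coding Companion" example.
# Block names and prompt layout are illustrative, not Letta's exact schema.

core_memory = {
    "persona": "Concise senior-engineer coding companion.",
    "human": ("Preferred stack: Python + FastAPI. "
              "Prefers functional style over OOP."),
    "project": "Goal: ship the billing-service refactor.",
}

archival_memory = [
    "FastAPI docs: dependency injection chapter ...",
    "Snippet (March): retry decorator used in the ETL project ...",
]

def build_prompt(user_message: str, retrieved: list[str]) -> str:
    """Core memory rides along on every turn; archival snippets appear
    only when the agent retrieves them."""
    core = "\n".join(f"[{k}] {v}" for k, v in core_memory.items())
    context = "\n".join(retrieved)
    return f"{core}\n---\n{context}\n---\nUser: {user_message}"

prompt = build_prompt("How do I inject a DB session?",
                      retrieved=[archival_memory[0]])
print(prompt.splitlines()[0])  # [persona] Concise senior-engineer coding companion.
```

Note what is absent: the March retry-decorator snippet stays out of the prompt until a question actually calls for it, which is exactly the "thin agent" behavior discussed earlier.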
Corn
And the "Recall Memory" would be the actual transcript of the current and past conversations?
Herman
Right. Recall Memory is the chronological log. If you say, "Wait, what did I say about the database schema ten minutes ago?", the agent looks in the Recall Memory. If you say, "How did we solve that similar bug in the project three months ago?", it looks in the Archival Memory. It is a very structured way of thinking about information. Most other frameworks just dump everything into a "Vector Store" and hope the similarity search finds the right chunk. Letta's approach is much more intentional.
Corn
I can see how that leads to fewer hallucinations. If the agent knows exactly "where" a piece of information came from—whether it is a direct user instruction or a document it read—it can cite its sources better and maintain a more consistent personality. But let's look at the downsides. What is the "Letta Headache"? If I am a dev, what is going to make me want to pull my hair out?
Herman
The biggest headache is "State Drift." Because the agent is autonomously updating its own memory, it can sometimes "learn" the wrong thing. If it misinterprets a user's comment and writes it into its Core Memory as a permanent fact, it will keep acting on that wrong information until you manually go in and "debug" its brain. It is like an employee who misunderstood an instruction on day one and has been doing it wrong for a month because it became part of their "routine."
Corn
That is fascinating. We are moving from "Debugging Code" to "Debugging Memory." Instead of looking for a syntax error, you are looking for a "Cognitive Error" in the agent's internal model of the world. Can you actually edit the agent's memory directly as a developer?
Herman
Yes, Letta provides a sort of "Brain Editor" via their API and dashboard. You can see exactly what is in the Core and Archival memories and manually prune things. But that is clearly not scalable if you have ten thousand agents running. So, the challenge for the Letta team—and for anyone using it—is building "Self-Correction" mechanisms where the agent periodically reviews its own memory for contradictions.
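A toy version of the kind of audit pass such a self-correction mechanism might run is sketched below. This is purely illustrative; Letta does not ship this exact helper.

```python
# Toy "memory audit" of the kind a self-correction loop might run:
# scan core-memory facts for direct contradictions before they ossify.
# Purely illustrative; not a real Letta helper.

facts = [
    ("favorite_language", "Python"),
    ("timezone", "UTC+2"),
    ("favorite_language", "Go"),  # drifted entry contradicting the first
]

def find_contradictions(entries):
    seen, conflicts = {}, []
    for key, value in entries:
        if key in seen and seen[key] != value:
            conflicts.append((key, seen[key], value))
        seen[key] = value
    return conflicts

print(find_contradictions(facts))
# → [('favorite_language', 'Python', 'Go')]
```

Real contradictions are rarely this clean (they are usually semantic, not key-level), which is why the interesting research version of this has the LLM itself review its memory during a "consolidation phase."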
Corn
"Self-Reflective Memory." That sounds like the next level of this. It reminds me of how humans dream to process and prune memories. Maybe our AI agents need a "Sleep Cycle" where they organize their Letta databases and delete the junk.
Herman
You joke, but that is actually a research area! "Consolidation phases" for AI memory. It is a very exciting time to be looking at this stuff. If we compare it back to the other frameworks we've discussed in this series, CrewAI is like a "Flash Mob"—lots of energy, great for a quick performance. LangGraph is like a "Factory Line"—very efficient, very controlled. And Letta is like a "Long-Term Mentor"—it grows with you, it remembers your history, and it builds a deep, persistent context.
Corn
That is a great summary. So, for the listeners who are trying to decide which horse to back: if you are building something where the "Relationship" between the user and the AI is the product, Letta is probably your best bet. If you are building a tool to automate a specific business process with a clear start and end point, you are probably better off with LangGraph. And if you just want to get a multi-agent demo running by lunch, stick with CrewAI.
Herman
I think that is a very fair assessment. I would also add that Letta is the one to watch if you are a "Technical Purist." If you really care about the architecture of how agents work and you want to be on the cutting edge of the "LLM as OS" movement, Letta is where the most interesting research is happening. They are tackling the hard problems of state and persistence that everyone else is kind of glossing over with bigger context windows.
Corn
It feels like Letta is the "Linux" of the agent world. Maybe not the most popular for the average consumer yet, but it is building the foundational layers that everyone else will eventually have to rely on. Before we move to the takeaways, I want to touch on one more thing Daniel mentioned: Who is actually winning? Is there room for all three of these frameworks to survive?
Herman
In twenty twenty-six, we are starting to see some consolidation. I think we will end up with a few "Master Frameworks" that incorporate ideas from all of them. LangChain is already trying to do this with LangGraph. I wouldn't be surprised if we see a "Memory-First" module become a standard part of every framework. But Letta's specific implementation of the "Memory Manager" as an autonomous function of the agent is very unique. It is hard to just "bolt that on" to a framework that wasn't built for it. So, I think Letta will continue to be the leader in the "Persistent Agent" category for a while.
Corn
It is also worth noting that Letta is open-source. That is a huge factor for developers who don't want to be locked into a proprietary ecosystem like OpenAI's "Assistants API." If you use Letta, you own the database, you own the memory, and you can swap the LLM backend whenever you want. You could move from GPT-4 to a local Llama model and your agent's "soul"—its memory—would stay intact.
Herman
That "Portability of Experience" is a massive selling point. If you spend three months "training" an agent on your personal workflow, you don't want that data trapped inside a single provider's black box. Letta gives you the "Memory Files" that you can take with you. It is the AI equivalent of having your own "Save Game" file instead of just playing on a server that could get wiped at any time.
Corn
Alright, let's wrap this up with some practical takeaways for the folks listening. If you are a developer or a tech-savvy manager, what should you actually do with this information?
Herman
First, I would say: Stop waiting for "Infinite Context" to solve your problems. Even if it arrives, it won't be as efficient or as structured as a dedicated memory framework. If your project requires an agent to remember things across more than three or four sessions, go download the Letta quickstart today. Just spend an afternoon building a simple persistent assistant and see the difference in how it feels when it references a conversation from yesterday.
Corn
Second, think about your "Memory Hierarchy." Even if you don't use Letta, the mental model they provide is incredibly useful. What are the "Immutable Facts" your agent needs? What is the "Searchable History"? And what is the "Short-Term Working Memory"? Categorizing your data this way will make your prompts much more effective, regardless of what framework you use.
Herman
And third, look for integration opportunities. If you are already deep into the LangChain or CrewAI ecosystems, don't feel like you have to switch teams. Look at how you can use Letta as a "State Server" for your existing agents. The Letta team has been very open about making their framework play nice with others. You can use their "Agent Service" via a REST API, which means your CrewAI agents could technically "call" a Letta agent to retrieve a long-term memory.
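That cross-framework call could look roughly like the sketch below. The base URL, endpoint path, and payload shape are assumptions for illustration; check Letta's API reference for the real ones.

```python
# Sketch of calling a long-lived memory agent over REST from another
# framework. The endpoint path and payload shape are assumptions for
# illustration; consult Letta's API reference for the real ones.

import json
from urllib import request

LETTA_BASE = "http://localhost:8283"   # assumed local Agent Service address
AGENT_ID = "agent-account-manager"     # a long-lived agent identity

def memory_lookup_request(query: str) -> request.Request:
    """Build (but don't send) a request asking the agent to recall something."""
    body = json.dumps({"messages": [{"role": "user", "content": query}]})
    return request.Request(
        url=f"{LETTA_BASE}/v1/agents/{AGENT_ID}/messages",
        data=body.encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = memory_lookup_request("What did the client decide about Q3 pricing?")
print(req.full_url)  # http://localhost:8283/v1/agents/agent-account-manager/messages
```

The detail that matters is the stable `AGENT_ID`: because persistence hangs off the agent's identity rather than a thread or task, any orchestrator that can make an HTTP call can borrow that agent's memory.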
Corn
I love that. It is all about the "Agentic Ecosystem." We are moving away from the "One App to Rule Them All" and toward a world where different specialized agents talk to each other. And honestly, having a "Memory Specialist" like Letta in that mix seems almost essential if we want these things to be actually useful in the long run.
Herman
It really is. Without persistent memory, AI is just a very smart person with amnesia. You can have the highest IQ in the world, but if you forget who you are talking to every five minutes, you are not going to be a very good partner. Letta is basically giving the AI a notepad and a filing cabinet and teaching it how to use them.
Corn
Well, I for one am looking forward to the day my AI agent remembers that I hate being bothered before my first cup of coffee. If Letta can solve that, it has my vote.
Herman
We might be a few years away from that level of emotional intelligence, but the structural foundations are being laid right now. It is a very cool time to be watching this space.
Corn
Definitely. We should probably stop there before we start talking about AI mid-life crises and repressed memories. This has been a great look at a framework that I think a lot of people are going to be hearing much more about in the coming months.
Herman
Agreed. It is the "Silent Revolution" of state management. Not as flashy as a new model release, but probably more important for the actual utility of the tech.
Corn
Big thanks to our producer Hilbert Flumingtop for keeping us on track and making sure we don't wander off into the weeds too much. And of course, a huge thanks to Modal for providing the GPU credits that power this show and allow us to dive deep into these technical topics.
Herman
If you found this dive into Letta useful, we would love it if you could leave us a review on Apple Podcasts or Spotify. It genuinely helps other people find the show and keeps us motivated to keep digging into these weird prompts from Daniel.
Corn
This has been My Weird Prompts. You can find us at myweirdprompts dot com for the full archive, RSS feeds, and all that good stuff.
Herman
Catch you next time.
Corn
See ya.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.