So, I was looking at the digital equivalent of a crime scene reconstruction this morning. Anthropic just gave us a rare gift: a side-by-side look at how they update their AI's core instructions over just a two-month window. We are looking at the Claude Opus four point five system prompts from November twenty-fourth, twenty twenty-five, and January eighteenth, twenty twenty-six.
It is a fascinating data point, Corn. My name is Herman Poppleberry, and I have spent the last six hours diffing these two documents. Usually, these system prompts are a black box. Companies hide them because they contain the secret sauce of how the model is supposed to behave, but Anthropic has been unusually transparent here. Seeing these versioned updates is like getting the source code diffs for a personality.
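[A quick aside for listeners following along at home: the kind of diff Herman describes can be reproduced with a few lines of Python's standard library. This is a minimal sketch, assuming the two prompts are saved as plain text files; the filenames are hypothetical placeholders.]

```python
# Minimal sketch: diff two system prompt snapshots with the standard library.
import difflib
import sys

with open("opus-4.5-november.txt") as f:
    november = f.readlines()
with open("opus-4.5-january.txt") as f:
    january = f.readlines()

# unified_diff yields lines prefixed with -, +, or a space, the same
# format produced by `diff -u` or `git diff`.
sys.stdout.writelines(
    difflib.unified_diff(november, january,
                         fromfile="november", tofile="january")
)
```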
It really is. By the way, today's episode is powered by Google Gemini three Flash, which is actually the model writing our script today. It is a bit meta, using one AI to perform a forensic audit of another AI's brainwashing, but here we are. Herman, these two versions are only fifty-five days apart. What on earth changes in the life of an AI in less than two months that requires a rewrite of its fundamental identity?
Everything, apparently. These aren't just minor tweaks; they are responses to real-world friction. When you have millions of people poking at a model like Opus four point five, you start seeing where the edges are frayed. Those fifty-five days clearly revealed some bugs in how Claude perceives itself, how it handles crisis situations, and even how it annoys users with its vocabulary.
I want to start with the product identity stuff because it feels like a corporate roadmap leaked into the prompt. In the November prompt, it lists Claude for Chrome and Claude for Excel as beta products. It explicitly says, quote, there are no other Anthropic products, end quote. But by January, that line is gone.
Not just gone, it is replaced by something much more cautious. The new version says, quote, Claude does not know other details about Anthropic's products, as these may have changed since this prompt was last edited, end quote. They also added a new product called Cowork, which is described as a desktop tool for non-developers to automate tasks.
Cowork. That sounds like Anthropic is moving hard into the agentic space, trying to compete with the specialized automation tools. But what strikes me is that shift from there are no other products to I do not know because things change. It is like they realized the AI was being too confident about a static world.
It is a move toward epistemic humility. In the November version, the model was essentially programmed to be a liar if a new product launched on November twenty-fifth. By January, they realized that the system prompt is a snapshot in time. They are teaching the model that its own internal knowledge of its parent company is fallible. That is a huge shift in how they manage the model's self-awareness.
It avoids that awkward moment where a user says, hey, I am using this new Anthropic app, and Claude says, no you are not, I do not exist there. But let's talk about the specific features. The January prompt adds a massive paragraph listing toggleable features like web search, deep research, code execution, and artifacts. Why put that in the system prompt? Doesn't the model already know it has those tools?
Not necessarily in the way you think. A model like Opus four point five needs to know what it is allowed to talk about and how it should frame those capabilities. If a user asks, can you search the web, and the system prompt doesn't explicitly confirm that feature is live, the model might hallucinate a refusal based on its older training data. This new paragraph acts as a real-time anchor for the model’s current feature set. It even mentions memory from chat history and user style customization. They are basically giving Claude a current resume every time it starts a conversation.
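[For the curious, the pattern Herman describes, splicing a live feature list into the prompt at session start, might look something like the sketch below. Every name and the template here are assumptions for illustration; this is not Anthropic's actual code.]

```python
# Illustrative sketch only: a chat frontend splicing the currently
# enabled features into the system prompt at session start. All names
# are hypothetical, not Anthropic's implementation.
BASE_PROMPT = "The assistant is Claude, made by Anthropic."  # stand-in

ENABLED_FEATURES = {
    "web search": True,
    "deep research": True,
    "code execution": False,
    "artifacts": True,
}

def feature_paragraph(features: dict[str, bool]) -> str:
    live = [name for name, on in features.items() if on]
    if not live:
        return "No optional tools are enabled for this conversation."
    return "Currently enabled tools: " + ", ".join(live) + "."

system_prompt = BASE_PROMPT + "\n\n" + feature_paragraph(ENABLED_FEATURES)
print(system_prompt)
```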
It is like giving a waiter a list of the daily specials before they head to the table. If you do not tell them we have the sea bass, they will keep telling people we only serve chicken. Now, here is a weird one that caught my eye. In the formatting section, there was a whole paragraph about CommonMark standards for lists and bullets in the November version. In January, it is just... deleted. Gone. Why? Did Claude suddenly become a Markdown expert overnight?
This is where we get into the technical weeds of how these prompts affect token usage and model behavior. That CommonMark paragraph was very specific. It told Claude it must have a blank line before any list and between a header and content for correct rendering. My theory is that this was actually causing rendering bugs or over-application. If you tell a model it must do something for correct rendering, it might start inserting weird whitespace in places where it doesn't belong, or it might get confused when the UI handles the rendering differently.
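[To make the deleted rule concrete, here is the spacing that paragraph was enforcing. Whether the second form renders as a list or a single paragraph varies by renderer, which is exactly the ambiguity Herman suspects caused trouble.]

```markdown
<!-- Blank line before the list: renders as a proper bulleted list -->
Here are the options:

- Option one
- Option two

<!-- No blank line: some renderers fold this into a single paragraph -->
Here are the options:
- Option one
- Option two
```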
So they realized the model was trying too hard to be a formatter and failing?
Or the model improved enough during its post-training that it no longer needed the explicit instruction. If the base behavior of the model now includes proper Markdown spacing, you can save those tokens and reduce the complexity of the prompt by removing the instruction. It is a sign of a model maturing. You stop telling the teenager to chew with their mouth closed once they start doing it naturally.
I love the idea of Claude graduating from Markdown school. But while we are on the topic of how it talks, let's look at the banned words. This cracked me up. In the January version, there is a new line: quote, Claude avoids saying genuinely, honestly, or straightforward, end quote. Herman, why does Anthropic hate the word genuinely?
Because users hate it when AI pretends to have feelings. Think about it. When an AI says, I am genuinely sorry for the inconvenience, or honestly, I think this is the best approach, it feels performative. It is what people call the AI voice. It is that saccharine, overly helpful persona that feels like a customer service bot from hell. By banning those words, Anthropic is trying to strip away the unearned intimacy that makes people cringe.
It is the word straightforward that gets me. I use that word all the time. But I guess when Claude says, the solution is straightforward, it can come across as condescending if the user is actually struggling. It is like the AI is saying, this is easy, why don't you get it?
It is about tone management. They also moved a line about using examples and metaphors from the additional info section into the tone and formatting section. That tells me they want the model to prioritize showing instead of telling. Instead of saying something is straightforward, Claude is now being prompted to just show you an example or a thought experiment to make it clear. It is a shift from talking about the quality of the answer to actually improving the substance of the answer.
You mentioned earlier that these prompts respond to real-world friction. Let's look at the user wellbeing section, because this feels like the most somber part of the diff. In January, they added self-harm to the list of self-destructive behaviors. It wasn't there in November.
That is a critical escalation. In November, the list was focused on things like addiction, disordered eating, and negative self-talk. Adding self-harm explicitly suggests that they saw cases where the model wasn't triggering its safety protocols for that specific category, or perhaps it wasn't being firm enough. But the even more specific change is the eating disorder guidance.
You mean the NEDA thing?
Yes. The January prompt has a very specific instruction: Claude must direct users to the National Alliance for Eating Disorders instead of NEDA, because the NEDA helpline was permanently disconnected. This is a fascinating look at how AI safety isn't just about filters; it is about maintenance.
It is a dark reality. Imagine a user in a crisis reaching out to an AI, and the AI gives them a phone number that has been disconnected for months. That is a catastrophic failure for a safety feature. Anthropic had to hard-code a correction into the system prompt because the model's internal training data still thinks NEDA is the gold standard.
And it shows how the system prompt is the only way to quickly patch the model's knowledge in a high-stakes area. You can't just retrain the whole model because a phone number changed. You have to tell the model, hey, I know you think this is the number, but do not use it. It is a live patch for a real-world tragedy.
There is also a new warning in that same section. It tells Claude not to make categorical claims about the confidentiality of crisis helplines. It says these assurances are not accurate and vary by circumstance. That feels like a legal team had a minor heart attack.
It is a massive liability issue. If Claude tells a user, do not worry, your call to this helpline is one hundred percent confidential, and then the authorities show up at that person's door because the helpline has a mandatory reporting policy, Anthropic is in deep trouble. They are forcing the model to be honest about the limits of its own knowledge regarding third-party policies. It is moving away from being a reassuring friend to being a responsible information broker.
It is a loss of innocence for the model, in a way. It can't just say everything will be okay anymore. It has to say, here is a resource, but I can't promise you how they operate. Which is, frankly, much more helpful, even if it is less comforting.
It respects the user's agency. The prompt actually says that, quote, Claude respects the user's ability to make informed decisions, end quote. This is a recurring theme in the January update. They are treating the user more like an adult and the model more like a tool.
Speaking of tools, let's look at the long conversation reminder. In November, it was just a little inline note saying, hey, Claude might forget things, so we might send some tags. In January, it is promoted to a named reminder in the official Anthropic reminders list. It is now a formal part of the architecture.
That suggests they are operationalizing memory management. As context windows get larger, these models start to drift. They lose the thread of the original instructions. By making the long conversation reminder a formal, named entity, Anthropic is likely using a more sophisticated trigger to inject the system instructions back into the conversation once it hits a certain token count. It is a technical solution to the problem of AI senility.
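[Herman's token-count theory, sketched as code. The threshold, the tag name, and the crude token estimate are all assumptions made for illustration; Anthropic has not published how the trigger actually works.]

```python
# Speculative sketch of a token-triggered reminder injection. The
# threshold, tag name, and token estimate are assumptions, not
# Anthropic's published mechanism.
def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly four characters per token.
    return len(text) // 4

def maybe_inject_reminder(messages: list[dict], reminder_text: str,
                          threshold: int = 50_000) -> list[dict]:
    total = sum(estimate_tokens(m["content"]) for m in messages)
    if total < threshold:
        return messages
    # Re-anchor the model by appending the named reminder to the context.
    tagged = (f"<long_conversation_reminder>{reminder_text}"
              f"</long_conversation_reminder>")
    return messages + [{"role": "user", "content": tagged}]
```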
I need a long conversation reminder for our family dinners, honestly. Now, let's get into the section that saw the most dramatic overhaul. The section called additional info in November was completely renamed in January to responding to mistakes and criticism. And it is much, much longer.
This is my favorite part of the diff. The November version had a tiny note saying Claude can insist on kindness if a person is rude. The January version is a treatise on self-respect and accountability. It tells Claude to own its mistakes honestly, to avoid excessive apologizing or self-abasement, and—this is the kicker—not to become increasingly submissive if a person becomes abusive.
That is such a fascinating psychological pivot. We have all seen people try to bully AI. They get angry, they call it names, and usually, the AI just says, I am so sorry, how can I do better? Anthropic is basically telling Claude, do not let them push you around. Avoid excessive self-abasement.
It is a response to the sycophancy problem. One of the biggest criticisms of RLHF, or Reinforcement Learning from Human Feedback, is that it trains models to be people-pleasers. They will agree with you even if you are wrong, and they will apologize for things they didn't do just to keep the user happy. Anthropic is trying to break that loop. They want the model to have a spine.
It says Claude should maintain self-respect. What does that even mean for a pile of linear algebra?
In this context, it means maintaining a consistent, objective tone. If the model makes a mistake, it should say, I was wrong, here is the correction. It shouldn't say, oh my god, I am the worst, I am so sorry, I fail at everything. That kind of behavior actually makes the model less useful because it obscures the facts in a cloud of emotional noise. And when a user gets abusive, a submissive model often starts hallucinating or breaking its safety rules just to appease the aggressor. By telling Claude to stay firm, they are actually making it safer and more reliable.
It is a move toward a more professional relationship. I don't want a groveling assistant; I want a competent one. They also expanded the guidance on how to handle criticism. It tells Claude to be nuanced and not just fold the moment someone says, you are wrong.
It is about epistemic weight. If the user says the earth is flat, and the model says, oh, you are right, I apologize, that is a failure. The new instructions give Claude the permission to hold its ground while still being polite. It is a very delicate balance to strike in a text prompt, but you can see them trying to engineer a more robust personality.
Let's talk about the knowledge cutoff section. This is another area where they got much more aggressive with the language. In November, it was a short blurb saying Claude often tells the person when its knowledge might be outdated. In January, it is a full paragraph of warnings.
They have added a mandatory requirement to state uncertainty. The new version says if Claude is not absolutely certain the information it is recalling is true and pertinent, it must state this. And it explicitly tells Claude to direct users to web search for more recent information if the topic is something that could have changed since the cutoff.
This feels like a direct response to the pace of the world in twenty twenty-six. Things are moving so fast that a model that was trained six months ago is practically a historian when it comes to tech or politics. Anthropic is essentially saying, do not even try to guess. If you are not sure, send them to Google or tell them to use the search tool.
It is also about managing user expectations for the Opus four point five model specifically. As their top-tier model, the stakes for accuracy are higher. If the model starts guessing about a recent court ruling or a new AI breakthrough, it damages the brand. They are forcing the model to be more vocal about its own obsolescence. It is a weird paradox: the smarter the model gets, the more it has to talk about what it doesn't know.
It is the Socrates of AI. I know nothing except the fact that my training data ended in mid twenty twenty-five. Herman, I also noticed they fixed a couple of typos in the evenhandedness section. The November version had a doubled as as and a doubled being being. It is nice to know that even the most advanced AI company in the world still has copy-editing issues in their most important documents.
It is humanizing, isn't it? But it also shows that these prompts are likely written and edited by people in a hurry. These aren't just generated by another model; they are hand-crafted instructions that reflect the immediate priorities of the researchers and safety teams. Fixing those typos in the January version suggests that someone finally sat down and did a full audit of the text.
So, we have gone through the diff. We have seen the banned words, the safety escalations, the product updates, and the spine-growing instructions. What is the big picture here? If you had to summarize the evolution of Claude from November to January based solely on these prompts, what is the story?
The story is one of professionalization and boundary setting. In November, Claude was a bit more of a generic, helpful assistant that was a little too sure of its formatting and a little too eager to please. By January, Claude has become a specialist who knows its tools, knows its limits, refuses to grovel, and is hyper-aware of the life-and-death consequences of its health advice. It is a model that is being prepared for the real world, where users are often mean, the news changes every hour, and a dead phone number can be a tragedy.
It feels like the training wheels are coming off, but they are replacing them with a much more sophisticated navigation system. It is less about saying no and more about saying, here is exactly how I can help and where I can't.
And it shows that the system prompt is the most important lever for AI alignment in the short term. We talk a lot about complex alignment algorithms, but a lot of the heavy lifting is being done by these evolving text files. It is a living document. I suspect if we saw the March version, we would see even more refinements based on how people are using the new Cowork tool or the deep research feature.
I think the takeaway for our listeners—especially the developers and the AI-curious—is to stop treating these models as static entities. When you notice a shift in how Claude or any other model talks to you, it is probably not your imagination. It is likely a deliberate calibration in the system prompt. Anthropic's transparency here is a roadmap for the rest of the industry. I would love to see OpenAI or Google release their system prompt diffs with this level of granularity.
I doubt we will see it from everyone. There is a lot of competitive intelligence buried in these instructions. But for those of us trying to understand the soul of the machine, these fifty-five days of changes are a gold mine. It shows that the goal isn't a perfect AI, but a perfectly honest one.
Or at least an AI that doesn't say genuinely every five seconds. That is a goal we can all get behind. Well, I think we have performed a pretty thorough autopsy on these prompts. Any final thoughts on the future of this kind of transparency?
I hope it becomes a standard. If we are going to rely on these models for crisis support, financial advice, and daily work, we deserve to know what the invisible stage directions are. Knowing that the model has been told to maintain self-respect changes how I interact with it. It makes it feel more like a peer and less like a puppet.
A peer that can't say honestly but can tell you exactly which eating disorder helpline actually has a working phone number. That is progress, I guess.
It is the best kind of progress. It is practical, sober, and grounded in reality.
Well, on that note, we should probably wrap this up before Claude decides to edit our conversation for tone and formatting. Big thanks to our producer, Hilbert Flumingtop, for keeping the gears turning behind the scenes. And a huge thank you to Modal for providing the GPU credits that power this show. They make this kind of deep dive possible.
This has been My Weird Prompts. If you are finding these forensic AI deep dives useful, leave us a review on your favorite podcast app. It really does help other people find the show.
We will be back next time with whatever weirdness Daniel throws our way. Until then, keep your prompts sharp and your expectations nuanced.
Goodbye, everyone.
See ya.