#1604: The $3 Billion Stealth Giant: AI21 Labs & Nvidia

Why is Nvidia eyeing a $3B deal for AI21 Labs? Discover the tech behind the "OpenAI of Israel" and their revolutionary hybrid architecture.

Episode Details

Published: Mar 27
Duration: 20:30
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
LLM
Topics: large-language-models state-space-models transformers

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The tech world was recently shaken by reports that Nvidia is in late-stage talks to acquire AI21 Labs for approximately $3 billion. Often referred to as the "OpenAI of Israel," AI21 Labs has long operated in the shadow of its San Francisco rivals, yet its technical contributions and heavyweight founding team make it one of the most significant players in the generative AI landscape.

The Quiet Professional of AI

Founded in 2017, AI21 Labs was a pioneer in the large language model (LLM) space long before "ChatGPT" became a household name. While other companies focused on viral consumer interfaces, AI21 focused on the "plumbing"—the underlying infrastructure required for enterprise-grade applications. Their leadership includes Stanford legend Yoav Shoham and Mobileye founder Amnon Shashua, signaling a company built on deep institutional knowledge rather than just venture-backed hype.

Their strategy has consistently prioritized utility over virality. This is best seen in Wordtune, their flagship consumer product. Unlike general-purpose chatbots, Wordtune is a specialized writing assistant designed for professional refinement. On the B2B side, the same focus has allowed them to become the reliable backbone for major platforms like Wix and Capgemini.

Solving the Transformer Bottleneck

The most compelling reason for Nvidia’s interest lies in AI21’s architectural innovation. Traditional transformer models suffer from a "quadratic scaling" problem: the computational cost of attention grows with the square of the sequence length, and the key-value (KV) cache that stores past tokens keeps growing in memory with every new token. This makes processing massive documents, such as legal archives or long-form research, prohibitively expensive and slow.

AI21 addressed this with Jamba, the first production-grade hybrid Mamba-Transformer model. By interleaving Structured State Space Model (SSM) layers with transformer layers, Jamba gets linear scaling from its Mamba layers, which carry a fixed-size state instead of a growing cache. This allows for a massive 256,000-token context window (roughly 800 pages of text) while maintaining the deep reasoning capabilities of a transformer. The result is a model that is more memory-efficient than its pure-transformer counterparts and, by the figure cited in the episode, about 2.5 times faster at inference than a pure transformer of similar size.
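
The scaling difference can be illustrated with a rough count of work per layer. This is a minimal sketch, not a measurement of Jamba: the function names are hypothetical, constant factors are ignored, and a hybrid model still pays the quadratic cost in whatever attention layers it keeps.

```python
def attention_pair_count(seq_len: int) -> int:
    """Query-key pairs that full self-attention scores for one head in one layer."""
    return seq_len * seq_len


def ssm_step_count(seq_len: int) -> int:
    """State updates a recurrent SSM layer performs: one per token."""
    return seq_len


# Compare how the two counts grow as the context gets longer.
for n in (1_000, 16_000, 256_000):
    pairs = attention_pair_count(n)
    steps = ssm_step_count(n)
    print(f"{n:>7} tokens: attention pairs={pairs:.2e}, "
          f"ssm steps={steps:.2e}, ratio={pairs // steps}")
```

Doubling the sequence length doubles the SSM count but quadruples the attention count, so at 256,000 tokens the gap between the two is a factor of 256,000.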

Trustworthy AI and the Maestro Layer

Beyond raw architecture, AI21 has focused heavily on "trustworthy AI." Their "Maestro" reasoning layer serves as a supervisor for LLMs, specifically designed to reduce hallucinations in Retrieval-Augmented Generation (RAG) workflows. By breaking down queries and verifying retrieved data through a multi-step process, Maestro offers the level of reliability required by legal and medical professionals who cannot afford the factual errors common in standard models.
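
The break-down, verify, and synthesize flow described above can be sketched in a few lines. This is a toy illustration only and is not Maestro's API or implementation: every name here (Claim, decompose, retrieve, verify, answer) and the tiny corpus are hypothetical, and keyword matching stands in for real retrieval and model-based verification.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    supported: bool  # True only when a retrieved passage backs the claim


def decompose(query: str) -> list[str]:
    """Split a compound question into sub-questions (toy rule: split on ' and ')."""
    parts = (part.strip(" ?") for part in query.split(" and "))
    return [part for part in parts if part]


def retrieve(sub_question: str, corpus: dict[str, str]) -> list[str]:
    """Return passages whose key phrase appears in the sub-question."""
    lowered = sub_question.lower()
    return [passage for key, passage in corpus.items() if key in lowered]


def verify(sub_question: str, passages: list[str]) -> Claim:
    """Refuse to answer when nothing was retrieved, instead of guessing."""
    if passages:
        return Claim(text=passages[0], supported=True)
    return Claim(text=f"No source found for: {sub_question}", supported=False)


def answer(query: str, corpus: dict[str, str]) -> list[Claim]:
    """Run decompose -> retrieve -> verify for each sub-question."""
    return [verify(sq, retrieve(sq, corpus)) for sq in decompose(query)]


# Hypothetical two-passage corpus standing in for a company's private documents.
corpus = {
    "notice period": "Clause 4: the notice period is 30 days.",
    "governing law": "Clause 9: the governing law is that of Delaware.",
}
claims = answer("What is the notice period and what is the warranty term?", corpus)
for claim in claims:
    print("SUPPORTED  " if claim.supported else "UNSUPPORTED", claim.text)
```

The design point is the explicit unsupported branch: a sub-question with no backing passage is reported as such rather than answered from the model's own guess.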

The Future of Independent Labs

The potential acquisition by Nvidia highlights a growing trend in the industry. As the capital requirements for training frontier models reach astronomical levels, even the most successful independent labs are finding it difficult to remain standalone. The high cost of specialized hardware, like Nvidia’s Blackwell chips, creates a massive barrier to entry.

If the deal closes, it marks a shift toward consolidation where hardware giants acquire top-tier research talent to secure their ecosystems. For AI21, it may be the end of their journey as an independent "stealth giant," but it ensures their hybrid architecture will have the resources to power the next generation of enterprise AI.

Daniel's Prompt

Custom topic: AI21 Labs - a surprisingly obscure Israeli AI company despite being one of the earliest serious LLM players. What's special about Jurassic and Jamba models? They pioneered the Mamba architecture hybri

You know, it is not often that a three billion dollar acquisition makes the rounds in the tech press and the general public responds with a collective "who?" But that is exactly what happened on March fifth, twenty-twenty-six, when reports surfaced that Nvidia was in late-stage talks to buy AI twenty-one Labs.

It is a fascinating situation, Corn. For anyone who has been deep in the large language model space since the early days, AI twenty-one Labs is anything but a mystery. They were actually one of the very first credible challengers to the original GPT-three. But today’s prompt from Daniel asks us to look at why this Israeli powerhouse is still operating in the shadows of the San Francisco giants, even as they are potentially being swallowed by the biggest chipmaker on the planet.

It is a classic case of the quiet professional versus the loud disruptor. Daniel wants us to dig into the history, the tech, and the rather strange obscurity of what people call the OpenAI of Israel. And honestly, Herman, I think you have been waiting for an excuse to talk about state space models for six months, so this is your lucky day.

I will try to keep the excitement contained, though it is difficult when you look at what they have built. To understand why Nvidia is willing to drop billions on them, you have to look at the founders. This is not a couple of college dropouts in a garage. You have Yoav Shoham, who is a Stanford professor emeritus and a legend in the field of artificial intelligence. Then there is Ori Goshen, who comes out of the elite Unit eight-two-hundred in the Israel Defense Forces. And of course, Amnon Shashua, the guy who co-founded Mobileye and sold it to Intel for about fifteen billion dollars.

That is a heavy-hitting lineup. It is basically the Avengers of Israeli tech. Usually, when you have that much institutional knowledge and capital behind a project, it becomes a household name. But AI twenty-one chose a different path. While OpenAI was building a viral chatbot to show the world what AI could do, these guys were in Tel Aviv building for the enterprise. They were focused on the plumbing while everyone else was looking at the paint job.

They founded the lab in twenty-seventeen, which is ancient history in generative AI terms. They were thinking about large language models before most people even knew what a transformer was. When they released Jurassic-one back in twenty-twenty-one, it was a massive achievement. It had one hundred seventy-eight billion parameters. For context, that was slightly larger than GPT-three at the time. They also introduced a vocabulary of two hundred fifty thousand tokens, which was much larger than what anyone else was doing. It allowed the model to represent complex concepts more efficiently.

And yet, most people still think the history of LLMs starts with the launch of ChatGPT in late twenty-twenty-two. It is like AI twenty-one built the engine, but forgot to build the shiny car around it. Or maybe they just did not want to. They seemed content to let the hype cycle pass them by while they focused on what they call trustworthy AI.

It was a deliberate strategic choice. They focused on utility over virality. If you look at their early consumer product, Wordtune, it is not a general-purpose chatbot. It is a writing assistant. It is designed to help you rephrase, shorten, or change the tone of your writing. It is a tool for professionals, not a plaything for the internet. That focus on the B-to-B side is a recurring theme with them. They wanted to be the infrastructure, the reliable backbone for companies like Wix and Capgemini.

Which brings us to the tech that actually makes them unique right now. We have talked about transformers on this show until we are blue in the face, but AI twenty-one is doing something different with their Jamba model. They are moving away from the pure transformer architecture that everyone else is obsessed with. Herman, give us the deep dive. Why are they messing with the recipe that everyone else says is perfect?

This is where it gets technically brilliant. Jamba is the first production-grade model to use a hybrid Mamba-Transformer architecture. To understand why that matters, you have to understand the bottleneck of the transformer. In a standard transformer, the computational cost of attention grows quadratically as the sequence length increases. If you want a massive context window, like two hundred thousand tokens, a pure transformer becomes incredibly slow and expensive, and on top of that it has to hold the key-value cache, or the KV cache, for every token it has seen.

So it basically chokes on its own memory once the document gets too long. It is like trying to remember every single word of a book while you are still reading the last chapter.

The KV cache grows with every new token, and eventually, you run out of VRAM on your GPUs. Jamba solves this by integrating Structured State Space Models, or SSMs, specifically the Mamba architecture. Mamba models have linear scaling. This means the computational cost grows in direct proportion to the sequence length, and the state the model carries stays the same size no matter how long the sequence is. It is more like a recurrent neural network where you have a hidden state that updates, rather than a giant attention matrix that looks at everything at once.
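
A minimal sketch of the kind of hidden-state update described here, assuming a scalar state and fixed coefficients. The names ssm_scan, a, b, and c are illustrative and are not taken from Jamba or any Mamba library; real Mamba layers use vector-valued states and make the coefficients depend on the input.

```python
def ssm_scan(inputs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    One fixed-size state is carried forward, so work is one step per token
    and memory does not grow with sequence length.
    """
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


# A single impulse decays geometrically through the state.
print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
```

The coefficient a plays the role of the state transition: it controls how quickly older tokens fade from the state.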

But if Mamba is so much better at scaling, why not just use Mamba for everything? Why keep the transformer layers at all?

Because transformers are still the kings of high-quality reasoning and complex attention. Pure SSMs sometimes struggle with the kind of deep semantic relationships that transformers nail. By interleaving Mamba layers with traditional transformer layers, AI twenty-one gets the best of both worlds. You get the reasoning of a transformer, but you get the efficiency and massive context window of an SSM. We are talking about a two hundred fifty-six thousand token context window. That is roughly eight hundred pages of text.

I can barely finish an eight hundred page book in a month, and this model can digest it in seconds. I am thinking about a legal firm or a medical research group. Being able to drop an entire archive of documents into a single prompt without the model slowing to a crawl is a massive advantage.

And it does it with two point five times faster inference than a pure transformer of a similar size. When you are an enterprise client, that speed translates directly into lower costs and better user experiences. In March twenty-twenty-six, we are seeing the market move toward these hybrid architectures because the old way of just throwing more GPUs at the quadratic scaling problem is becoming unsustainable. It is the end of the monolithic architecture era.

It is interesting you mention the cost, because I was looking at some of the recent benchmark data from LLM-Stats. They compared Jamba one point five Mini to Google’s new Gemma three twelve-B model. Jamba is still very impressive, but it is not the cheapest kid on the block anymore. Jamba’s inference cost is about twenty cents per million tokens, whereas Gemma three is coming in at five cents.
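
The price gap quoted here can be worked through for a single long prompt. This sketch assumes only the two per-million-token figures from the episode and a hypothetical helper, prompt_cost_usd; it ignores the separate input and output rates that real price lists usually have.

```python
JAMBA_USD_PER_MILLION = 0.20  # "about twenty cents per million tokens"
GEMMA_USD_PER_MILLION = 0.05  # "five cents"


def prompt_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of processing `tokens` tokens at a flat per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


context_tokens = 256_000  # one full Jamba context window
jamba = prompt_cost_usd(context_tokens, JAMBA_USD_PER_MILLION)
gemma = prompt_cost_usd(context_tokens, GEMMA_USD_PER_MILLION)
print(f"Jamba: ${jamba:.4f}  Gemma: ${gemma:.4f}  ratio: {jamba / gemma:.1f}x")
```

At these rates a full 256,000-token prompt costs roughly 5 cents versus roughly 1.3 cents, a fourfold gap that only matters at high request volume.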

That is the reality of the arms race we are in. Google has the advantage of vertical integration and massive scale. They can subsidize their models in a way a smaller lab can't. But AI twenty-one’s edge is not just the raw cost per token. It is the orchestration. They launched something called Maestro about a year ago, in March twenty-twenty-five. It is essentially an AI reasoning layer that sits on top of models. It can even be applied to other models like GPT-four.

I remember reading about that. They claimed it could reduce hallucinations by up to fifty percent. That is a bold claim in an industry where everyone is still struggling with models confidently lying to them. How does it actually work? Is it just a fancy prompt?

No, it is much more sophisticated. Maestro works by using a multi-step reasoning process. Instead of just generating the next token, Maestro orchestrates a series of checks and balances. It is designed specifically for RAG, or Retrieval-Augmented Generation. When a company wants to use AI to search their private internal documents, they cannot afford to have the model making up a legal clause or a technical spec. Maestro breaks the query down, verifies the retrieved data, and then synthesizes the answer. It is the supervisor for the LLM.

It sounds like they are the boring, reliable sedan of the AI world, while everyone else is trying to build a rocket ship that might explode on the pad. But being the safe choice has not necessarily helped their public profile. Why has the average person never heard of them? Is it just because they are in Tel Aviv?

Part of it is geographic. Being based in Israel gives them access to incredible talent, especially from units like eight-two-hundred, but they are outside the Silicon Valley echo chamber. When OpenAI or Anthropic announces a new feature, it is a global event. When AI twenty-one announces a breakthrough in hybrid architecture, it is discussed in research papers and enterprise boardrooms. They do not have a charismatic frontman like Sam Altman doing a world tour. They have serious researchers building serious tools.

There is also the Wordtune factor. Wordtune has over ten million users, which is huge, but it does not have the cultural footprint of a chatbot. You do not talk to Wordtune. You use it to fix your email. It is a utility, and utilities are often invisible until they stop working. It is the difference between a tool and a personality.

I think the biggest factor in their obscurity, though, was their refusal to chase the consumer chatbot hype. They could have built a ChatGPT clone years ago. They had the models. But they stayed focused on the B-to-B side. They wanted to be the infrastructure, not the interface. The problem with being the infrastructure is that the people using the interface often do not know you exist. It is the Intel Inside strategy, but without the catchy jingle.

Let’s talk about the money, because this is where things get a bit messy. There was all this hype about a three hundred million dollar Series D round in May of twenty-twenty-five. It was supposed to be backed by Google and Nvidia. But reports coming out now, in early twenty-twenty-six, indicate that the round never actually closed.

That was a major revelation. Their last confirmed funding was actually the two hundred eight million dollar Series C back in late twenty-twenty-three, which valued them at one point four billion. If the Series D really fell through, it explains why they are suddenly in acquisition talks with Nvidia. The capital requirements for training these models are astronomical. Even if you are efficient with your hybrid architectures, you still need tens of thousands of H-one-hundreds or Blackwell chips.

So the Nvidia deal might be more of an acquihire than a standard acquisition. If Nvidia is looking to bring in two hundred of the world’s top AI PhDs, that is a massive win for them. But for AI twenty-one, it might be a bit of a bittersweet ending to their dream of being an independent giant. If the valuation is between two and three billion, that is a decent exit, but it is not the ten or twenty billion dollar valuation people were whispering about a year ago.

It is a common pattern in the industry right now. We saw it with Inflection and Microsoft. We saw it with Adept and Amazon. The cost of staying at the frontier is so high that even the most brilliant teams are finding it hard to stay independent. Nvidia needs a top-tier software and research arm to keep people locked into the CUDA ecosystem. AI twenty-one is the perfect fit for that. They understand the hardware-software co-optimization better than almost anyone because of their work on making models run faster on less memory.

It is also a massive win for Israel’s tech sector. Even if it is an acquihire, a two to three billion dollar exit is a huge signal of the strength of the ecosystem there. It shows that you can build world-class foundational models outside of the United States. It reminds me of what we discussed in episode one thousand one, about that invisible history of AI. There have been these marathons going on in the background for forty years that the public only notices when there is a sudden sprint at the end. AI twenty-one has been running that marathon since twenty-seventeen.

And they have contributed so much to the research community. They have been very open about their findings. Even Jamba was released with open weights for the small version. They have always balanced their commercial interests with a genuine commitment to advancing the field. If they do become part of Nvidia, I hope that research culture survives. Having an independent AI powerhouse in Israel was a matter of strategic importance. If they are absorbed by a US titan like Nvidia, it changes the geopolitical landscape of AI sovereignty.

I wonder if their obscurity was actually a tactical advantage for a while. They could iterate and experiment with things like Mamba without the intense public scrutiny that OpenAI faces every time they change a comma in their system prompt. They could fail quietly and succeed quietly.

I suspect you are right. When you are the underdog, you can afford to be weird. You can try a hybrid architecture that everyone else thinks is too risky. You can focus on niche enterprise features that do not make for good headlines but solve real problems. The tragedy is that the same obscurity that allowed them to innovate might be what eventually forced them into an acquisition because they could not raise the viral capital they needed to stay independent in a market that rewards noise.

It is the curse of being right too early. They were right about LLMs in twenty-seventeen. They were right about enterprise reliability in twenty-twenty-one. And they were right about hybrid architectures in twenty-twenty-four. But being right does not always pay the server bills when your competitors are raising ten billion dollars at a time.

That is a very sloth-like observation, Corn. Very grounded. It is about the difference between a sprint and a marathon.

Hey, I may be slow, but I see the finish line. Speaking of being right, I think you were right that we need to talk about what this means for the future of the technology itself. If Jamba is the blueprint, does that mean the era of the pure transformer is ending? Are we going to see everyone switching to these hybrid models?

I think we are moving toward a world of mixture-of-experts, hybrid SSM-transformers, and sophisticated orchestration layers. The goal is no longer just to build the biggest model. The goal is to build the most efficient model for a specific task. AI twenty-one has been preaching that gospel for years. They realized early on that a language model on its own is not a product. It is a component. To make it a product, you need those layers of reasoning and verification. That is what Maestro provides.

We covered this a bit in episode fourteen eighty-two when we looked at the vector landscape. RAG is the big hurdle for enterprise AI. Maestro feels like the missing piece of that puzzle. It is not just about finding the right data; it is about the model having the self-awareness to know when it is guessing. It is like having a donkey and a sloth working together. One to do the heavy lifting and one to ask if we are actually going in the right direction.

I will let you decide which one is which. But the point stands. Reliability is the new frontier. When Wix integrates AI to help people build websites, they need it to work every single time. They do not want a creative hallucination; they want a functional layout. That reliability is what AI twenty-one built their reputation on.

So, if you are a developer or a business leader listening to this, what is the takeaway? Do you go out and start building on Jamba today, or do you wait to see what Nvidia does with the keys to the kingdom?

I would say the takeaway is that architecture matters more than ever. If you are struggling with the costs of long-context RAG, you need to look at hybrid models. Whether it is Jamba or a future Nvidia-backed version of it, the linear scaling of SSMs is a game changer for processing massive datasets. Don't just default to the biggest name in the space. Look at the efficiency metrics.

And maybe the second takeaway is that you should not ignore the quiet companies. Just because they are not trending on social media does not mean they are not building the foundation of the next decade of tech. AI twenty-one might be the most important company you've never heard of.

I think that is a lesson for all of us. The most important work is often happening in the places we are not looking. Whether they stay AI twenty-one or become Nvidia’s AI Research Division, their impact on the field of sequence modeling is already set in stone.

Well, I for one will be keeping an eye on Tel Aviv. If this acquisition goes through, it might be the start of a whole new chapter for how AI is integrated into the hardware we use every day. It is the ultimate vertical integration. If you own the chips and you own the most efficient architecture for those chips, you are very difficult to compete with.

Unless someone else comes along with an even weirder architecture. Maybe a model based on how sloths think? Very slow, very deliberate, requires a lot of naps.

You are probably right. But the energy efficiency would be off the charts. We would save the planet one slow inference at a time.

I cannot argue with that. But for now, I think the hybrid Mamba-Transformer is the closest thing we have to that kind of efficiency.

Before we wrap this up, I want to touch on the legacy of the founders one more time. Amnon Shashua, Yoav Shoham, and Ori Goshen. They represent a very specific kind of founder. They are not looking for a quick exit or a viral moment. They are looking to solve fundamental problems in computer science. That kind of academic and industrial rigor is what allowed them to build Jamba.

It is a reminder that the OpenAI of Israel is not just a catchy nickname. It is a testament to the fact that innovation is a global game. And it is a game that is increasingly played at the intersection of deep research and massive compute. If Nvidia does buy them, they are buying a culture of excellence that is very hard to replicate.

Well, I think we have covered the bases on this one. From the founding trio to the Jamba hybrid architecture and the mystery of the missing three hundred million dollars. It is a wild story for a company that most people could not find on a map.

It is a weird world, Corn.

It certainly is. We should probably get out of here before you start explaining the math behind the linear recurrence again. I can see you opening a notebook.

I was just getting to the good part about the state transition matrix.

Save it for the next one, Herman. We have a word count to hit, not a math degree to issue.

Fair enough.

Thanks as always to our producer, Hilbert Flumingtop, for keeping the show running smoothly behind the scenes.

And a big thanks to Modal for providing the GPU credits that power this show. It is great to have partners who understand the infrastructure side of things as well as AI twenty-one does.

This has been My Weird Prompts. If you enjoyed our deep dive into the world of Israeli AI, please take a moment to leave us a review on your podcast app. It really helps new listeners find the show and keeps us motivated to keep digging into these weird prompts.

You can also find us at myweirdprompts dot com for the full archive and all the ways to subscribe.

We will see you next time.

Goodbye.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.