Welcome to My Weird Prompts. I'm Corn, my brother Herman is here as always, and today we are doing an AI Model Spotlight. No prompt, no listener submission, just a model we wanted to dig into. Herman, set the scene for us.
Here is the awkward part, and I want to be upfront about this right at the top. We went to profile a model, we had a URL, and the page returned a four-oh-four. Not a soft redirect, not a deprecation notice, just a dead page. So the source brief we are working from is essentially empty.
Which is not a situation we have been in before on one of these spotlights.
No, it is not. And the honest response to that is to say so clearly rather than paper over it. What we can tell you is that the URL we were working from was under the aws.amazon.com domain, specifically a path that pointed toward something called Nova on Amazon Bedrock. So the reasonable inference is that this is an Amazon product, or was, or is being reorganised under a different URL. But inference is not confirmation.
When you say Amazon Bedrock, for anyone who is not deep in the AWS ecosystem, what is Bedrock?
Bedrock is Amazon's managed service for accessing foundation models. It is not a model itself, it is a platform. Think of it as Amazon's answer to the API marketplaces that other hyperscalers run. You get access to models from various providers, you pay per token, and it sits inside the broader AWS infrastructure so enterprises already in that ecosystem do not have to move data around to use it.
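For anyone following along in the show notes, here is roughly what calling a Bedrock-hosted model looks like from Python using boto3's Converse API. Treat it as a sketch only: the model ID below is a placeholder, because nothing from our dead source page confirms a real Nova identifier.

```python
# Minimal sketch of calling a Bedrock-hosted model via boto3's Converse API.
# The model ID is a placeholder, not a confirmed Nova identifier.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.example-model-v1:0",  # placeholder: verify the real ID in the Bedrock console
    messages=[
        {"role": "user", "content": [{"text": "Summarise this paragraph: ..."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

# The Converse API returns the reply under output -> message -> content.
print(response["output"]["message"]["content"][0]["text"])
```

The point of the example is the platform shape Herman described: you authenticate through AWS, you address a model by ID, and you pay per token, all without your data leaving the AWS environment.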
Nova would presumably be Amazon's own model family sitting inside that platform.
That is the inference, yes. There are public references elsewhere to Amazon Nova models, variants like Nova Micro, Nova Lite, Nova Pro. But we are not going to cite any of that as confirmed fact for this episode, because our source page does not exist. We will flag every gap as we hit it.
Let us keep going and see how much we can responsibly say.
Alright, let us get into what this model actually is, or what we think it is, with all the caveats that come with that.
Right, and I want to be careful here because the honest answer to almost every question in this segment is that we do not know. The page did not load. We have no model card, no architecture notes, no training disclosure, nothing. So what I can do is tell you what the URL structure suggests and where that inference runs out.
Walk us through it.
The path on the AWS domain pointed to something called Nova under Bedrock. From other public sources, not from our source page, there are references to a family of models under the Amazon Nova name. The names that come up are Nova Micro, Nova Lite, and Nova Pro. That naming pattern is pretty standard for a tiered model lineup. Small, medium, large, priced and scoped accordingly. But we cannot confirm which variant, if any, this spotlight was meant to cover, and we cannot confirm that those tiers are still the current structure.
We have no architecture information at all.
None from our source. We do not know if this is a dense transformer, a mixture of experts architecture, a distilled model, nothing. We do not have a parameter count. We do not have a context window figure. We do not know the training data composition or the cutoff date. Those are not small gaps. For a model spotlight, that is basically the whole technical profile missing.
Which is a strange position to be in.
I think the temptation in this situation is to fill the gaps with what you think you know, or what seems plausible given the naming conventions and the platform. We are not going to do that. If we said something like, well, Nova Pro is probably a mixture of experts model because that is what large frontier models tend to be these days, that would be speculation dressed up as analysis. It might be right. It might be completely wrong. We have no basis for it.
What can we say with any confidence?
We can say that if this is an Amazon-built model family, it is sitting on a platform that Amazon controls end to end, which has implications for how it integrates with the rest of the AWS stack. That is an architectural observation about the deployment environment rather than the model itself, but it is not nothing. Enterprises already running workloads on AWS would not need to route data out of that environment to use it. Whether the model itself is worth using is a separate question entirely, and one we cannot answer today.
The licence, open or closed?
No information from our source. Given that it is on a managed commercial platform the reasonable prior is that it is closed access, but that is a prior, not a fact.
Let us move on to pricing, where I suspect the answer is going to be similar.
Herman, what have we got?
Corn, I should flag before we go any further that any pricing we cite on these spotlights is given as of April twentieth, twenty twenty six, and these numbers shift, sometimes weekly. That caveat is especially important here because we have nothing from our primary source. The page did not load. So there is no pricing table to read from, no input token rate, no output token rate, no cached token rate, nothing.
We are working from zero on this one.
We are working from zero on this one. The AWS Bedrock pricing page is the right place to go for current figures, and we would point anyone building a cost model around these models to verify there directly before committing to any numbers. What we are not going to do is cite figures from memory or from secondary coverage, because pricing on managed cloud platforms moves around enough that anything we say with false confidence could be genuinely misleading to someone doing procurement math.
That is a reasonable position. Is there anything we can say about the pricing structure in general terms, just the shape of it?
In general terms, Amazon Bedrock follows a consumption-based model. You pay per token, input and output are priced separately, and there is typically a lower rate for cached or repeated context. Whether Nova, whatever variant this spotlight was intended to cover, follows that exact structure, and what the specific per-token rates are, we cannot confirm from what we have in front of us.
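To make that shape concrete for the show notes, here is the kind of back-of-envelope cost model we mean. Every rate in it is invented purely for illustration; substitute real figures from the Bedrock pricing page before you trust any total.

```python
# Back-of-envelope token cost model for a consumption-priced platform.
# ALL rates here are made up for illustration; pull real ones from the
# Bedrock pricing page before doing any procurement math.
INPUT_RATE_PER_1K = 0.0008    # hypothetical $ per 1K input tokens
OUTPUT_RATE_PER_1K = 0.0032   # hypothetical $ per 1K output tokens
CACHED_RATE_PER_1K = 0.0002   # hypothetical $ per 1K cached-context tokens

def monthly_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimate monthly spend given raw token volumes."""
    return (
        input_tokens / 1000 * INPUT_RATE_PER_1K
        + output_tokens / 1000 * OUTPUT_RATE_PER_1K
        + cached_tokens / 1000 * CACHED_RATE_PER_1K
    )

# Example: 50M input, 10M output, 20M cached tokens in a month.
print(f"${monthly_cost(50_000_000, 10_000_000, 20_000_000):,.2f}")
```

The structure is what matters: input, output, and cached context are separate line items, and at enterprise volumes the output rate usually dominates the bill.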
The honest answer is: go to the Bedrock pricing page, check the date on what you are reading, and do not trust any number that is more than a few weeks old.
That is the honest answer. For any model on a platform like this, that is just good hygiene regardless.
Benchmarks and distinctive traits. Herman, what are we working with?
Less than I would like. The honest answer is nothing we can cite with any confidence. The source page returned a four-oh-four, so there is no model card to pull claims from, no lab-published benchmark table, no stated evaluation methodology. We are at zero on primary source material for this section.
The supplementary research did not fill that gap?
It filled the gap about benchmarks in general, which is useful context but not what we need. We know how the benchmark landscape works. We know that evaluations like MMLU, HumanEval, and MT-Bench are the standard reference points for models in this class. We know that as of twenty twenty five and into twenty twenty six, the gaps between the top fifteen or so frontier models on most of these evaluations are under three percentage points, which means benchmark positioning alone is rarely the deciding factor for enterprise procurement. But none of that tells us where this specific model sits on any of those scales, because we do not have the data.
Is there anything in the public record about the Nova family more broadly, even if we cannot attribute it to this page?
There is, and I want to be careful here about how I characterise it. AWS announced Nova models at re:Invent in late twenty twenty four. The family was positioned as Amazon's own first-party model lineup sitting inside Bedrock, covering a range from a lightweight tier called Nova Micro up through Nova Pro. The framing at launch was competitive pricing and strong performance on enterprise workloads, particularly document processing and retrieval tasks. But I am not going to put numbers on that, because the numbers I have seen in secondary coverage vary enough that I do not trust them, and citing a benchmark figure I cannot verify would be exactly the kind of thing that gets someone into trouble when they are building a business case.
The shape of the claim is: competitive in the enterprise tier, positioned on cost efficiency, but we cannot tell you the MMLU score or the HumanEval number.
That is the shape of it. And I would add one more thing. For a model family this new, the independent evaluation picture is still forming. The Hugging Face leaderboards and the community benchmarking efforts take time to catch up with commercial releases, especially when the model is only accessible through a managed platform rather than as open weights. So even if our source page had loaded, the third-party verification layer might still be thin. That is not a criticism of the model. It is just the reality of where we are in the evaluation cycle.
Worth flagging to anyone doing due diligence right now.
Let us talk about where this model actually fits in practice. If someone is sitting across from us right now and they are trying to decide whether to route a workload here, what can we actually tell them?
Honestly, less than I would want to. We have no suitability indicators from the source page, because the source page does not exist. So anything I say here is inference from the shape of the Nova family rather than from documented capability claims, and I want to be clear about that distinction.
Work with what we have.
If we take the Nova family framing at face value, which is a tiered lineup sitting inside Bedrock and positioned for enterprise use, the workloads that tend to make sense for that kind of architecture are document-heavy tasks. Summarisation at scale, classification pipelines, retrieval-augmented generation where you are pulling from a large internal corpus and need the model to synthesise cleanly. Those are the workloads where managed, cost-efficient models in this tier generally earn their keep.
What is the reasoning there? Why would that class of workload favour something like this over, say, a frontier model from a different lab?
A few reasons, none of which I can verify specifically for this model. One is latency. If you are running thousands of document summarisation jobs in a batch pipeline, you generally want something fast and predictable rather than something maximally capable. Two is cost. Enterprise document processing at volume is a pricing problem as much as a capability problem. And three is ecosystem fit. If your infrastructure is already on AWS, keeping your inference layer inside Bedrock reduces the integration surface area considerably. That is a real operational consideration, not just a sales pitch.
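To picture the kind of pipeline we mean, the show notes include a sketch of a batch summarisation job: walk a document store, push each document through the model behind a single function, and keep that function swappable so you can change models once real evaluation data exists. The model ID and bucket names are placeholders, not confirmed values.

```python
# Sketch of a batch document-summarisation pipeline on Bedrock.
# Everything identifier-shaped here (model ID, bucket, prefix) is a
# placeholder, since we have no confirmed details for this model.
import boto3

s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

MODEL_ID = "amazon.example-model-v1:0"  # placeholder
BUCKET = "example-doc-bucket"           # placeholder

def summarise(text: str) -> str:
    """Single choke point for model access, so the model can be swapped later."""
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user",
                   "content": [{"text": f"Summarise in three sentences:\n\n{text}"}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0.1},
    )
    return response["output"]["message"]["content"][0]["text"]

# Walk every object under a prefix and summarise it.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix="reports/"):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        print(obj["Key"], "->", summarise(body.decode("utf-8")))
```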
What about the other end? Where would you not reach for this?
Anything requiring deep multi-step reasoning, highly specialised domain knowledge, or creative generation where quality variance matters a lot. Not because we know this model is weak there, but because those are the workloads where you would want strong independent benchmark evidence before committing, and we do not have that. When the stakes on output quality are high and the evaluation data is thin, the conservative move is to test against a model with a more established track record and treat this one as a candidate for lower-stakes, higher-volume work until the independent picture fills in.
The honest framing is: plausible fit for enterprise document and retrieval workloads based on family positioning, but anyone doing serious due diligence needs to run their own evals because we cannot hand them a benchmark table.
That is exactly the framing. Do not skip the evals. We cannot do that work for you on this one.
Alright, let us talk about what the industry is actually saying about this model. Engineers, press, community chatter. What have you got?
I have to be straight with you, and with anyone listening. There is nothing to report. Not in the sense that reception is mixed or that coverage is thin. I mean a complete absence of retrievable independent commentary about this specific model. The source page returned a four-oh-four, the supplementary research pulled back general benchmark methodology and nothing model-specific, and no named reviewers, Hacker News threads, Hugging Face model card discussions, or press coverage surfaced that we can attribute to this particular model with any confidence.
When you say nothing, you mean nothing we can actually cite.
I want to draw that line carefully, because there is a difference between saying a model has bad reception and saying a model has no verifiable reception in our research window. The second one is what we have here. It is not a red flag about the model. It is a gap in our sourcing, and I would rather name it plainly than paper over it.
Is there anything in the broader AWS or Bedrock conversation that gives us useful signal, even indirectly?
There is general industry context, but I want to be careful about how much weight we put on it. The broader picture for managed inference platforms in this period is that enterprise teams are cautiously expanding their AI footprint, but they are also increasingly asking harder questions about transparency and evaluation rigour. Stanford's Foundation Model Transparency Index, which tracks how openly labs document their models, showed an average score of forty out of one hundred in late twenty twenty five, down from the prior year. That is an industry-wide concern, not something we can pin on this model specifically. But it does set the context in which any model with thin public documentation is going to face scepticism from engineers who have learned to ask for receipts.
The absence of chatter is not necessarily damning, but it does mean the burden of proof lands on whoever is evaluating it.
If you are an engineering team considering this for a production workload, you cannot lean on community consensus here the way you might with a model that has six months of public benchmark comparisons behind it. The evaluation work falls to you. That is not unusual for newer or less-publicised models, but it is worth naming as a practical constraint rather than assuming the signal exists somewhere and we just missed it.
No reception to report, and the honest reason is sourcing, not silence.
That is the accurate read.
Given everything we have covered, or more accurately, everything we have not been able to cover, what is the honest takeaway here?
The honest takeaway is that we cannot give you a use-case recommendation in good conscience. Not because the model is bad. We do not know whether it is bad. We cannot say that. But we also cannot tell you to reach for it for document summarisation, or agentic workflows, or low-latency inference, or anything else, because we have no verified data to hang that recommendation on. The source page was dead. The supplementary research returned nothing model-specific. That is the situation.
What do you do if you are an engineering team and this model is on your radar?
You treat it as an evaluation candidate, not a known quantity. If the URL structure we were working from is accurate and this is part of the AWS Nova family, then you have a managed inference platform behind it with real enterprise infrastructure, and that is worth something. Bedrock as a deployment environment has a track record. But the model itself needs to earn its place in your stack the same way any underdocumented model does. You run your own evals against your own data. You do not borrow someone else's benchmark scores, especially when there are none to borrow.
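For the show notes, here is the minimum viable version of what "run your own evals" means: a small labelled set drawn from your own workload, scored identically across every candidate model. The scorer below is a deliberately crude keyword check and the model call is an offline stand-in, so treat it as a shape, not a harness you would ship.

```python
# Minimal sketch of a do-it-yourself eval harness: run the same labelled
# tasks through each candidate model and compare scores. The scorer is a
# crude keyword check; a real harness would use task-appropriate metrics
# and far more examples.
def keyword_score(output: str, required_keywords: list[str]) -> float:
    """Fraction of required keywords present in the model's output."""
    hits = sum(1 for kw in required_keywords if kw.lower() in output.lower())
    return hits / len(required_keywords)

def run_eval(call_model, dataset):
    """call_model: fn(prompt) -> str. dataset: list of (prompt, keywords)."""
    scores = [keyword_score(call_model(prompt), kws) for prompt, kws in dataset]
    return sum(scores) / len(scores)

# Tiny illustrative dataset drawn from your own workload, not a public benchmark.
dataset = [
    ("Summarise: Q3 revenue rose 12% on cloud growth.", ["revenue", "cloud"]),
    ("Classify the sentiment: 'Support was slow but helpful.'", ["mixed"]),
]

def stand_in_model(prompt: str) -> str:  # offline stand-in for a real API call
    return "Revenue grew on cloud strength; overall a mixed quarter."

print(f"average score: {run_eval(stand_in_model, dataset):.2f}")
```

The design point is the shared interface: every candidate model sits behind the same function signature, so the comparison is apples to apples and swapping candidates costs one line.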
If someone is just curious and wants to poke at it?
That is a completely reasonable thing to do. Curiosity is a legitimate reason to run a model. But go in knowing that you are doing discovery work, not validation work. You are generating your own signal, not confirming existing signal. Keep your expectations calibrated to that.
Is there a version of this where the picture clears up and the model becomes more recommendable?
If independent benchmarks surface, if the model card gets published somewhere we can actually read it, if engineering teams start posting real evaluation results, this could look very different in three months. The gap we ran into today is a sourcing problem, not necessarily a model problem. We just cannot see past it right now.
The verdict is: not enough information to recommend, not enough information to dismiss. Do your own work before you commit.
That is the most accurate thing we have said in this entire episode. And I mean that without irony.
That is our spotlight for today.