#1752: Whisper Small Beats Whisper Large in Speed & Accuracy

A consumer-GPU benchmark on Ubuntu shows the 1.5-billion-parameter Whisper Large to be slower and less accurate than the far smaller Whisper Small.

Episode Details
Episode ID
MWP-1906
Published
Duration
26:58
Audio
Direct link
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
Gemini 3 Flash

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The "bigger is better" mantra in AI is facing a serious reality check. A recent benchmark run on Ubuntu Linux tested thirteen different speech-to-text models to measure real-time voice typing performance, and the results are counter-intuitive.

The test environment was a standard consumer desktop—a twelfth-gen Intel Core i7 paired with an AMD Radeon RX 7800 XT—running Ubuntu 25.10. Using the Handy tool with ONNX Runtime, the benchmark evaluated models on a specific multi-part sentence including punctuation, pauses, and proper nouns like "River Seine."

The Surprising Winner: Whisper Small

Against all expectations, OpenAI's Whisper Small emerged as the top performer. With only 244 million parameters (compared to Whisper Large's 1.5 billion), it achieved an inference time of 976 milliseconds, a real-time factor of 0.07, and zero errors.

Whisper Large, the "heavyweight champion," took 2,780 milliseconds—nearly three times longer—and committed three errors. This suggests that larger models can suffer from "over-interpolation," where they try too hard to find nuances in clean audio, essentially using a microscope to read a billboard.
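Real-time factor is simply processing time divided by audio duration. A minimal sketch of the relationship, assuming a roughly 14-second test clip (the benchmark does not state the exact clip length, but 14 seconds is consistent with the reported figures):

```python
def real_time_factor(inference_ms: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    return (inference_ms / 1000.0) / audio_seconds

# Assumed ~14-second clip (not stated in the benchmark):
small_rtf = real_time_factor(976, 14)   # Whisper Small: roughly 0.07
large_rtf = real_time_factor(2780, 14)  # Whisper Large: roughly 0.20
```

An RTF of 0.07 means each second of speech needs only about 70 milliseconds of processing, which is effectively instantaneous in a push-to-talk workflow.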

The Speed Demon with a Catch

SenseVoice INT8 was the fastest model tested, completing inference in just 145 milliseconds with an RTF of 0.01. However, it made three errors. This highlights the critical trade-off between speed and accuracy: for accessibility use cases, a fast but error-prone model is frustrating because users must manually correct misspellings.

The Hidden Cost of Streaming

Moonshine Streaming models performed worse than expected in this "push-to-talk" workflow. While designed to feel responsive during live transcription, the overhead of chunking audio and maintaining state made them slower than batch models for single-shot tasks. It’s a reminder that the right tool depends on the specific workflow.

The Goldilocks Zone of Latency

The benchmark identified a clear latency threshold for user experience:

  • Under 500ms: Feels like magic.
  • 500-1,500ms: Feels like a tool (acceptable delay).
  • Over 2,000ms: Breaks the flow of thought.
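The bands above can be sketched as a simple classifier. Note that the 1,500 to 2,000ms range is not labelled in the episode, so it is treated as borderline here:

```python
def latency_feel(ms: float) -> str:
    """Map end-to-end transcription latency to the episode's UX bands."""
    if ms < 500:
        return "magic"        # text appears almost instantly
    if ms <= 1500:
        return "tool"         # a noticeable but acceptable beat
    if ms <= 2000:
        return "borderline"   # unlabelled gap between the stated bands
    return "broken flow"      # long enough to lose your train of thought
```

By these bands, SenseVoice (145ms) feels like magic, Whisper Small (976ms) and Parakeet V2 (1,354ms) are solid tools, and Whisper Large (2,780ms) breaks the flow.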

Parakeet V2 hit the sweet spot with 1,354ms and zero errors, while Whisper Turbo (a pruned version of Large) was fast but made two errors.
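The selection logic the benchmark implies, accuracy first and speed as the tie-breaker, can be sketched like this. The model names and figures are the episode's; the data layout and the 1,500ms latency budget are illustrative assumptions:

```python
from typing import List, NamedTuple

class Result(NamedTuple):
    name: str
    latency_ms: int
    errors: int

def pick_model(results: List[Result], max_latency_ms: int = 1500) -> Result:
    """Prefer zero-error models within the latency budget; tie-break on speed."""
    in_budget = [r for r in results if r.latency_ms <= max_latency_ms]
    pool = in_budget or list(results)  # fall back if nothing meets the budget
    return min(pool, key=lambda r: (r.errors, r.latency_ms))

# Figures reported in the episode:
results = [
    Result("Whisper Small", 976, 0),
    Result("Parakeet V2", 1354, 0),
    Result("Whisper Turbo", 1112, 2),
    Result("SenseVoice INT8", 145, 3),
    Result("Whisper Large", 2780, 3),
]
```

Under this policy Whisper Small wins outright; SenseVoice only wins if the latency budget is tightened below what any zero-error model can meet.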

Linux and Open Source Wins

This benchmark proves that voice typing on Linux is no longer a hacky workaround. With ONNX Runtime and ROCm drivers, AMD GPUs are now viable for local AI workloads, dismantling the "CUDA tax." The lack of hallucinations across all models—even during a 5-second silence—shows that modern ASR backends have matured significantly.

Key Takeaways

  • Bigger models aren't always better; smaller, purpose-built models often outperform on clean audio.
  • Latency and accuracy must be balanced based on use case.
  • Streaming models add overhead that isn't necessary for push-to-talk workflows.
  • Linux is a serious platform for local AI inference.

Downloads

  • Episode Audio: download the full episode as an MP3 file
  • Transcript (TXT): plain text transcript file
  • Transcript (PDF): formatted PDF with styling

#1752: Whisper Small Beats Whisper Large in Speed & Accuracy

Corn
You know, Herman, I think we have reached a very strange point in the history of computing where the "bigger is better" mantra is finally starting to crumble right in front of our eyes. I was looking at some data this morning that genuinely made me do a double-take. Imagine a world where the heavyweight champion of the world gets knocked out by a middleweight who also happens to be faster, leaner, and more efficient.
Herman
It is a classic David versus Goliath scenario, Corn, but in the realm of silicon and neural networks. And honestly, it is about time we saw some hard data backing up what a lot of local-first enthusiasts have been suspecting for a while. Today’s prompt from Daniel is about a rigorous benchmark he ran on Ubuntu Linux, testing thirteen different speech-to-text models using an open-source tool called Handy. He was looking at real-time voice typing performance, and the results are, frankly, a bit of a reality check for the industry.
Corn
It really is. By the way, before we dive into the weeds of these thirteen models, a quick shout-out to the tech behind the curtain—today’s episode is actually powered by Google Gemini 3 Flash. It is helping us parse through Daniel’s benchmarks, which he has published over on a Hugging Face Space called Single-Shot ASR Eval.
Herman
Using Gemini to talk about Whisper and Parakeet—it is an AI family reunion. But let’s look at the hardware Daniel was using because environment is everything here. He is running Ubuntu twenty-five point ten on a twelfth gen Intel Core i7, but the real heavy lifting is being done by an AMD Radeon RX seventy-eight hundred XT. That is a solid Navi thirty-two card with sixteen gigabytes of VRAM. It is a great mid-to-high-end consumer setup, which makes these results actually applicable to people listening who might want to do this on their own desktops.
Corn
Right, this isn't some multi-million dollar H-one-hundred cluster in a data center. This is a guy at his desk wanting to talk to his computer. And the "hook" that got me, Herman, is the winner. Whisper Small. Not Whisper Large, not Whisper Medium, not even the fancy new "Turbo" version. Whisper Small clocked in at nine hundred and seventy-six milliseconds for the inference, had zero errors, and a real-time factor of zero point zero seven. It basically embarrassed the larger models.
Herman
It really did. To put that in perspective for everyone, a Real-Time Factor, or RTF, of zero point zero seven means that for every second of audio you record, the model only needs seventy milliseconds to process it. That is essentially instantaneous from a human perception standpoint. When you finish your sentence and release the key in Handy, the text is there before your finger has even fully left the button.
Corn
And it did it with zero errors. Now, we should be clear about the test sentence Daniel used. It wasn't "The quick brown fox." It was a multi-part prompt: "I had scrambled eggs and toast for breakfast this morning. The coffee was a bit too strong but I drank it anyway." Then a five-second pause—which is a classic test for VAD, or Voice Activity Detection—followed by a bit about Paris and the River Seine. It targets punctuation, pause handling, and specific nouns.
Herman
It is a great "single-shot" test. And while a single run has its limitations—which we should definitely talk about later—the spread across these thirteen models is telling. You had Whisper Small at the top with zero errors, and then you look down the list and see Whisper Large. Whisper Large has one point five five billion parameters. It is the "smartest" model OpenAI offers in that family. It took two thousand seven hundred and eighty milliseconds—nearly three times as long as Small—and it made three errors.
Corn
That is the part that makes my sloth brain itch, Herman. How does the model with more than six times the parameters perform worse on a relatively simple sentence? Is it just overthinking the scrambled eggs?
Herman
It sounds counter-intuitive, but in the world of Automatic Speech Recognition, or ASR, bigger models can sometimes suffer from what we call "over-interpolation" or simply being tuned for much noisier, more complex environments. Whisper Large is trained to find speech in a hurricane. When you give it a clean, high-quality recording from a desktop microphone in a quiet room, it might actually try too hard to find nuances that aren't there, or it might struggle with the specific way it was quantized for the ONNX runtime Daniel was using.
Corn
So it is like using a high-powered microscope to read a billboard. You end up seeing the fibers of the paper instead of the letters. Meanwhile, Whisper Small is just looking at the letters and getting it right. But let’s talk about the speed demon in the room. SenseVoice INT-eight. Herman, look at that number. One hundred and forty-five milliseconds.
Herman
That is breathtaking. An RTF of zero point zero one. That is basically "speed of thought" territory. If you could get SenseVoice to be accurate, you wouldn't even feel like there was a computer involved. It would just be your voice appearing as text in real-time. But—and this is the massive "but" that Daniel found—it had three errors.
Corn
Precision versus velocity. If I type a hundred words a minute but every third word is misspelled, I am not actually a fast typist; I am just a loud nuisance. For accessibility, which is one of the main use cases for Handy and local ASR, three errors in two sentences is a dealbreaker. If you are using voice-to-text because you can’t use your hands, having to go back and manually correct "River Seine" because the model was in too much of a hurry is incredibly frustrating.
Herman
And this brings us to the "Real-Time Factor" discussion. In the world of voice typing, there is a "Goldilocks zone" for latency. If the latency is under five hundred milliseconds, it feels like magic. If it is between five hundred and one thousand five hundred milliseconds, it feels like a tool—you speak, you wait a beat, it appears. Once you cross that two-second threshold, like Whisper Large did at two point seven seconds, the "flow" of writing is broken.
Corn
You start wondering if the app crashed. You stare at the cursor. You lose your train of thought. It is the "Digital Sandwich" problem we have talked about before, though usually that's a mobile issue. On a desktop, you want that immediate feedback. That is why Parakeet V-two is actually the secret star of this list for me. One thousand three hundred and fifty-four milliseconds, zero point zero nine RTF, and zero errors. It is right in that sweet spot.
Herman
NVIDIA’s Parakeet models are fascinating because they are built on a different architecture than Whisper. While Whisper is an encoder-decoder Transformer, Parakeet uses a Connectionist Temporal Classification, or CTC, approach with a specialized backbone. It is designed specifically for this kind of high-speed, high-accuracy inference. And Daniel’s results show it is living up to the hype on Linux. It was only slightly slower than Whisper Small but just as accurate.
Corn
I also noticed the Moonshine models on the list. We have heard a bit about them recently as being the new "lightweight" alternative. Moonshine Base did great—zero errors, two point three seconds. But the "Streaming" versions of Moonshine actually performed worse in terms of speed. Moonshine Small Streaming took over four seconds. Why would a "streaming" model be slower than a "batch" model in a test like this?
Herman
That is a great catch, Corn. It comes down to the overhead of the streaming mechanism in this specific implementation. Streaming models are designed to give you partial results while you are still talking. They are optimized for "low-latency feeling" rather than "fastest total completion." In a tool like Handy, which is "press-speak-release," the app waits for you to finish before sending the whole chunk to the model. So, a streaming model is actually doing a lot of extra work—chunking the audio, maintaining state—that isn't necessary for a single-shot transcription. It is like taking a city bus to go one stop down a highway. The bus is designed for frequent stops, but that just makes the total trip longer if you only have one destination.
Corn
That makes total sense. It is the wrong tool for the "push-to-talk" workflow. If Daniel were doing live captions for a three-hour keynote, Moonshine Streaming would be the hero. But for "Type a quick reply to an email," it is just adding overhead. Now, let’s talk about Whisper Turbo. This was OpenAI’s big play for a faster Whisper. It clocked in at one thousand one hundred and twelve milliseconds—very fast—but it had two errors. Whisper Small was faster "and" had zero errors. Herman, is Whisper Turbo just... bad? Or is it just not optimized for the RX seventy-eight hundred XT?
Herman
"Bad" is a strong word, but "specialized" might be better. Turbo is essentially a pruned version of Whisper Large. It is trying to keep the "intelligence" of the big model while stripping away the layers that slow it down. But when you prune a model, you often lose the edges. It is like a race car that has had all the interior trim and the passenger seat removed. It is fast, but it might be a bit more temperamental and prone to skidding on corners. In this test, the "corners" were the specific nouns or the punctuation. Whisper Small, on the other hand, is a natively small model. It was trained to be this size. It is balanced.
Corn
I love that. It is a purpose-built go-kart versus a stripped-down semi-truck. The go-kart is going to handle the track better every time. I want to circle back to the Linux aspect of this. Daniel is running Ubuntu twenty-five point ten. This is the cutting edge of the Linux desktop. For years, voice typing on Linux was basically a joke. You had to use some hacky Google Chrome extension or a really outdated version of VOSK. But seeing thirteen models running locally on an AMD GPU with sub-second latency? That feels like a massive win for the "Year of the Linux Desktop" crowd.
Herman
It really is. And the tool he is using, Handy, version zero point eight point one, is a great example of the modern Linux ecosystem. It is using the ONNX Runtime and Whisper dot C-P-P as the backends. For the listeners who aren't familiar, ONNX is the Open Neural Network Exchange. It allows you to take a model trained in PyTorch or TensorFlow and run it on almost any hardware—Intel, AMD, NVIDIA—using optimized kernels. That is why Daniel can get such great performance on an AMD card, which historically has struggled with AI compared to NVIDIA’s CUDA ecosystem.
Corn
Right, the "AMD tax" is finally being repealed. It is great to see the ROCm drivers and ONNX making this accessible. If you have a modern Radeon card, you are no longer a second-class citizen in the AI world. But let's look at the errors for a second. Daniel mentioned that no models hallucinated. That is actually a huge relief. One of the biggest fears with Whisper, especially on longer transcriptions, is that it will start making up a story about a cat in a bathtub if there is a bit of silence.
Herman
Wait, I promised I wouldn't say that word. You are right, Corn. Silence is the enemy of many ASR models. They feel a "horror vacui"—a fear of empty space—and try to fill it with text. The five-second pause in Daniel’s test sentence was a perfect trap for hallucinations. The fact that all thirteen models stayed silent during that gap shows that the Voice Activity Detection and the "silence suppression" in these modern backends have improved significantly.
Corn
Or they were just so confused by the scrambled eggs they didn't have time to dream up any cats. But seriously, the lack of hallucinations makes these tools dependable for productivity. If I am dictating a legal brief or a technical document, a misspelling is annoying, but a hallucination is dangerous. I can fix "Senn" to "Seine," but I can’t easily fix a model that decides to add a paragraph about the Eiffel Tower being made of cheese.
Herman
And that brings us to the practical implications. If you are a Linux user today, and you want to use Handy or a similar tool like Speech Note or Nerd Dictation, what do you choose? Based on this data, the answer is overwhelmingly Whisper Small. It is the perfect intersection of the three things that matter: zero errors, sub-second latency, and low VRAM usage.
Corn
It is the "Honda Civic" of ASR models. It just works, it is efficient, and it gets you exactly where you need to go. But I can already hear the "pro" users in my head saying, "But Corn, what about the edge cases? What about accents? What about technical jargon?" Daniel’s test was a "single-shot" with a very clear, likely "Standard English" accent. Does Whisper Small hold up when an Irishman like Daniel, or someone with a thick Scottish or Texan accent, starts talking about "Kubernetes orchestration"?
Herman
That is the big limitation of this benchmark, and Daniel is very open about it. It is a single sentence, one run, on one set of hardware. To truly crown a king, we would need a much larger dataset—something like Common Voice or the LibriSpeech test sets—run through these same local backends. Larger models like Whisper Medium and Large often show their strength in those "difficult" scenarios. They have a broader "vocabulary" of human speech patterns.
Corn
So Whisper Large is like the veteran professor who can understand anyone but takes forever to get to the point, while Whisper Small is the eager intern who understands "you" perfectly but might get confused the moment you start using slang or a heavy dialect.
Herman
That is a fair comparison. But for a "voice keyboard" application, most users are going to be speaking clearly and directly to their computer. They aren't trying to transcribe a rowdy pub conversation from twenty feet away. In that "close-mic, intentional speech" context, the extra "intelligence" of the larger models seems to be wasted or, as we saw here, even detrimental to accuracy.
Corn
Let’s talk about the "Breeze ASR" and "Canary One-B" results. These are models from different companies—Breeze is more of a niche player, and Canary is from NVIDIA. They both had errors. Canary One-B v-two had one error and took two point four seconds. It is a massive model compared to Whisper Small. It feels like some of these models are being "over-engineered" for benchmarks and losing the plot when it comes to actual real-world usability on consumer hardware.
Herman
There is a lot of "leaderboard chasing" in the AI world. Companies want to say they have the highest accuracy on the "Word Error Rate" benchmarks, but those benchmarks often allow for huge amounts of compute time. In the real world, if I have to wait three seconds for a sentence to appear, I might as well have typed it myself. The "Usability Error Rate" is a metric we should start using. It would factor in latency as a penalty to accuracy.
Corn
I love that. The "Corn-Herman Usability Index." If it takes more than two seconds, the accuracy score is halved. Because at that point, you have broken the human-computer feedback loop. You are no longer "writing"; you are "submitting a job for processing."
Herman
And that is especially true for accessibility. Think about users with repetitive strain injuries or motor impairments. For them, voice-to-text isn't a "cool feature"—it is their primary interface. High latency in an interface is like having a laggy mouse cursor. It is physically exhausting to use. By proving that Whisper Small can deliver zero-error, low-latency performance on a mid-range Linux box, Daniel is basically showing that the barrier to entry for high-quality accessibility tools has dropped to almost zero.
Corn
It is a democratic moment for tech. You don't need a ten-thousand-dollar Mac Studio or a cloud subscription to have world-class dictation. You just need a decent GPU and a bit of open-source software. But speaking of the cloud, Herman, how do these local results compare to something like Google’s cloud-based voice typing or OpenAI’s API?
Herman
Cloud ASR often has the advantage of massive language models acting as a "spellcheck" on the output. They can use the context of your entire Google Doc to realize that when you said "Seine," you meant the river, not the "sign." But Daniel’s local test shows that for a standalone sentence, we are getting "perfect" results locally. The gap is closing fast. And the local model has one advantage no cloud can ever beat: privacy.
Corn
Right, I can talk about my scrambled eggs—or my top-secret business plans—without a single byte of my voice leaving my local network. For a lot of people, that is the "killer feature" of local ASR. But I want to push back on the "perfect" results for a second. Daniel’s test had thirteen models, and the top four had zero errors. That means there were nine models that failed a pretty basic test. That is a sixty-nine percent failure rate for the industry’s "best" models on a simple sentence. That is actually kind of embarrassing, isn't it?
Herman
It is a wake-up call. It shows that "General Purpose ASR" is still a very hard problem. The fact that Whisper Medium and Large—the models everyone raves about—both had three errors is the most shocking part of this entire dataset. It suggests that as we move toward these massive, multi-modal models, we might be losing the "specialization" that made early speech-to-text actually useful. We are making "Swiss Army Knives" that are so heavy you can’t actually use the blade to cut an apple.
Corn
I am going to hold onto that "heavy Swiss Army Knife" image. It is perfect. Now, if I am a developer looking at this data, what is my takeaway? Do I just stop using the big models and exclusively ship Whisper Small?
Herman
If I were building a desktop app today, I would make Whisper Small the default, but I would give the user a "benchmark" button. Let the app run a test sentence on "their" specific hardware and tell them, "Hey, on your machine, Parakeet V-two is actually faster and just as accurate, do you want to switch?" Because as Daniel’s results show, hardware-software synergy is everything. His RX seventy-eight hundred XT might love Whisper Small, but a different GPU architecture might favor a different model.
Corn
It is the "PC Gaming" approach to productivity software. "Auto-detect settings for best performance." I like that. It acknowledges that the Linux ecosystem is diverse. You have guys running this on old ThinkPads and guys running it on quad-GPU workstations.
Herman
And there I go again. I’ll just say "Indeed." One more thing to look at is the "Real-Time Factor" of SenseVoice again. Zero point zero one. If that model can be fine-tuned to fix those three errors, the game changes. Imagine a world where the text appears "before" you even finish the word, because the model is so fast it can process the syllables in real-time. We are getting very close to a fluid, conversational interface with our computers.
Corn
As long as it doesn't start arguing with me about how I like my eggs. "Scrambled? Really, Corn? A sloth of your stature should be eating more leafy greens."
Herman
Well, if it is running locally, you can just delete the model if it gets too cheeky. That is the beauty of it. But let’s look at the "errors" themselves. Daniel notes that Whisper Turbo had two errors. Usually, these are things like "breakfast" becoming "break fast" or missing a comma. In a "single-shot" test, those are minor, but in a thousand-word document, those errors compound.
Corn
It is the "death by a thousand cuts." If I have to fix a small mistake every thirty seconds, I am going to turn the feature off. That is why that "Zero Error" column is so prestigious. Whisper Small, Parakeet V-two, Canary Flash, and Moonshine Base. Those are the only four models on this list that I would actually trust to "type" for me. Everything else is just a "transcription assistant" that requires a human editor.
Herman
And that is a great distinction. A "Voice Keyboard" needs to be perfect. A "Transcription Tool" just needs to be "good enough" for a human to clean up later. Handy is a voice keyboard. Its job is to put text where your cursor is. It has to be right.
Corn
So, the million-dollar question—well, the thirteen-model question. Why did Whisper Small win? Is it just a fluke of this specific hardware and this specific sentence?
Herman
It is likely a combination of two things. First, the Whisper Small architecture seems to be the "Goldilocks" size for the current generation of ONNX and Whisper-dot-C-P-P optimizations. It fits perfectly into the cache of the GPU and the instruction sets of the CPU. Second, the "small" model has fewer "distractions." It is less likely to try to apply complex linguistic rules that might not apply to a simple sentence about breakfast. It is a "linear" thinker in a world of "abstract" thinkers.
Corn
I feel a strange kinship with Whisper Small. It is simple, it is focused on breakfast, and it gets the job done without overcomplicating things. It is basically the sloth of ASR models.
Herman
It really is. And it is a reminder to all of us—developers and users alike—that we should always "test" our assumptions. If Daniel had just assumed "Large is better," he would be waiting three seconds for transcriptions that had more errors. By taking the time to run this "Single-Shot Eval," he has optimized his own workflow and given the community a really valuable piece of data.
Corn
It makes me wonder what other "common wisdom" in AI is just plain wrong. Are we using "too much AI" for things that could be handled by a smaller, faster, more reliable model? Probably. We are in the "excess" phase of the AI boom, where everyone wants the biggest, baddest model, but the practical winners are going to be the ones who find the "Small" and "Medium" models that actually solve the problem.
Herman
It is the "Right-Sizing" of the AI revolution. We are moving from "Can we do this?" to "Can we do this efficiently and reliably?" And on the Linux desktop, thanks to tools like Handy and benchmarks like this, the answer is a resounding "Yes."
Corn
Well, I am sold. I am going to go find a Linux box, install Handy, and see if Whisper Small can handle my "delightful cheeky edge" without crashing. But seriously, this research is a service to the community. If you are listening and you have been on the fence about local ASR, go check out Daniel’s Hugging Face Space. The data speaks for itself.
Herman
It does. And it is a growing field. We are seeing new models every week. I wouldn't be surprised if in six months, we have a model that is as fast as SenseVoice and as accurate as Whisper Small. That is the trajectory we are on.
Corn
Until then, I’ll stick with the go-kart. It is faster, it is more accurate, and it knows exactly what I had for breakfast.
Herman
Which was?
Corn
Oh, I haven't had breakfast yet. I was waiting for the AI to tell me what I wanted. Apparently, it is scrambled eggs and toast.
Herman
Better get on that, then. The model has spoken.
Corn
It really has. This has been a fascinating look at the "underdogs" of the ASR world. It is rare that we get to see such a clear-cut case of "less is more."
Herman
It is the beauty of empirical testing. You can’t argue with sub-second, zero-error results.
Corn
Well, you "can" argue with them, but you’d be wrong. And being wrong is much less fun than being fast and accurate.
Herman
On that note, I think we have covered the spread. From the blazing speed of SenseVoice to the surprising accuracy of the smaller Whisper models, the state of Linux voice typing is stronger than I think most people realize.
Corn
Definitely. It is a good time to be an open-source nerd.
Herman
It is always a good time for that, Corn.
Corn
True, but now we have the benchmarks to prove it.
Herman
We do. And I’m sure Daniel will keep pushing these models to their limits. I’d love to see a "stress test" next—maybe some heavy background noise or multiple speakers.
Corn
"My Weird Prompts" live from a construction site. Let’s see how Whisper Small handles a jackhammer in the background.
Herman
That might be the "Large" model’s time to shine. But for now, for the desktop user, the king has been crowned.
Corn
Long live Whisper Small.
Herman
Long live the efficient king.
Corn
Alright, I think that is a wrap on this one. What do you think, Herman? Did we miss anything?
Herman
I think we really hit the core of it—the speed-accuracy trade-off, the hardware synergy, and the practical wins for accessibility. It is a solid look at a very specific, but very important, niche.
Corn
Perfect. Well, if you enjoyed this deep dive into the world of Linux ASR benchmarks, we have plenty more for you to explore.
Herman
We certainly do. If you want to keep up with the latest in AI and automation, you can find us at my weird prompts dot com. We have the RSS feed there and all the links to subscribe on your favorite platform.
Corn
And if you are feeling generous, a quick review on Apple Podcasts or Spotify goes a long way. It helps other curious minds find the show.
Herman
Big thanks to our producer, Hilbert Flumingtop, for keeping the gears turning behind the scenes.
Corn
And a huge thank you to Modal for providing the GPU credits that power this show. Without those serverless GPUs, we’d just be two brothers talking to a wall.
Herman
Which we do anyway, but this way, people can actually hear us.
Corn
True. Also, a final shout out to Daniel for the prompt and the great research. If you want to see the full table of results, head over to his Hugging Face Space. It is worth a look.
Herman
This has been My Weird Prompts.
Corn
See ya.
Herman
Goodbye.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.