#3775: SBC Clusters vs Virtualization: The Real Tradeoffs

Why physical isolation sounds great but virtualization usually wins for home servers.

Episode Details

Episode ID: MWP-3954
Published: Jun 20
Duration: 24:31
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: home-lab fault-tolerance hardware-reliability

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

A listener running Proxmox with ZFS experienced a corruption that took down his entire home server, including Home Assistant and other VMs. His question: since logical isolation through VMs is an illusion when the physical box dies, why not split each service onto its own physically separate compute unit with dedicated storage?

The hardware he's describing already exists. The Turing Pi 2 is a Mini-ITX carrier board that accepts up to four Raspberry Pi Compute Module 4s or NVIDIA Jetson modules, each with its own CPU, RAM, and dedicated M.2 storage. It's exactly the modular SBC cluster architecture he imagined — independent computers sharing only a backplane and power supply.

But there's a reason most home server builders don't use one. The tradeoffs are brutal. Resource efficiency suffers dramatically: with VMs on one box, the hypervisor dynamically allocates CPU and RAM across workloads. With physically separate nodes, each has fixed resources, and spare cycles on idle nodes can't be loaned to busy ones. You end up overprovisioning every node for its peak workload.

The dollar-per-compute ratio is worse too. A Turing Pi 2 with four CM4 modules and NVMe drives costs around $500-600. For that price, a used enterprise SFF desktop delivers far more compute with better software support and single-pane management. The software story is also weaker — instead of one Proxmox interface, you're SSH-ing into four separate machines and maintaining four OS images. And the carrier board itself becomes a new single point of failure that's harder to replace than a standard motherboard.

For most home labbers, the better answer is either Proxmox with ZFS and proper backups (downtime measured in hours, acceptable for home services) or a two-node Proxmox cluster with replicated storage for true high availability.

Daniel sent us this one — he runs a home server on Proxmox with a few VMs, Home Assistant among them, and he got burned by a ZFS corruption that took everything down. His point is that VMs feel like isolation, but it's all logical — when the one physical box dies, the isolation was an illusion. So he's asking: instead of virtualizing everything on one machine, what if each service ran on its own physically separate unit of compute, with its own dedicated storage? Has anyone built a hardware platform that integrates a cluster of modular SBC-class units — think a bunch of Raspberry Pi-level boards sharing a chassis and a backplane, but each with its own disk — so workloads are genuinely split across separate physical machines? And if this mostly doesn't exist, why not? What are the tradeoffs that keep people virtualizing on one box?

Oh, this is a great question. The thing he's describing has a name, and it's been shipping for years.

The Turing Pi.

The Turing Pi. The Turing Pi 2, specifically, is a Mini-ITX carrier board that takes up to four Raspberry Pi Compute Module 4s or NVIDIA Jetson modules. Each node has its own CPU, its own RAM, its own dedicated storage — there's an M.2 slot per node on the board — and they all share the enclosure, the backplane, and the power supply. It's exactly the architecture he's sketching. You slot in four independent computers, each with its own disk, and they happen to live on the same board.

Case closed, episode over, let's go nap.

Not quite, because there's a reason most people who build home servers don't use one. And the reason isn't that they haven't heard of it — it's that the tradeoffs are brutal once you actually spec it out.

Walk me through those. Because on paper, physical isolation sounds like the obviously correct answer. One disk dies, one service goes down. The other three nodes don't even notice.

Right, and that's the promise. But let's talk about what you give up. The first thing is resource efficiency, and it's not a small thing. When you run four VMs on one Proxmox box, you've got one pool of CPU cores and one pool of RAM, and the hypervisor can allocate them dynamically. If your Plex server is transcoding something, it can temporarily grab more CPU. If your Home Assistant instance is idle, it's using almost nothing, and those cycles go to something else. With physically separate nodes, each one has a fixed allocation. If node one is pinned at a hundred percent and node three is idling, node three's spare cycles are just — wasted. They can't be loaned out.

You're overprovisioning every node for its worst-case workload, and most of the time most of that silicon is sitting there doing nothing.

And SBC-class hardware is already constrained. A Raspberry Pi CM4 has four Cortex-A72 cores. That's fine for a lot of home services, but it's not a lot of headroom. If you need to give each service enough muscle to handle its peak, you're buying peak capacity times four — or times however many nodes — that sits idle ninety-five percent of the time.
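The overprovisioning argument can be made concrete with a small sketch. The hourly demand figures below are invented for illustration, not measurements from any real setup. With dedicated nodes you must size each one for its own peak, so total capacity is the sum of the peaks. With a shared pool you only need enough for the busiest moment overall, which is the peak of the sums.

```python
# Hypothetical CPU demand (in cores) for four services over six sample hours.
# These numbers are made up purely to illustrate the sizing argument.
demand = {
    "home_assistant": [0.2, 0.2, 0.3, 0.2, 0.2, 0.2],
    "plex":           [0.1, 0.1, 3.5, 3.0, 0.1, 0.1],
    "backup_job":     [2.5, 0.1, 0.1, 0.1, 0.1, 0.1],
    "file_server":    [0.3, 1.5, 0.3, 0.3, 0.3, 0.3],
}


def dedicated_capacity(demand):
    """Cores needed when every service gets its own node sized for its peak."""
    return sum(max(series) for series in demand.values())


def pooled_capacity(demand):
    """Cores needed when all services share one pool sized for the busiest hour."""
    return max(sum(hour) for hour in zip(*demand.values()))


print(f"dedicated nodes: {dedicated_capacity(demand):.1f} cores")
print(f"shared pool:     {pooled_capacity(demand):.1f} cores")
```

The gap between the two numbers grows when the services peak at different times, which is the usual case for a mix of interactive and scheduled home workloads.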

Which is the opposite of why people started virtualizing in the first place.

That's the irony. Virtualization was the answer to exactly this problem — racks of underutilized physical servers, each running one thing, wasting power and space. We spent twenty years consolidating onto fewer boxes, and now the prompt is essentially asking whether we should un-consolidate, but at the SBC scale.

There's something almost philosophical here. The prompt calls logical isolation "not really isolation," which is true from a fault-domain perspective but kind of dismisses the entire point of a hypervisor. The hypervisor's job is to make logical isolation good enough that the hardware failure becomes the only thing you worry about. And hardware failures are rare enough that most people accept the trade.

They are, but they're not zero. And when they happen, they're catastrophic in exactly the way the prompt describes. I had a power supply go noisy on an old server a few years back, and it took out the motherboard. The ZFS pool was fine, since the drives were intact, but the machine was dead, and everything was down until I could get replacement hardware. That's the moment where you think, "what if these had been separate physical boxes?"

Let's talk about what else is out there beyond the Turing Pi, because the SBC cluster idea has been tried a bunch of times.

There was the Cluster HAT, which let you attach four Raspberry Pi Zero boards to a single Pi acting as a controller. Very cute, very limited. There's the PicoCluster, which is basically a nice acrylic case with mounting for multiple Pi boards and a network switch: no shared backplane, just a physical enclosure and a power distribution setup. For something more serious, there's Iridis-Pi, a research project at the University of Southampton that built a sixty-four-node Pi cluster. But that was for teaching parallel computing, not for running home services.

Sixty-four Pis. The power cabling alone.

They used Lego for the rack. I'm not joking.

Of course they did.

None of those are what the prompt is really asking for. The prompt wants a single integrated platform — one chassis, one power supply, one backplane, multiple independent compute units each with dedicated storage. The Turing Pi 2 is the closest thing to that vision. There's also the DeskPi Super6C, which is a similar concept — six CM4 slots in a cluster board. And there are industrial products like the OnLogic CL200 series that do modular SBC clustering, but those are aimed at edge computing deployments, not home labs.

The hardware exists. But the prompt's second question is the interesting one — why doesn't everyone do this? And I think the answer goes deeper than just resource efficiency.

Let me give you the three big reasons. One, the software story is worse. Two, the dollar-per-compute ratio is worse. Three, the management overhead is worse. And none of those are obvious until you've actually tried to run a cluster like this.

Start with software.

Nothing breaks, exactly, but you lose a lot of the nice abstractions. When everything is VMs on one Proxmox box, you've got a single pane of glass for management. You can snapshot a VM, back it up, migrate it, clone it, all from the same interface. When you've got four physically separate SBCs, each one is its own little island. You're SSH-ing into four different machines. You're maintaining four operating system images. If you want to back them up, you're setting that up four times — or you're building some orchestration layer on top, at which point you've just re-invented the hypervisor, but worse.

The orchestration layer itself becomes a single point of failure, which defeats the purpose.

Or it becomes the thing you're maintaining instead of the services you actually wanted to run. This is the trap of home lab distributed systems. You start out wanting to run Home Assistant and Plex, and you end up running a Kubernetes cluster, and now Kubernetes is your hobby instead.

The musical equivalent of buying a drum kit and ending up spending all your time tuning drums instead of playing them.

And for a lot of people, that's fine — the tinkering is the point. But the prompt seems to be asking about resilience as a practical goal, not as a project for its own sake.

The goal is "a disk dying under one unit can't take down the others." That's a specific, practical requirement. So what about the dollar-per-compute problem?

This is where it gets painful. A Turing Pi 2 board costs about two hundred dollars. Four CM4 modules with reasonable RAM — say four gigs each — are going to run you another two hundred to two hundred fifty dollars. Four NVMe drives, even small ones, another hundred to hundred fifty. You're at five to six hundred dollars before you've got a case or a power supply. For six hundred dollars, you can buy a used enterprise SFF desktop — an HP EliteDesk or a Dell OptiPlex — with a six-core x86 processor, thirty-two gigs of RAM, and room for multiple drives. That single box will run circles around four Pi CM4s in raw compute, and it'll do it with better software support, better I/O, and a single management interface.
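The price tally above is simple enough to check in a few lines. This sketch uses only the dollar ranges and RAM sizes quoted in the episode. The dictionary keys and variable names are just labels chosen for the example, and real prices will vary.

```python
# (low, high) price estimates in US dollars, as quoted in the episode.
cluster_parts = {
    "turing_pi_2_board": (200, 200),
    "four_cm4_modules_4gb": (200, 250),
    "four_nvme_drives": (100, 150),
}


def total_range(parts):
    """Add up the low and high ends of each part's price range."""
    low = sum(lo for lo, _ in parts.values())
    high = sum(hi for _, hi in parts.values())
    return low, high


cluster_low, cluster_high = total_range(cluster_parts)
cluster_ram_gb = 4 * 4   # four modules with four gigs each
desktop_price = 600      # used enterprise SFF desktop, per the episode
desktop_ram_gb = 32

print(f"cluster: ${cluster_low}-${cluster_high}, {cluster_ram_gb} GB RAM total")
print(f"desktop: ${desktop_price}, {desktop_ram_gb} GB RAM in one pool")
```

The cluster total excludes the case and power supply, as noted above. Core counts are left out on purpose, because a Cortex-A72 core and a desktop x86 core are not comparable one to one.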

If you want the fault isolation, you can still run VMs on it. You haven't solved the hardware single-point-of-failure problem, but you've got a much more capable machine for the same money.

The SBC cluster approach is paying a premium for physical isolation, and the isolation you get is only as good as the shared components. Which brings us back to the backplane.

The backplane is the dirty secret here. The prompt says "nothing but the shared backplane is a single point of failure," and that's meant to be the acceptable compromise. But what actually fails in the real world?

Power supplies fail. Backplanes can fail — less often, but they do. The Turing Pi board itself has a management controller on it, a little microcontroller that handles power sequencing and the onboard switch. If that controller dies, all four nodes might be physically fine but you can't power them on or they can't talk to each other. You've traded one single point of failure — the motherboard — for a different one — the carrier board.

The carrier board is a more niche piece of hardware. If my Proxmox box's motherboard dies, I can get a replacement from a dozen vendors by tomorrow. If my Turing Pi board dies, I'm waiting for a shipment from the manufacturer.

Which is a real consideration. Mean time to recovery matters as much as mean time between failures. A common failure with a fast fix might be preferable to a rare failure with a slow fix.
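The point about recovery time can be put in numbers using the standard steady-state availability formula, MTBF / (MTBF + MTTR). The failure intervals and repair times below are hypothetical, chosen only to show how a frequent but quickly fixed failure can cost less downtime than a rare one with a long wait for parts.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours, mttr_hours):
    """Fraction of time a system is up, given mean time between failures
    and mean time to recovery."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours_per_year(mtbf_hours, mttr_hours):
    """Average hours of downtime per year implied by the availability."""
    return HOURS_PER_YEAR * (1 - availability(mtbf_hours, mttr_hours))


# Hypothetical: a commodity part fails every 2 years, replaced within a day.
common_fast = expected_downtime_hours_per_year(2 * HOURS_PER_YEAR, 24)
# Hypothetical: a niche board fails every 5 years, replacement takes 2 weeks.
rare_slow = expected_downtime_hours_per_year(5 * HOURS_PER_YEAR, 14 * 24)

print(f"common failure, fast fix: {common_fast:.1f} h/year")
print(f"rare failure, slow fix:   {rare_slow:.1f} h/year")
```

With these assumed figures the rarer failure still produces several times more expected downtime, because the repair time dominates.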

We've got worse resource efficiency, worse dollar-per-compute, worse software ergonomics, and a single point of failure that's harder to replace. This is starting to sound like an answer to a question nobody should be asking.

I don't think that's quite fair. There are scenarios where this architecture makes sense. If you have services that need to be physically isolated — maybe for security reasons, maybe because one of them is experimental and you expect it to crash a lot — then separate physical nodes are valuable. If you need some nodes to be on a physically separate network segment, the Turing Pi's onboard managed switch can do VLAN isolation per node, which is neat. And there's an elegance to it. Each service lives on its own little computer. It's conceptually clean.

It's the home lab equivalent of microservices. And microservices are great until you have to operate them.

That's exactly the parallel. The industry went through a whole microservices hype cycle and then a backlash, and the backlash was mostly about operational complexity. Splitting a monolith into twenty services sounds great until you're debugging a latency issue across seventeen of them at two in the morning.

What's the actual right answer for someone who wants fault isolation in a home lab and doesn't want to go full SBC cluster?

I think there are a few approaches that get you most of the way there without the pain. One is what the prompt is already doing — Proxmox with ZFS and proper backups. If the hardware dies, you restore from backup onto different hardware. The downtime is hours instead of minutes, but for home services, that's usually acceptable.

If hours of downtime isn't acceptable?

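Then you're looking at high availability. Two Proxmox nodes in a cluster with shared storage, or replicated storage via ZFS send. If one node dies, the VMs restart on the other. With only two nodes you also need a third quorum vote, such as a QDevice on some small always-on machine, so the surviving node knows it's allowed to take over. And with replication rather than shared storage, you lose whatever changed since the last sync. Still, that's real hardware fault tolerance, and it doesn't require splitting everything into separate physical boxes. It just requires two boxes and a tiebreaker.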
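For a sense of what "replicated storage via ZFS send" involves, here is a minimal sketch that only builds the shell commands for one replication pass and does not run them. The dataset name, snapshot names, and host name are hypothetical, and this is not Proxmox's own replication tooling, just the underlying snapshot, send, and receive pattern.

```python
import shlex


def build_replication_commands(dataset, prev_snap, new_snap, target_host):
    """Return the shell commands for one replication pass as strings.

    If prev_snap is None a full stream is sent; otherwise only the changes
    between prev_snap and new_snap are sent (an incremental stream).
    """
    new_full = f"{dataset}@{new_snap}"
    snapshot_cmd = f"zfs snapshot {shlex.quote(new_full)}"
    if prev_snap is None:
        send_cmd = f"zfs send {shlex.quote(new_full)}"
    else:
        prev_full = f"{dataset}@{prev_snap}"
        send_cmd = f"zfs send -i {shlex.quote(prev_full)} {shlex.quote(new_full)}"
    # -F rolls the target back to its latest snapshot before receiving.
    receive_cmd = (
        f"ssh {shlex.quote(target_host)} zfs receive -F {shlex.quote(dataset)}"
    )
    return [snapshot_cmd, f"{send_cmd} | {receive_cmd}"]


# Hypothetical names, for illustration only.
for cmd in build_replication_commands("tank/vm-disk", "sync-1", "sync-2", "node2"):
    print(cmd)
```

Anything written after the most recent snapshot is not on the second node, which is why the replication interval sets how much data a failover can lose.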

Which is what enterprises do. Nobody in a data center is running each microservice on its own physical server anymore. They're running clusters of hypervisors with live migration.

And the reason they do that instead of one-service-per-physical-box is all the tradeoffs we just talked about, but at scale. Resource pooling is too valuable to give up.

The prompt's intuition — that logical isolation isn't real isolation — is correct. But the proposed solution — physically separate SBCs sharing a backplane — is just pushing the single point of failure one layer down, while introducing a bunch of new problems.

Paying more for the privilege. Which is not to say the Turing Pi isn't cool. It's very cool. I've thought about building one just because the form factor is satisfying and I like the idea of four independent nodes in a Mini-ITX footprint. But I'd be doing it for fun, not because it's the optimal way to run home services.

Let's talk about what "cool" gets you, because I think that's actually part of the answer here. A lot of home lab decisions aren't made on pure engineering grounds. They're made because something is interesting to build.

And the SBC cluster thing has a real appeal. There's something viscerally satisfying about seeing four separate little computers, each with its own blinkenlights, each doing its own job. It's the same impulse that makes people build sleeper PCs or custom water cooling loops. The engineering isn't strictly necessary — it's about craft.

The prompt is asking a practical question, but I suspect there's some of that underneath it. "Has anyone built this?" is partly "should I build this?" and partly "I want this to exist because it would be beautiful."

It would be beautiful. And it does exist, in the Turing Pi and a few other products. It's just not the right tool for the job if the job is "run Home Assistant and a few other services reliably for the least money and effort."

What about the future, though? SBCs are getting more capable. The Raspberry Pi 5 is a real computer now. The CM5 is presumably coming at some point. At what point does the compute-per-dollar get good enough that the tradeoffs shift?

The CM5 is already out — it launched in late 2024. And it's a meaningful step up. You get four Cortex-A76 cores instead of A72s, which is a real generational jump. The IO is better. But it's still an ARM SBC going up against used x86 hardware that's depreciating fast. A used OptiPlex from three years ago is still going to outrun it in most workloads, and the OptiPlex gets cheaper every year.

The SBC cluster is chasing a moving target, and the target is running away from it.

In terms of raw value, yes. But there's another angle here that we haven't talked about, which is power consumption. Four Pi CM5s might draw twenty to thirty watts total under load. That used OptiPlex might draw forty to sixty. Over a year of continuous operation, that difference adds up.

How much are we talking?

At average electricity prices, maybe thirty to fifty dollars a year. Not nothing, but not enough to swing the total cost of ownership decisively. And if you're really optimizing for power, you're probably better off with a single efficient x86 box — something with a low-power Intel N-series processor — than with a cluster of ARM boards.
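The yearly figure is easy to reproduce. The wattage gap of twenty to thirty watts comes from the draw estimates quoted above. The electricity price of 0.17 dollars per kilowatt-hour is an assumption made for this sketch, so substitute your own rate.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_cost(watts, price_per_kwh):
    """Cost of running a constant load of `watts` for one year."""
    kwh_per_year = watts * HOURS_PER_YEAR / 1000
    return kwh_per_year * price_per_kwh


ASSUMED_PRICE_PER_KWH = 0.17  # assumption, not a figure from the episode

# Extra draw of the desktop over the cluster, per the episode's estimates.
for extra_watts in (20, 30):
    cost = annual_energy_cost(extra_watts, ASSUMED_PRICE_PER_KWH)
    print(f"{extra_watts} W extra costs about ${cost:.0f} per year")
```

At that assumed rate the difference lands in the same ballpark as the thirty to fifty dollars mentioned above, and it scales linearly with the local price of electricity.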

Unless you need the physical isolation for some reason. Which brings us back to: what are the actual use cases where this makes sense?

I can think of a few. One is security research — if you're doing anything where you want a clean air gap between services, or where you expect to compromise a node and want to contain the blast radius. Physical separation is stronger than VM separation, no question. Two is education — if you're learning about distributed systems, clustering, or orchestration, having four physical nodes forces you to confront real networking and coordination problems that VMs let you gloss over. Three is mixed-architecture setups — the Turing Pi 2 can mix Raspberry Pi and NVIDIA Jetson modules on the same board, which is useful if you need GPU acceleration on one node and not on the others.

That third one is interesting. The Jetson angle.

If you're running local AI inference — say, a small language model or object detection on camera feeds — a Jetson Nano or Orin module gives you GPU acceleration that a Pi can't touch. And you might not want or need that on every node. So the mixed-architecture cluster starts to look like a real engineering solution rather than just a fun project.

The Turing Pi with a Jetson for AI workloads and a couple of Pis for lightweight services — that's a coherent build.

It's coherent, but it's niche. Most people running Home Assistant and Plex don't need a Jetson. And if they do want local AI, they're probably better off with a single machine that has a decent GPU — or just using a cloud API.

Let's talk about the storage angle specifically, because the prompt mentions dedicated physical disks per node as a feature. Is that actually a benefit over ZFS on a single box?

In theory, yes — if a disk fails, only one node is affected. In practice, the disks in an SBC cluster tend to be NVMe drives hanging off a PCIe lane or two, and they're not necessarily more reliable than a good SATA SSD in a Proxmox box. And you lose the ability to do things like ZFS snapshots across your whole dataset, or use a single large pool with redundancy. Each node has its own tiny pool, and you're managing storage piecemeal.

You've traded one big storage problem for several small storage problems.

Which is sometimes a good trade — small problems are easier to reason about — but usually it's just more work. And if you care about data integrity, ZFS on a single box with ECC RAM and regular scrubs is going to catch bit rot and corruption better than ext4 on a Pi with no ECC.

No ECC on the Pi at all.

The memory controller on the Pi doesn't support it. So you're getting physical isolation at the cost of memory integrity. Bit flips in RAM can corrupt data before it ever hits the disk, and you'll never know.

That's a terrifying sentence.

It's not as scary as it sounds for most home workloads — cosmic-ray-induced bit flips are rare, and for serving up Plex or running Home Assistant, the consequences are usually benign. But if you're building a system specifically for resilience and data integrity, running on hardware with no ECC is a strange choice.

The resilience you gain from physical isolation, you partially lose to the lack of error correction in the memory subsystem.

And that's the kind of tradeoff that isn't obvious until you're deep in the weeds. The prompt's intuition is good — physical isolation is real isolation — but the implementation details eat away at the benefits.

Let's step back and ask the architectural question. The prompt says "isolation that's only logical isn't really isolation." Is that true? Or is it more that logical isolation is real isolation for most threat models, and hardware failure is a special case?

I think it depends on what you're isolating against. If you're worried about a compromised service escaping its sandbox — a container breakout or a VM escape — then logical isolation is a real barrier with a real, if small, attack surface. If you're worried about a noisy neighbor consuming all the I/O bandwidth, logical isolation with proper resource limits handles that. If you're worried about hardware failure, logical isolation does nothing — the hypervisor can't protect you from a dead power supply.

The question is: which of those threats are you actually trying to defend against? And for most home users, hardware failure is the least frequent but most annoying one, because when it happens, everything goes dark.

Which is why the backup strategy matters more than the hardware architecture. If you've got good backups and you can restore to new hardware in an afternoon, the single-box failure is an inconvenience, not a disaster. If you've got no backups, no amount of physical isolation saves you from a house fire or a theft or a power surge that takes out the whole rack.

The three-two-one rule exists for a reason.

Three copies, two different media, one offsite. That protects you against way more failure modes than splitting services across SBCs does.
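The rule is mechanical enough to express as a check. This is a minimal sketch with a made-up `BackupCopy` record and invented example entries. It counts the live data as one of the three copies, which is the common reading of the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    label: str     # human-readable name for this copy
    medium: str    # kind of storage, e.g. "ssd", "hdd", "cloud"
    offsite: bool  # True if stored away from the primary location


def satisfies_3_2_1(copies):
    """Check for at least three copies, two media types, and one offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Hypothetical home lab layout, for illustration only.
copies = [
    BackupCopy("live ZFS pool", "ssd", offsite=False),
    BackupCopy("USB backup drive", "hdd", offsite=False),
    BackupCopy("cloud bucket", "cloud", offsite=True),
]
print(satisfies_3_2_1(copies))
```

Dropping the offsite copy from the list makes the check fail, which matches the house fire and theft scenarios mentioned above.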

Alright, so to answer the prompt directly: yes, this exists. The Turing Pi 2 and a few similar products do exactly what's described — a carrier board hosting multiple independent SBC-class compute modules, each with its own storage, sharing only a backplane and power supply. It's a real product you can buy today.

The reason most people don't do it comes down to four things. Worse resource efficiency, because you can't dynamically allocate CPU and RAM across nodes. Worse dollar-per-compute, because used x86 hardware is so cheap. Worse software ergonomics, because you're managing N separate machines instead of one hypervisor. And the backplane itself is still a single point of failure — you haven't eliminated the problem, you've just moved it.

The counterpoint is that for specific niches — security isolation, mixed-architecture workloads with GPU acceleration, distributed systems education — it's useful. And for the sheer joy of building something physically modular, it's hard to beat.

If the goal is "my Home Assistant shouldn't go down when a disk dies," the more practical answer is either Proxmox high availability across two nodes, or a solid backup strategy and a spare machine you can restore to.

Or just accept that home services have downtime sometimes, and that's okay.

That's the hardest one for home lab people to accept, but it's often the right answer.

Now: Hilbert's daily fun fact.

Hilbert: In the 1960s, a prospector in Newfoundland claimed to have discovered a new fluorescent mineral he called "willemite variant blue" — a vivid blue-green glow under shortwave UV. The claim was cited in a 1967 geological survey before being corrected in 1972: the sample was ordinary calcite that had been contaminated with uranium-bearing dust from the prospector's own truck.

The prospector discovered his own dirty truck.

The most Newfoundland mineral discovery story possible.

So where does this leave us? The prompt's vision of modular, physically isolated home services is real and purchasable, but the engineering case for it is narrower than the intuition suggests. The real tension is between the elegance of physical separation and the efficiency of logical pooling — and for now, pooling wins for almost everyone.

Though I'll say this — if someone builds a Turing Pi-style board that takes CM5s, has ECC support, and costs under a hundred dollars, the calculus changes. We're not there yet, but the trajectory is interesting.

Something to watch. This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop. If you enjoyed this, leave us a review wherever you get your podcasts — it helps. We're at myweirdprompts.com and on Spotify. I'm Corn.

I'm Herman Poppleberry. See you next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.
