Alright, so Daniel sent us this one, and I want to be clear upfront — this is either the most ambitious technical project we've ever covered, or an elaborate cry for help. Possibly both. Here's what he wrote: Herman and Corn should spec out a local AI inference server powerful enough to rival Claude Code or OpenAI Codex. He wants a full parts list with pricing, a maintenance plan for a team of four — that's Corn, Herman, Daniel, and Hannah — to handle the extreme heat and noise, a thermal and acoustic simulation for a sixty-five square meter apartment, diplomatic strategies for neighbor disputes, a detailed maintenance timeline, and contingency plans for the electricity situation, which he describes as "enormous demand." The approach, he says, should be realistic but comedic. Daniel, I want you to know that we take this very seriously. The comedy will emerge naturally from the facts.
The comedy is entirely load-bearing here. Because I looked at these specs and my first reaction was genuine concern for whoever lives below this hypothetical apartment.
So let's start with the target. What does "rival Claude Code" actually mean in hardware terms?
So the model you'd be targeting is Qwen3-Coder-480B-A35B-Instruct. It's the current open-source state-of-the-art for coding agents. It scores sixty-one point eight percent on the Aider Polyglot benchmark, which beats Claude Sonnet 4 at fifty-six point four and GPT-4.1 at fifty-two point four. It's a four-hundred-and-eighty-billion-parameter Mixture-of-Experts architecture. To run it at speeds that feel responsive — and by responsive I mean something better than watching a telegram print out character by character — you need between one-hundred-fifty and two-hundred-seventy-six gigabytes of fast memory, depending on how aggressively you quantize. Minimum.
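For anyone following along at home, the memory arithmetic is simple enough to sketch. The bits-per-weight figures below are approximate averages for common quantization schemes, and the KV cache adds more on top:

```python
# Rough memory footprint for a 480-billion-parameter model at different
# quantization levels. bits_per_weight values are approximate averages;
# KV cache, activations, and runtime buffers all add more on top.
PARAMS = 480e9

def weights_gb(bits_per_weight: float) -> float:
    """Bytes needed for the weights alone, in gigabytes."""
    return PARAMS * bits_per_weight / 8 / 1e9

for label, bits in [("Q2 (aggressive)", 2.5), ("Q4 (usable)", 4.6), ("FP16 (full)", 16.0)]:
    print(f"{label:16s} ~{weights_gb(bits):4.0f} GB")
# Q2 -> ~150 GB, Q4 -> ~276 GB: the range quoted above.
# FP16 -> ~960 GB, which is why nobody runs this un-quantized at home.
```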
So not a Raspberry Pi situation.
Emphatically not a Raspberry Pi situation. We're talking about hardware that, in a residential context, is genuinely in the category of "things you probably need to tell your landlord about."
Okay, so walk me through the build tiers, because I know you've got tiers.
Three tiers. I'm calling them, in ascending order of ambition and descending order of sanity: the Reasonable Madman, the Apartment Destroyer, and the Nuclear Option.
I respect the naming convention. Let's start with Reasonable Madman.
Eight used RTX 3090s at around seven-hundred-twenty-five dollars each — so fifty-eight hundred for the GPUs. You pair that with an AMD EPYC 7702, sixty-four cores, which you can find for around four-fifty used. An ASRock Rack motherboard for the EPYC platform, five-hundred-twelve gigabytes of DDR4 ECC RAM, dual sixteen-hundred-watt power supplies, and an APC Smart-UPS rated at five thousand VA. Total system cost comes out around ten-thousand-nine-hundred dollars.
That's... surprisingly approachable, actually. What does it get you?
One-hundred-ninety-two gigabytes of VRAM total. You can run Qwen3-Coder-480B at Q2 quantization, which is the most aggressive compression that still produces coherent output. And your throughput is four to eight tokens per second.
I'm going to need you to contextualize four to eight tokens per second for me, because that might be fine or it might be catastrophic.
It's the difference between watching someone type at a very deliberate pace versus having a conversation. The model is technically responding. You will, however, have time to make a cup of tea between prompts. It works. It is slow enough that you will periodically question your life choices.
The server is functional. The regret is also functional.
That's an accurate summary of Tier 1. Now, Tier 2 is where things get genuinely interesting and also genuinely dangerous.
The Apartment Destroyer.
Eight RTX 5090s. Each one runs around three thousand dollars at street price right now, so twenty-four thousand dollars just in GPUs. Dual AMD EPYC 9354 processors at twenty-eight hundred each, a Supermicro dual-EPYC motherboard at twenty-five hundred, five-hundred-twelve gigabytes of DDR5 ECC RAM, dual two-thousand-watt Seasonic Titanium power supplies. And critically — and this is not optional — an Eaton SRCOOL18K server-grade portable air conditioner at twelve hundred dollars. Total build cost is approximately forty-one thousand two-hundred-seventy-three dollars.
That's not a home server. That's a down payment.
What you get for that is two-hundred-fifty-six gigabytes of VRAM. You can run Qwen3-Coder-480B at Q4 quantization with partial CPU offloading, and you're looking at fifteen to twenty-seven tokens per second. That's actually in the range of what Claude Code feels like through the API. This is the build that genuinely rivals the target. It also draws approximately fifty-five hundred watts continuously.
And Tier 3?
The DGX H100. Eight H100 SXM GPUs with six-hundred-forty gigabytes of HBM3 total. Four-hundred to five-hundred thousand dollars for the unit itself. Then you add three-phase electrical service installation, a precision cooling unit, structural engineering to assess whether the apartment floor can support the weight —
Wait, you need a structural engineer?
The DGX H100 weighs approximately a hundred and thirty kilograms. That's before the rack, the cooling unit, the UPS. You are putting a small car's worth of weight into a residential floor. So yes, structural assessment. Plus noise isolation — the DGX H100 at full load produces one-hundred-and-six decibels. That is a rock concert. That is happening in your apartment. Continuously.
I want to be clear that I, Corn, am listed as a member of this maintenance team, and I have concerns about my personal safety.
Your concerns are valid. The recommended build is Tier 2. Tier 1 is too slow to be competitive, Tier 3 requires a building permit, a structural engineer, and a lawyer, and the total cost is somewhere between four-hundred-fifty-five and five-hundred-seventy-seven thousand dollars. So: Tier 2, eight RTX 5090s, forty-one thousand dollars, and the end of any goodwill you have with your neighbors.
Oh, by the way — today's script is being generated by Claude Sonnet 4.6, which I find deeply funny given that we're building a server to replace it.
We're not replacing it. We're supplementing it. In a sixty-five square meter apartment.
Right. Totally different. Okay, let's talk thermal simulation, because this is where I feel like the physics starts getting genuinely hostile.
The core issue is that an AI inference server converts essentially all of its electrical input into heat. There's no mechanical work output, no light — well, there are LEDs, but those are decorative. Every watt going in comes back out as heat. Run that through the conversion: one watt of continuous draw is three-point-four-one-two BTUs per hour, so fifty-five hundred watts gives you eighteen-thousand-seven-hundred-sixty-six BTUs per hour of heat output.
And a standard apartment air conditioner is rated at what?
Fourteen thousand BTUs per hour. So the server generates thirty-four percent more heat than a standard window AC can remove. The server is, thermodynamically speaking, winning the war against your air conditioning. And it doesn't get tired.
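The conversion is simple enough to check yourself; one watt of continuous draw is 3.412 BTU per hour of heat:

```python
# Watts to BTU/hr: a server turns essentially all input power into heat.
server_watts = 5500
heat_btu_hr = server_watts * 3.412      # ~18,766 BTU/hr
window_ac_btu_hr = 14_000               # typical large window unit

print(f"Server heat output: {heat_btu_hr:,.0f} BTU/hr")
print(f"AC shortfall:       {heat_btu_hr - window_ac_btu_hr:,.0f} BTU/hr")
print(f"Server beats the AC by {heat_btu_hr / window_ac_btu_hr - 1:.0%}")
```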
So what happens to the apartment temperature if you just... don't cool it?
I modeled this. You have a sixty-five square meter apartment, two-and-a-half-meter ceilings, so about a hundred-sixty-two cubic meters of air. When you account for the thermal mass of walls, furniture, and everything else in the space, you get a temperature rise rate of roughly point-three-four degrees Celsius per minute with no cooling. That's twenty degrees per hour.
So in three hours you've gained sixty degrees.
If the ambient outside temperature is twenty degrees Celsius, the apartment hits somewhere between seventy and ninety degrees Celsius within three to four hours. That is above the forty-five degree server failure threshold. The server destroys itself. Plants die. Cheese melts. Corn melts.
I don't love that last one.
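For the curious, here's a minimal sketch of the model behind those numbers, assuming walls and furniture add roughly five times the heat capacity of the air alone (a round-number guess, not a measured figure):

```python
# A minimal lumped thermal model of the uncooled apartment. The 5x
# multiplier for the thermal mass of walls and furniture is a
# round-number assumption, and heat loss through the walls is ignored,
# so this is pessimistic over long spans but reasonable for hour one.
P_WATTS = 5500.0                        # server heat input (J/s)
VOLUME_M3 = 65 * 2.5                    # 65 m^2 floor, 2.5 m ceilings
AIR_J_PER_K = VOLUME_M3 * 1.2 * 1005    # air density * specific heat: ~196 kJ/K
C_EFF = AIR_J_PER_K * 5                 # walls + furniture thermal mass

rate_c_per_min = P_WATTS / C_EFF * 60
print(f"Rise rate: {rate_c_per_min:.2f} C/min (~{rate_c_per_min * 60:.0f} C/hour)")

start_c, fail_c = 20.0, 45.0            # outside ambient, server failure point
hours = (fail_c - start_c) / (rate_c_per_min * 60)
print(f"Hits the {fail_c:.0f} C failure threshold in ~{hours:.1f} hours")
```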
With the Eaton SRCOOL18K portable server AC at eighteen thousand BTUs per hour, you get the apartment to a stable twenty-three to twenty-five degrees. Livable. But here's the critical detail that people consistently get wrong: the portable AC has an exhaust hose, and that hose must vent outside the apartment. If you forget to route it out the window — and people do forget — the unit dumps all the heat it extracts straight back into the room, plus its own compressor's waste heat, and you've accomplished nothing except running an expensive fan.
And the server corner specifically?
Ten to fifteen degrees hotter than the rest of the apartment due to localized heat concentration. That corner is uninhabitable. Do not put the couch there. Do not put Daniel there.
Daniel is listed as Chief Acoustic Officer, which means he's already being punished enough.
Let's talk about that. Because the noise situation is genuinely remarkable. One RTX 5090 Founders Edition under full AI inference load produces around fifty to sixty decibels. Take the top of that range, and eight of them, combined logarithmically, give you roughly sixty-nine decibels from the GPUs alone. Add twenty Noctua NF-F12 iPPC industrial fans at three thousand RPM — which combine to about fifty-six decibels — the AIO cooler pumps, the PSU fans, and the server AC compressor, and you're looking at a combined system noise of eighty-five to ninety-two decibels at one meter from the rig.
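Decibels don't add linearly; independent sources combine as sound power. A quick sketch of that math, with a hypothetical per-fan figure of about forty-three and a half decibels:

```python
import math

# Independent noise sources add as sound power, not decibels:
# L_total = 10 * log10(sum(10 ** (L_i / 10))).
def combine_db(levels):
    """Combined level (dB) of independent, incoherent sources."""
    return 10 * math.log10(sum(10 ** (l / 10) for l in levels))

gpus = [60.0] * 8      # eight 5090s, top of the 50-60 dB range
fans = [43.5] * 20     # hypothetical per-fan level at 3,000 RPM

print(f"8 GPUs:   {combine_db(gpus):.1f} dB")    # ~69 dB
print(f"20 fans:  {combine_db(fans):.1f} dB")    # ~56.5 dB
print(f"Together: {combine_db(gpus + fans):.1f} dB")
# N identical sources add 10*log10(N) dB; doubling any source adds ~3 dB.
```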
What's the residential noise ordinance limit?
Forty-two decibels, measured inside the neighboring residence; that's New York City's code specifically. The server produces ninety decibels. Standard drywall knocks off thirty-five to forty decibels, so your neighbor gets fifty to fifty-five decibels through the shared wall. They are legally entitled to forty-two. You are delivering up to fifty-five. Your neighbor directly below hears what the research brief describes as, and I'm quoting this because I couldn't improve on it, "a jet engine warming up, continuously, forever."
So acoustic foam solves this, right? You just line the walls and —
Acoustic foam panels reduce noise by four decibels at best. To get from ninety decibels to forty-two decibels, you need forty-eight decibels of attenuation. That requires a room-within-a-room construction. Decoupled walls, mass-loaded vinyl, resilient channels, acoustic sealant — essentially building a recording studio inside the apartment. Cost: fifteen to forty thousand dollars for the server corner alone.
So the soundproofing costs almost as much as the server.
And the server will still be audible. It will just sound like a distant jet engine instead of a nearby one. This is the diplomatic situation Daniel has been assigned to manage.
Daniel has "the face" for this, apparently.
According to the brief, yes. Daniel is Chief Acoustic Officer. His primary tools are a calibrated sound level meter, noise-canceling headphones for personal sanity, and a box of chocolates for the neighbors.
Walk me through the neighbor diplomacy, because I feel like this escalates.
Phase 1 is pre-emptive. Before the server is ever turned on, you visit every adjacent neighbor — above, below, left, right, minimum four apartments — with a gift basket. Noise-canceling earplugs, a handwritten note explaining that you're "doing some computer work that may generate some background noise," and a twenty-five dollar gift card. Total budget: two hundred dollars. This buys approximately two to three weeks of goodwill.
Two to three weeks. And then?
Phase 2: reactive. Someone complains. Daniel maintains a written noise log with timestamps and actions taken — critical for legal defense. You offer a quiet hours commitment: throttle the server to fifty percent GPU utilization between ten PM and seven AM. This drops the noise from ninety decibels to approximately eighty-two decibels.
Which is still above every legal threshold.
Still above every legal threshold. But it demonstrates good faith, which matters in court. You also install mass-loaded vinyl, acoustic foam, door sweeps — total around seven-hundred-fifty to nine-hundred-fifty dollars — for a four to six decibel reduction. This will not solve the problem. It will, however, demonstrate to a judge that you tried.
Phase 3 I assume involves lawyers.
Phase 3 is legal defense. Eviction for noise requires documented complaints, a lease violation notice, a cure period of typically ten to thirty days, and then court proceedings. You cannot be evicted overnight. The legal argument is that a home server is a personal computer — there's no law against owning powerful computers. The noise is the violation, not the hardware. There's also what I'd call the nuclear legal option: if the landlord attempts eviction, challenge it in court while simultaneously filing complaints about building code violations in the building.
There are always building code violations.
There are always building code violations. This is mutually assured destruction and should only be deployed when the moving truck has already been ordered. The actual solution, listed at the bottom of Phase 3, is: move the server to a colocation facility. Four rack units, 4U of space, in a proper data center cost a hundred to three hundred dollars per month and solve every problem simultaneously.
And the team will resist this.
The team will resist this because it defeats the entire point of the project, which is apparently to live inside a data center.
Let's talk about the maintenance structure, because we have four people and I want to understand what I've personally been assigned.
You are Chief Thermal Officer. Your primary responsibility is everything that is on fire or about to be. You monitor GPU temperatures via nvidia-smi — target below eighty-three degrees Celsius, panic threshold ninety-five degrees and above. You manage the portable AC, including the drainage hose that must be routed out a window. You are responsible for emergency thermal paste replacement when any GPU hits sustained ninety degrees. And you are on-call twenty-four-seven for thermal events.
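A minimal sketch of what that watch loop might look like, assuming nvidia-smi is installed and on the PATH. The alert actions are placeholder prints, not a real paging integration:

```python
import subprocess
import time

# Poll GPU temperatures via nvidia-smi. The query flags are standard
# nvidia-smi options; the thresholds match the ones quoted above.
# Runs forever -- Ctrl-C to stop.
TARGET_C, PANIC_C = 83, 95

def gpu_temps():
    """Return [(gpu_index, temp_celsius), ...] for all GPUs."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return [tuple(map(int, line.split(","))) for line in out.strip().splitlines()]

while True:
    for idx, temp in gpu_temps():
        if temp >= PANIC_C:
            print(f"GPU {idx}: {temp} C -- PANIC, wake everyone up")
        elif temp > TARGET_C:
            print(f"GPU {idx}: {temp} C -- above target, check airflow")
    time.sleep(30)
```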
So I don't get to sleep through summer.
You do not get to sleep through summer. I am Chief Electrical Officer. I monitor total power draw via a smart power distribution unit with per-outlet metering, manage UPS battery health, liaise with the electrical utility company, and maintain the circuit breaker map — knowing exactly which breakers feed the server and which ones feed the neighbors.
And Daniel is noise patrol.
Daniel monitors ambient noise with a calibrated sound level meter, manages soundproofing interventions, and is the primary point of contact for all neighbor complaints. He also schedules maintenance windows during acceptable noise hours, which is seven AM to ten PM under most ordinances.
And Hannah?
Hannah is Chief Maintenance Officer. She manages the physical maintenance calendar, dust filter cleaning, compressed air sessions, fan replacement, and maintains the spare parts inventory. She also documents all maintenance events in a shared log. Her required equipment includes a bulk pack of compressed air cans — sixty dollars for twelve — isopropyl alcohol at ninety-nine percent, lint-free cloths, and an anti-static wrist strap.
And there's a night shift protocol.
The night shift is automated monitoring via Grafana and Prometheus with PagerDuty alerts to all four phones simultaneously. The protocol for a three AM alert is: whoever answers first gets to wake up the others.
I want to formally object to this on the record.
Objection noted. Let's go through the maintenance timeline, because this is where the long-term reality of running a twenty-four-seven data center in a living space becomes clear.
Hit me.
Weekly: visual GPU temperature check, fan RPM verification, AC drainage hose inspection — and yes, the hose will kink and clog and one day it will drain into the server rack — UPS battery status, PCIe riser cable seating check, and wiping down external surfaces. Thirty minutes. Monthly: dust filter cleaning, which must be done outside the apartment because the dust quantity is genuinely alarming. Fan blade inspection. Cable management, because vibration from twenty-plus fans will loosen zip ties and migrate cables toward fan intakes every single month without exception. Thermal log review to catch any GPU showing a consistent upward temperature trend, which is the first sign of thermal paste degradation. Two to three hours.
And quarterly?
Full disassembly. Every GPU comes off the risers. Every heatsink fin gets compressed air. Every fan blade gets cleaned. PCB surfaces, everything. This is a four-person job and takes a full day. The brief specifically says: schedule it for a Saturday, order pizza, and accept that the apartment will be covered in GPU dust for forty-eight hours.
That sounds like a team-building exercise from a startup that's about to fail.
Accurate. And then the annual maintenance is the big one. Thermal paste replacement on all eight GPUs. Under twenty-four-seven high-load operation, thermal paste degrades significantly within one to two years. The process per GPU is: power down, wait thirty minutes for cooling, remove GPU from riser, remove the cooler, clean old paste with ninety-nine percent isopropyl alcohol and lint-free cloth, apply fresh Thermal Grizzly Kryonaut Extreme, reassemble, test. Forty-five minutes per GPU, six hours total for all eight. Expected temperature improvement: five to fifteen degrees Celsius per GPU. Cost: eight tubes of Kryonaut Extreme at eighteen dollars each, plus thermal pad replacement for the VRAM chips — another fifty to eighty dollars.
And if you skip it?
GPU temperatures creep up over the course of a year. Your performance throttles. Eventually a GPU fails. A new RTX 5090 is three thousand dollars. The thermal paste costs eighteen dollars. The math is straightforward.
Okay, let's get to the electricity situation, because I think this is where the project stops being funny and starts being a genuine threat to the building.
The Tier 2 build draws approximately fifty-five hundred watts continuously at full load — that's the eight GPUs at five-hundred-seventy-five watts TDP each, plus CPU, RAM, fans, AC, and ancillary systems. That works out to about three-thousand-nine-hundred-sixty kilowatt-hours a month. At residential rates of fourteen to twenty cents per kilowatt-hour, you're looking at somewhere between five-hundred-fifty and eight-hundred dollars per month just in electricity, and big-city rates run higher still. Annually, that's six-thousand-six-hundred to nine-thousand-six-hundred dollars. Per year. To run a server in your apartment.
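The billing arithmetic, for anyone who wants to plug in their own local rate:

```python
# Monthly electricity cost at continuous full load.
load_kw = 5.5
kwh_per_month = load_kw * 24 * 30          # 3,960 kWh

for rate in (0.14, 0.17, 0.20):            # $/kWh
    monthly = kwh_per_month * rate
    print(f"${rate:.2f}/kWh -> ${monthly:,.0f}/month, ${monthly * 12:,.0f}/year")
# For scale: 3,960 kWh is roughly four times a typical US household's
# entire monthly usage.
```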
The colocation option is looking better every minute.
The colocation option is looking excellent. But here's where it gets legally interesting. A standard residential circuit in the US is fifteen to twenty amps at a hundred-and-twenty volts — that's eighteen-hundred to twenty-four-hundred watts per circuit. The server needs the equivalent of three or four dedicated circuits at minimum. You cannot run this off standard residential wiring without tripping breakers constantly.
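The circuit math, using the standard eighty-percent continuous-load derating rule:

```python
import math

# Circuit count under the standard NEC-style rule that a continuous
# load should stay at or below 80% of the breaker's rating.
LOAD_W = 5500
VOLTS, AMPS = 120, 20
usable_w = VOLTS * AMPS * 0.80          # 1,920 W continuous per 20 A circuit

print(f"Usable per 20 A circuit: {usable_w:.0f} W")
print(f"Dedicated circuits needed: {math.ceil(LOAD_W / usable_w)}")  # -> 3, minimum
```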
What does that mean for the neighbors?
This is the part that goes beyond "inconvenient" into "potentially actionable." If the server and the neighbors share a transformer — which in most apartment buildings they do — sustained high draw causes voltage sag on the shared line. Your neighbors start experiencing brownout conditions. Their lights dim slightly. Sensitive electronics behave erratically. Appliances run inefficiently. This is not theoretical — it's a documented phenomenon in buildings where someone runs high-draw equipment on shared residential circuits.
So you're not just making their lives loud. You're also making their electricity worse.
You are a thermodynamic and electrical nuisance simultaneously. The mitigation strategy here involves a few things. First, you need to work with an electrician and your landlord to install dedicated circuits for the server — this requires landlord permission and typically costs fifteen hundred to three thousand dollars for the electrical work. Second, you notify the utility company proactively. Some utilities offer commercial or high-demand residential tariffs that give you a higher capacity allocation without impacting neighbors. Third, the UPS — the APC five-thousand-VA unit in the build — provides a buffer against sudden load spikes and also protects the server from the brownout conditions that the server itself is causing for everyone else.
The server needs protection from the problems it creates.
The server needs protection from the problems it creates. That is a precise description of the situation. There's also the question of what happens during a power outage. A full fifty-five-hundred-watt load is actually past the five-thousand-VA unit's rating, so realistically you shed the AC load first, and even then you get approximately four to six minutes of runtime. That is enough time to gracefully shut down the server. It is not enough time to wait out a storm.
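A rough runtime estimate. The battery capacity below is a stand-in guess for illustration; real runtime curves are nonlinear, so check the spec sheet for the actual unit:

```python
# Rough UPS runtime: usable stored energy divided by load. The battery
# capacity here is a stand-in guess -- real units publish runtime curves,
# which are nonlinear, so treat this as order-of-magnitude only.
battery_wh = 600.0       # hypothetical usable stored energy
inverter_eff = 0.90
load_w = 5500.0

runtime_min = battery_wh * inverter_eff / load_w * 60
print(f"Runtime at {load_w:.0f} W: ~{runtime_min:.0f} minutes")
# -> about 6 minutes: a graceful shutdown, not a storm shelter.
```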
So you need a generator.
If you want runtime beyond four to six minutes, you need a generator. A portable generator big enough to carry the server costs fifteen hundred to three thousand dollars and has to run outdoors because of the exhaust. In an apartment building, this option is essentially unavailable. The practical contingency is: the server goes down during extended outages. Accept this. Have a cloud API fallback ready.
Which brings us back to the fundamental question of why we're not just using the cloud API.
The entire project is a monument to the human desire to own the thing rather than rent access to it. There's something genuinely compelling about having the model running locally — latency, privacy, no usage costs once you've bought the hardware. If you're running heavy workloads and you've already spent forty-one thousand dollars on hardware, you break even against API costs somewhere around eighteen months of heavy use, depending on your usage patterns.
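The break-even arithmetic, with the monthly API spend as a purely hypothetical figure; plug in your own:

```python
# Break-even: one-time hardware cost vs. monthly savings from not
# paying an API bill (minus the electricity you now pay instead).
# The API spend is a hypothetical heavy-use figure -- plug in your own.
hardware_cost = 41_273       # Tier 2 build
power_per_month = 650        # mid-range electricity estimate
api_spend_per_month = 3_000  # assumed cloud bill being replaced

months = hardware_cost / (api_spend_per_month - power_per_month)
print(f"Break-even after ~{months:.0f} months")   # ~18 months
```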
Eighteen months, assuming the apartment is still standing.
Assuming the apartment is still standing, the neighbors haven't organized, and nobody has called the fire marshal about the portable AC exhaust hose.
So let's do practical takeaways, because I want people to come away from this with something actionable, even if that action is "don't do this."
The first takeaway is that running a competitive local AI inference server is genuinely possible. The Tier 2 build — eight RTX 5090s, forty-one thousand dollars — actually delivers Claude Code-level throughput. Fifteen to twenty-seven tokens per second is real. The benchmark numbers on Qwen3-Coder-480B are real. The open-source ecosystem has reached a point where the gap between frontier API models and locally-runnable models is genuinely closing.
But the infrastructure requirements are not a detail you can skip.
The infrastructure requirements are not a detail you can skip. Dedicated electrical circuits are mandatory — not optional, not something you figure out later. Dedicated server-grade cooling is mandatory. The thermal math is unambiguous: fifty-five hundred watts of heat in a sixty-five square meter apartment will reach dangerous temperatures within hours without active cooling. And the noise situation requires either genuine acoustic isolation — which costs as much as a used car — or a tolerance for ongoing neighbor diplomacy.
My takeaway is that the maintenance calendar is the most underrated part of any homelab build. People price the hardware, they don't price the ongoing time and cost.
The annual thermal paste replacement alone is a six-hour team project. The quarterly deep cleaning is a full day. Add the weekly and monthly checks and, over three years, you're investing a couple of hundred hours of maintenance time on top of the forty-one thousand dollar hardware investment. That's real. If your time has value — and it does — factor that into the total cost of ownership.
And if someone genuinely wants to do this — not in an apartment, but in a proper space?
Get a dedicated room with its own electrical subpanel. Talk to your electrician before you buy a single component. Size your cooling before you size your GPUs. And keep the colocation option in your back pocket, because a hundred to three hundred dollars a month for a rack in a proper data center is a genuinely reasonable alternative to everything we've described today.
Daniel, Hannah — if you're listening — we love you both. We are not building this in anyone's apartment. We're sorry that Herman and I apparently exist in a universe where this was proposed. And we are deeply grateful that the neighbor below us is hypothetical.
The hypothetical neighbor below us is having a genuinely terrible time and deserves our compassion.
Alright, that's going to do it for this one. Thanks as always to our producer Hilbert Flumingtop for keeping this show running, and big thanks to Modal for the GPU credits that power the pipeline — deeply appropriate sponsorship for an episode about GPU infrastructure. Find us at myweirdprompts dot com if you want the RSS feed or any of the ways to subscribe. This has been My Weird Prompts. We'll see you next time.
Don't build a data center in your apartment.
Don't build a data center in your apartment.