#2597: Voice Control for Renters: $25 Per Room, No Wall Damage

Distributed voice control on a budget with wake words, centralized processing, and zero wall damage — perfect for rentals.

Episode Details
Episode ID: MWP-2756
Duration: 33:10
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

Daniel's question hits a sweet spot in the smart home space: how do you get distributed voice control in a rental apartment without damaging walls, spending a fortune, or sending everything through a cloud service? The answer has become surprisingly affordable in the last eighteen months.

The Hardware: ESP32-S3 as Voice Satellite

The key enabler is the ESP32-S3 microcontroller, which includes hardware acceleration for neural network inference and built-in audio interfaces. Ready-made boards keep the cost down. The M5Stack Atom Echo ($13) comes with a microphone and speaker integrated, though it is built on the older ESP32-PICO-D4 rather than the S3. The Seeed Studio XIAO ESP32-S3 Sense ($15-20) has an onboard microphone but needs an external speaker. Add a small enclosure, or just leave the board on a shelf, and you have a complete voice satellite for $20-25 per room.

These satellites handle one job: wake word detection. They run MicroWakeWord, a tiny model from Home Assistant that recognizes specific wake words locally with low latency. When triggered, they stream audio over Wi-Fi to a central Home Assistant server. The server handles speech-to-text (using Whisper or similar), parses the intent, and generates a spoken reply with text-to-speech (Piper, for example). It then sends the reply audio back through the same satellite.

Why This Beats the Alternatives

The Echo route means cloud dependency and ongoing privacy costs. Full touchscreen panels cost $70-100 per room and require wall mounting. Daniel's ESP32 approach avoids all of that. The satellites sit on existing surfaces: no adhesive, no screws, no landlord friction. They're dumb terminals that carry no home-specific configuration. The server handles all the intelligence, so adding a new device or changing a room name is done once on the server and applies to every satellite.

For a 60-square-meter apartment, three or four satellites should provide solid coverage: living area, bedroom, nursery corner, and kitchen. Total cost: around $100. Compare that to $160-400 for Echos (plus cloud dependency) or $280-400 for touchscreen panels.
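The totals above are simple per-room multiplication. The sketch below uses the article's own per-room estimates rather than current retail prices, and it assumes four rooms.

```python
# Per-room price ranges in USD, taken from the estimates above.
# These are rough figures, not current retail quotes.
PRICE_RANGES_USD = {
    "ESP32 voice satellite": (20, 25),
    "Amazon Echo": (40, 100),
    "Touchscreen panel": (70, 100),
}


def total_range(per_room, rooms):
    """Return the (low, high) cost of equipping `rooms` rooms."""
    low, high = per_room
    return low * rooms, high * rooms


if __name__ == "__main__":
    rooms = 4  # living area, bedroom, nursery corner, kitchen
    for option, per_room in PRICE_RANGES_USD.items():
        low, high = total_range(per_room, rooms)
        print(f"{option:<22} ${low}-{high}")
```

For four rooms this prints $80-100 for the satellites, $160-400 for Echos and $280-400 for panels, which matches the figures quoted.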

The Rental-Friendly Details

The "smiley box" concept — small, screenless, unmounted — solves multiple problems. Toddlers can't pull them off walls. You can move them between rooms as needs shift. They're invisible to landlords. And for parents with a baby, voice is the only interface that works when hands are full.

Audio quality from the small satellite speakers is fine for voice responses and background music — white noise, lullabies, podcasts. For serious music listening, route commands to a better speaker through Home Assistant's media player integration.

Implementation Reality

ESPHome added native voice assistant support in 2023, and the configuration is minimal: about 20 lines of YAML per device, with templates for identical boards. Over-the-air updates mean you never need to physically access the satellites once they're deployed.

Custom wake words are possible through a web-based training tool that takes about an hour and 50-100 audio samples. Default options include "Okay Nabu," "Hey Jarvis," and others. Accuracy in quiet environments reportedly exceeds 95%. Baby noise can still cause false triggers, and adding recordings of your own household noise to custom training helps reduce them.

The bottom line: distributed voice control on a rental-friendly budget is not just possible, it's practical. The hardware is cheap, the software is mature, and the architecture scales without configuration drift. For Daniel — and anyone else with a landlord, a baby, and a desire for local control — the smiley box approach is ready to build today.


Transcript

Corn
Daniel sent us this one, and it's a layered one. He's thinking about voice control through the whole apartment — not just one big smart speaker, but little voice controllers in each room. The constraints are real: he and Hannah are in a sixty-square-meter Jerusalem rental with a ten-month-old who's pulling on everything. They can't stick things to the walls because the landlord will have a fit. Building custom panels per room is too expensive and time-consuming. So the question is basically: can you do distributed voice control on a budget, with wake words, centralized processing, and zero wall damage? And what would we actually recommend for implementing that?
Herman
Before we dive in — fun fact, DeepSeek V four Pro is writing our script today. So if the jokes land, credit where it's due.
Corn
If they don't, we'll blame the model.
Herman
That's the deal. All right, so Daniel's question hits on something that's genuinely changed in the last eighteen months. The hardware for local voice control has crossed a threshold where per-room voice satellites are actually viable on a budget. I'm talking fifteen to twenty-five dollars per room, not hundreds.
Corn
That's the number I wanted to hear. Because when I first looked at this a couple years ago, putting voice in every room meant either buying a fleet of Amazon Echos or building Raspberry Pi setups that ran eighty bucks a pop minimum. Neither made sense for a rental.
Herman
And the Echo route means everything goes through Amazon's cloud, which Daniel's been pretty explicit about wanting to avoid. He's running Home Assistant, he's into local control, he's already done the ESP32 work. So let's talk about what's actually available now. The Home Assistant Voice Preview Edition has been out for over a year now, having launched in late twenty twenty-four. It's a fifty-nine-dollar device that does local wake word detection and sends audio to a Home Assistant server for processing.
Corn
Sixty dollars per room still adds up, though. If Daniel wants four or five rooms, that's roughly two-forty to three hundred. Not insane, but not nothing.
Herman
Right, and that's where the ESP32 ecosystem comes in. There are boards now — the ESP32-S3 specifically — that have built-in audio interfaces, microphone support, and enough processing power to run local wake word detection. You can get these boards for under fifteen dollars. Add a small speaker and a couple of microphones, and you're looking at maybe twenty to twenty-five dollars per room in parts.
Corn
The smiley box concept Daniel mentioned — that's actually clever. You don't need a premium-looking device in every room. A small 3D-printed enclosure, something that sits on a shelf or a bookcase, no wall mounting required. The landlord never even knows it exists.
Herman
That's the rental-friendly angle I love. No adhesive strips, no screws, no Command hooks that rip the paint off anyway. Just little objects that sit on existing surfaces. But here's where the technical part gets interesting — and this is where I think a lot of the online guides miss the point. Daniel said centralized processing, which is the right call. You don't want each satellite doing full speech-to-text and intent processing locally. That's a waste of compute and it makes every satellite more expensive and more power-hungry.
Corn
The satellites just handle wake word detection and audio streaming. Everything else happens on the main server.
Herman
The wake word detection runs on the ESP32-S3 using a tiny model — something like MicroWakeWord, which Home Assistant developed. It's trained to recognize specific wake words locally with very low latency. The ESP32-S3 has hardware acceleration for neural network inference, so it can run these models without breaking a sweat. When it hears the wake word, it starts streaming audio over Wi-Fi to your Home Assistant server, which does the speech-to-text using something like Whisper, then processes the intent, and sends back a response.
Corn
The response can come back through the same satellite? So you get two-way audio?
Herman
Yes, and that's the voice pipeline. The satellite has a speaker, so Home Assistant can respond through it. "Turning off the air conditioning now." Or it can play music, podcasts, whatever you want. And here's what I find really elegant about this architecture — the satellites are essentially dumb terminals. They don't need to know anything about your home configuration. If you add a new light switch or change a room name, you update it once on the server and every satellite picks it up automatically.
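The "dumb terminal" split can be made concrete with a toy sketch. Every name in it is hypothetical: `Satellite`, `VoiceServer` and the entity id `climate.living_room_ac`. String matching stands in for Whisper and for Home Assistant's real intent matcher. What the sketch does show is the division of labour. The satellite only forwards audio and plays replies, while all naming lives on the server.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Satellite:
    """Hypothetical satellite: knows only its own name and area."""
    name: str
    area: str
    replies: list = field(default_factory=list)

    def play(self, reply: str) -> None:
        # On real hardware this would be text-to-speech audio from the server.
        self.replies.append(reply)


@dataclass
class VoiceServer:
    """Hypothetical central server that owns all home-specific configuration."""
    # Spoken name -> entity id. Renaming a device means editing only this dict.
    names: dict
    trace: list = field(default_factory=list)

    def transcribe(self, audio: str) -> str:
        # Stand-in for Whisper: the "audio" in this sketch is already text.
        return audio.lower().strip()

    def match_intent(self, text: str) -> Optional[Tuple[str, str]]:
        prefix = "turn off the "
        if text.startswith(prefix):
            entity_id = self.names.get(text[len(prefix):])
            if entity_id is not None:
                return ("turn_off", entity_id)
        return None

    def handle(self, satellite: Satellite, audio: str) -> None:
        text = self.transcribe(audio)
        intent = self.match_intent(text)
        reply = "Done" if intent else "Sorry, I didn't catch that"
        # Keep a per-request record of each stage, similar in spirit to a debug view.
        self.trace.append({"satellite": satellite.name, "text": text, "intent": intent})
        # The reply goes back to the satellite that heard the command.
        satellite.play(reply)


if __name__ == "__main__":
    # Hypothetical entity id for the living room air conditioner.
    server = VoiceServer(names={"air conditioning": "climate.living_room_ac"})
    living = Satellite("sat-living", "living room")
    server.handle(living, "Turn off the air conditioning")
    # Rename once on the server; no satellite needs to change.
    server.names["ac"] = server.names.pop("air conditioning")
    server.handle(living, "Turn off the AC")
    print(living.replies)  # ['Done', 'Done']
```

The rename at the end touches only the server's table. That is the property that keeps configuration from drifting across devices.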
Corn
That's the kind of design that makes me actually want to set this up. I hate systems where every device has its own configuration that slowly drifts out of sync.
Herman
Configuration drift is the silent killer of smart homes. You end up with three devices that each think the living room light is called something different. It's maddening.
Corn
All right, so let's get concrete. Daniel's apartment is sixty square meters. That's what, two bedrooms, a living area, kitchen, maybe one bathroom?
Herman
Typical Jerusalem rental layout. For voice coverage in a space that size, I'd say three or four satellites would do it. One in the main living area, one in the bedroom, one near Ezra's area or the nursery corner, and maybe one in the kitchen. At twenty-five dollars each, that's a hundred dollars total. Compared to three or four Echos at forty to a hundred dollars each plus the privacy implications.
Corn
The Echos are constantly phoning home to Amazon. Daniel's been pretty clear he doesn't want that.
Herman
And it's not just privacy — it's reliability. If your internet goes down, cloud-based voice assistants become paperweights. Local processing means you can still turn off the air conditioner or play music from your local library even when the ISP is having a bad day. For parents with a baby, that reliability matters. You don't want to be debugging your internet connection at two in the morning when Ezra's crying and you need to adjust the white noise.
Corn
Speaking of which — the music and podcast playback Daniel mentioned. How does that work through these satellites?
Herman
The satellites themselves aren't high-fidelity speakers. They're small, maybe one or two watts. Fine for voice responses and background music, but you're not going to want to use them as your primary music listening setup. That said, for calming a baby with some white noise or a lullaby playlist, they're perfectly adequate. And you can always have one room with a better speaker connected to Home Assistant — maybe a dedicated media player in the living room that's part of the same voice control system.
Corn
The satellites handle commands and basic audio, and you can still have nicer speakers where you actually want to listen to music seriously.
Herman
And Home Assistant's media player integration means you can say "play the calming playlist on the nursery speaker" and it'll route to the right device. The voice pipeline handles the routing.
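Routing mostly comes down to choosing a target device. Use the room named in the command if there is one, otherwise use the room of the satellite that heard it. The sketch below assumes a simple area-to-player table with hypothetical entity ids. Home Assistant's own target resolution is more involved than this.

```python
# Hypothetical area -> media player entity ids for this apartment.
MEDIA_PLAYERS = {
    "living room": "media_player.living_room_speaker",
    "bedroom": "media_player.bedroom_satellite",
    "nursery": "media_player.nursery_satellite",
}


def pick_media_player(requested_area, satellite_area):
    """Use the area named in the command, else the area of the satellite that heard it."""
    area = (requested_area or satellite_area).strip().lower()
    if area not in MEDIA_PLAYERS:
        raise ValueError(f"no media player configured for {area!r}")
    return MEDIA_PLAYERS[area]


if __name__ == "__main__":
    # "Play the calming playlist on the nursery speaker", said in the living room.
    print(pick_media_player("nursery", "living room"))
    # "Play white noise", said in the bedroom, with no room named.
    print(pick_media_player(None, "bedroom"))
```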
Corn
Let me poke at the budget angle a bit more. Daniel said even ESP32 panels get expensive if you're putting one in every room. But I think he was thinking of full control panels with screens and touch interfaces. The voice-only satellites are a different beast entirely.
Herman
A touchscreen control panel — even a small one — you're looking at thirty to fifty dollars just for the display, plus the microcontroller, plus the enclosure, plus the power supply. Easily seventy to a hundred dollars per room. A voice satellite with no screen, just mic and speaker, is a quarter of that cost. And for Daniel's use case — holding Ezra, needing hands-free control — the screen is actually worse than useless. You can't tap a screen when you're holding a baby.
Corn
That's the key insight, I think. The whole reason Daniel's into voice tech is because his hands are full. Adding a screen to the satellite would be actively counterproductive for his primary use case.
Herman
And this is where I see a lot of smart home enthusiasts get it wrong. They want the cool wall-mounted tablet in every room because it looks futuristic. But for actual parents with actual babies, voice is the interface. It's the only interface that works when you're holding a ten-month-old who's just discovered that pulling on lamp cords is the most entertaining activity in the universe.
Corn
The smiley box concept also solves something else Daniel's been wrestling with, which is that Ezra pulls on everything. A wall-mounted panel at toddler height is basically a target. A small box sitting on a high shelf is out of reach.
Herman
And because they're not mounted, you can move them around. If Ezra's play area shifts from the living room to the bedroom, you can move the satellite. If you're spending more time in the kitchen one week, you can bring one in there. The flexibility is a real advantage in a small apartment where spaces serve multiple functions.
Corn
All right, let's talk about the actual implementation. If Daniel wants to do this, what's the shopping list?
Herman
There are several good options. The M5Stack Atom Echo is a popular one: it comes in a tiny enclosure with a speaker and microphone already integrated, and it's about thirteen dollars. Strictly speaking it's built on the older ESP32-PICO chip rather than the S3. The downside is that the speaker is very small, so audio quality isn't great. But for voice responses and basic audio, it works. Another option is the Seeed Studio XIAO ESP32-S3 Sense board, which has an onboard digital microphone and a microSD card slot for local audio storage, though you'll need to add your own speaker. That's around fifteen to twenty dollars.
Corn
These all run ESPHome?
Herman
Yes, and this is where it gets really nice. ESPHome added native voice assistant support in twenty twenty-three, and it's matured significantly since then. You configure the board in YAML, tell it what wake word model to use, add it to Home Assistant, and it just works. The configuration is maybe twenty lines of YAML per device. If you're using identical boards, you can template the configuration and deploy to all of them in minutes.
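ESPHome already supports templating natively through its `substitutions` and `packages` features. Each device file can stay a short stub that includes one shared configuration. The Python below only illustrates generating those per-room stubs. The shared file path `common/voice-satellite.yaml` and the naming scheme are hypothetical, and the real voice settings would live in that shared package.

```python
from string import Template

# Hypothetical per-device stub. `substitutions` and `packages` are real
# ESPHome features; the shared file path below is made up for this example.
DEVICE_STUB = Template(
    "substitutions:\n"
    "  name: $name\n"
    "  friendly_name: $friendly_name\n"
    "packages:\n"
    "  voice_satellite: !include common/voice-satellite.yaml\n"
)

ROOMS = ["Living Room", "Bedroom", "Nursery", "Kitchen"]


def device_stub(room: str) -> str:
    """Render the short per-room file; everything shared lives in the package."""
    slug = room.strip().lower().replace(" ", "-")
    return DEVICE_STUB.substitute(name=f"satellite-{slug}", friendly_name=f"{room} Satellite")


if __name__ == "__main__":
    for room in ROOMS:
        filename = f"satellite-{room.lower().replace(' ', '-')}.yaml"
        print(f"# {filename}\n{device_stub(room)}")
```

Because every satellite pulls in the same package, a change to the shared file reaches all of them on the next over-the-air update.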
Corn
That's the kind of simplicity Daniel needs. He's already got a full-time job, a baby, and about a dozen open-source projects. He can't be hand-configuring each satellite for three hours.
Herman
And the ESPHome integration with Home Assistant means over-the-air updates too. You don't need to physically access the devices to change their configuration. That matters when they're sitting on high shelves.
Corn
What about the wake word? Daniel mentioned needing wake word detection. What's the state of that in the local ecosystem?
Herman
This has improved dramatically. Home Assistant's MicroWakeWord project has pre-trained models for several wake words — "Okay Nabu" is the default, "Hey Jarvis" is available, "Hey Mycroft" for the Mycroft fans. And you can train custom wake words now. There's a web-based tool where you record samples of yourself saying the wake word, it trains a model, and you deploy it to your satellites. The whole process takes maybe an hour.
Corn
Could Daniel train "Hey smiley" or whatever he wants to name the system?
Herman
The custom wake word training requires maybe fifty to a hundred audio samples. It's not a huge lift. The model runs locally on the ESP32-S3, and the accuracy is quite good — I've seen benchmarks showing over ninety-five percent accuracy in quiet environments. In a home with a babbling baby, it might dip a bit, but that's true of any voice system.
Corn
The babble noise profile problem. We've talked about that before — Ezra's screaming creates a unique challenge for noise suppression.
Herman
It does, and I don't want to sugarcoat this. Babies make unpredictable noises that can false-trigger wake word detection. That said, the MicroWakeWord models are trained on diverse background noise, and you can add your own background noise samples during custom training. Daniel could record some ambient audio of his apartment with Ezra making his usual sounds and include that in the training data, and the model should get much better at ignoring it.
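Adding household noise to training data usually means mixing it into each wake word clip at a chosen signal-to-noise ratio. Here SNR in dB is 20·log10(rms(speech)/rms(noise)). The sketch below is a generic augmentation step, not microWakeWord's actual training code. It assumes both clips are mono float samples at the same sample rate, and the test tone and synthetic noise are stand-ins for real recordings.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db, rng=None):
    """Add a random slice of `noise` to `speech` at the given SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise clip must be at least as long as the speech clip")
    if rng is None:
        rng = random.Random()
    # Pick a random slice so each training example hears different noise.
    start = rng.randrange(len(noise) - len(speech) + 1)
    segment = noise[start:start + len(speech)]
    noise_level = rms(segment)
    if noise_level == 0:
        raise ValueError("noise segment is silent")
    # Scale the noise so rms(speech) / rms(scaled noise) == 10 ** (snr_db / 20).
    target_level = rms(speech) / (10 ** (snr_db / 20))
    gain = target_level / noise_level
    return [s + gain * n for s, n in zip(speech, segment)]


if __name__ == "__main__":
    rate = 16000
    # One second of a 440 Hz tone standing in for a wake word recording.
    speech = [0.5 * math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
    # Two seconds of random noise standing in for recorded apartment audio.
    noise_rng = random.Random(0)
    noise = [noise_rng.gauss(0, 0.2) for _ in range(2 * rate)]
    for snr in (20, 10, 5):
        mixed = mix_at_snr(speech, noise, snr, rng=random.Random(snr))
        residual = [m - s for m, s in zip(mixed, speech)]
        measured = 20 * math.log10(rms(speech) / rms(residual))
        print(f"target {snr} dB -> measured {measured:.2f} dB")
```

Mixing each clip at several SNRs, from barely audible noise to loud babble, gives the model examples across the range of conditions it will actually face.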
Corn
That's a clever approach. Train the model on your specific noise environment.
Herman
It's the same principle as training a noise suppression model on your specific background sounds. The more the model knows about what your home actually sounds like, the better it can distinguish wake words from background noise.
Corn
Let me ask about something I've been wondering. With three or four satellites in a small apartment, how does the system handle multiple satellites hearing the same wake word? Does every device in earshot try to process the command?
Herman
Home Assistant can coordinate this. When you have multiple satellites in the same area, you can set things up so that only one responds at a time. The server receives audio from multiple satellites, picks the one with the clearest signal, and routes the response there. The other satellites stay quiet. It's not perfect, and there can be edge cases where two satellites both respond, but in practice it works well in a sixty-square-meter space.
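The "pick the clearest signal, keep the others quiet" behaviour can be sketched as a grouping step. Detections that arrive within a short window count as one spoken wake word, and the clearest one wins. This is not Home Assistant's actual algorithm. The 0.5-second window and the `level_db` score are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    satellite: str
    time_s: float    # when the wake word fired, in seconds
    level_db: float  # hypothetical signal-level score; higher means clearer


def choose_responders(detections, window_s=0.5):
    """Collapse near-simultaneous detections into one responder per utterance.

    Detections within `window_s` of the first one in a group are treated as
    the same spoken wake word; the clearest one in each group wins.
    """
    winners = []
    group = []
    for det in sorted(detections, key=lambda d: d.time_s):
        if group and det.time_s - group[0].time_s > window_s:
            winners.append(max(group, key=lambda d: d.level_db).satellite)
            group = []
        group.append(det)
    if group:
        winners.append(max(group, key=lambda d: d.level_db).satellite)
    return winners


if __name__ == "__main__":
    heard = [
        Detection("sat-kitchen", 10.12, -38.0),
        Detection("sat-living", 10.05, -22.0),
        Detection("sat-bedroom", 10.30, -45.0),
        Detection("sat-bedroom", 42.00, -20.0),  # a separate, later command
    ]
    print(choose_responders(heard))
```

The window trades off two failure modes. If it is too short, two satellites answer the same command. If it is too long, two quick commands in different rooms get merged.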
Corn
You don't get that annoying echo effect where you say "turn off the lights" and three different voices respond "okay" from different corners of the room.
Herman
And the grouping configuration is done in Home Assistant's voice settings. You create an area — say "living room" — and assign satellites to it. The system knows that satellites in the same area should coordinate.
Corn
Let's talk about the central server. Daniel's already running Home Assistant on something, I assume. Does voice processing add significant load?
Herman
It depends on the speech-to-text engine. If he's using Whisper — which is the most common local option — it can be somewhat demanding. On a Raspberry Pi 4, Whisper can take a few seconds to process a short command. That's functional but not snappy. On something with a bit more power — an old laptop, a mini PC, even a Raspberry Pi 5 — the latency drops significantly. I've seen Whisper running on a Pi 5 process a five-second voice command in under a second.
Corn
Latency matters a lot for voice interfaces. If you say "turn off the lights" and there's a three-second delay, it feels broken.
Herman
It absolutely does. The acceptable latency for voice control is under two seconds from end of speech to action. Ideally under one second. With a decent server, local Whisper can hit that. And there are lighter-weight options now too — faster-whisper, whisper.cpp, and some newer streaming models that start processing before you finish speaking. The technology keeps improving.
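One way to reason about the two-second target is as a budget: add up the stages between the end of speech and the action. The stage timings below are illustrative assumptions, not benchmarks. Swapping in a slow speech-to-text step shows how quickly that one stage can blow the budget.

```python
# Illustrative per-stage timings in seconds, measured from the end of speech
# to the action. These are assumptions for the sketch, not benchmarks.
STAGES_TO_ACTION_S = {
    "end-of-speech detection": 0.3,
    "speech-to-text": 0.6,
    "intent matching": 0.05,
    "command sent to device": 0.1,
}

ACCEPTABLE_S = 2.0  # ceiling suggested in the conversation
IDEAL_S = 1.0


def check_budget(stages, acceptable=ACCEPTABLE_S, ideal=IDEAL_S):
    """Return (total seconds, verdict) for a set of pipeline stage timings."""
    total = round(sum(stages.values()), 3)
    if total <= ideal:
        verdict = "ideal"
    elif total <= acceptable:
        verdict = "acceptable"
    else:
        verdict = "too slow"
    return total, verdict


if __name__ == "__main__":
    print(check_budget(STAGES_TO_ACTION_S))
    # Same pipeline with a slow speech-to-text step, e.g. a few seconds on older hardware.
    print(check_budget({**STAGES_TO_ACTION_S, "speech-to-text": 3.0}))
```

With the assumed defaults the total is about 1.05 seconds. That is acceptable but just over the one-second ideal, and speech-to-text is the stage worth optimizing first.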
Corn
Daniel mentioned air conditioning control specifically. That's an interesting one because Israeli apartments often have these standalone AC units with infrared remotes. How does voice control interface with those?
Herman
That's actually a solved problem in Home Assistant. You can use an infrared blaster, something like the Broadlink RM4 or even a simple ESP device with an IR LED, to send the same infrared signals the remote would send. You then expose the AC to Home Assistant as a climate entity, for example through ESPHome's IR climate platforms or the community SmartIR integration. Voice commands like "set the air conditioner to twenty-four degrees" then map to the appropriate infrared codes. You set it up once, and then voice control just works.
Corn
You don't need a smart AC unit. You just need something that can blast infrared at it.
Herman
And infrared blasters are cheap — the Broadlink RM4 Mini is about twenty-five dollars and can control multiple devices in a room. One per room with an AC unit, and you're covered. Home Assistant learns the codes from your existing remote, so compatibility is basically universal.
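Once the AC is exposed as a climate entity, the server's job is to turn a transcribed phrase into a temperature request. `climate.set_temperature` is a real Home Assistant service. The regex, the function name and the 16-30 degree range are assumptions for this toy sketch, and Home Assistant's own sentence matching covers far more phrasings.

```python
import re

# Toy pattern for one phrasing. Home Assistant's real intent matching uses
# sentence templates and covers far more variations than this.
SET_AC_TEMP = re.compile(
    r"set the (?:air conditioner|ac) to (?P<temp>\d{1,2}) degrees"
)


def parse_ac_command(text, min_c=16, max_c=30):
    """Turn a transcribed command into a climate.set_temperature request.

    The 16-30 degree range is an assumption about what the unit's remote supports.
    """
    match = SET_AC_TEMP.fullmatch(text.strip().lower().rstrip("."))
    if match is None:
        return None
    temp = int(match.group("temp"))
    temp = max(min_c, min(max_c, temp))  # clamp to the remote's range
    return {"service": "climate.set_temperature", "temperature": temp}


if __name__ == "__main__":
    print(parse_ac_command("Set the air conditioner to 24 degrees."))
    print(parse_ac_command("set the AC to 35 degrees"))
```

Clamping on the server keeps a misheard number from being sent to a remote that has no code for it.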
Corn
That's exactly the kind of pragmatic solution Daniel needs. He's not going to replace his air conditioners with smart models. He needs to work with what the apartment already has.
Herman
That's the philosophy that should guide this whole project. Work with what you have. Don't build custom panels. Don't mount things on walls. Don't replace appliances. Just add small, cheap, voice-controlled interfaces that sit on existing surfaces and talk to the stuff that's already there.
Corn
I want to circle back to something Daniel said that I think is important. He described it as "playing life on impossible mode." And I think that's worth sitting with for a second. Having a baby in a small rental apartment with restrictions on what you can modify — that makes a lot of standard smart home advice useless. Most YouTube tutorials assume you own your place and can drill holes wherever you want.
Herman
The entire smart home industry has a homeowner bias. It's all wall-mounted tablets, in-wall relays, hardwired sensors. For renters — and that's a huge portion of people — most of that advice is irrelevant or actively harmful if it gets you in trouble with your landlord. What Daniel's asking for is the renter's smart home. Fully removable, no wall damage, no permanent modifications.
Corn
Voice control is inherently renter-friendly because the interface is in the air, not in the walls. You're not modifying the physical environment. You're just adding a layer of sound.
Herman
That's a beautiful way to put it. A layer of sound. And that layer can be as thick or as thin as you want. Three satellites or ten. It scales without any physical impact on the apartment.
Corn
Let me push on something though. Daniel's ten-month-old is going to become a two-year-old, and then a three-year-old. At some point, Ezra's going to figure out that talking to the smiley boxes makes things happen. How do you handle that? Do you end up with a toddler who's constantly turning the lights on and off?
Herman
That's a real concern, and it's one of the reasons I think local voice control is actually better than cloud-based systems for families: you set the rules. Home Assistant can't reliably tell voices apart yet, since speaker identification is still an active research area. But you can restrict certain commands to certain devices or certain times. For example, you could allow only a short list of commands between seven PM and seven AM. Or you can put a simple physical mute button on the satellite, a little switch that disables the microphone.
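A seven-PM-to-seven-AM rule is a time window that wraps past midnight, which is an easy place for off-by-one bugs. The sketch below shows the check with a night-time allow-list. The intent names in `NIGHT_ALLOWED` are hypothetical, and in Home Assistant this kind of rule would typically live in an automation rather than standalone code.

```python
from datetime import time


def in_window(now, start, end):
    """True if `now` is in [start, end); handles windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


QUIET_START = time(19, 0)  # 7 PM
QUIET_END = time(7, 0)     # 7 AM

# Hypothetical intents still allowed at night (names are made up).
NIGHT_ALLOWED = {"play_white_noise", "set_temperature"}


def command_allowed(intent, now):
    """Apply the night-time allow-list; everything is allowed during the day."""
    if in_window(now, QUIET_START, QUIET_END):
        return intent in NIGHT_ALLOWED
    return True


if __name__ == "__main__":
    print(command_allowed("toggle_lights", time(23, 30)))
    print(command_allowed("play_white_noise", time(2, 0)))
    print(command_allowed("toggle_lights", time(12, 0)))
```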
Corn
A physical mute switch is underrated. There's something satisfying about a hardware off switch that software controls can't replicate.
Herman
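It's easy to add. The ESP32 boards have GPIO pins, so you can wire a simple toggle switch to one and have the firmware mute the microphone when it's flipped. If you want a true hardware cutoff with no software involved, wire the switch into the microphone's power line instead. Flip it up, the microphone is live. Flip it down, it's completely silent.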
Corn
All right, let's get to the part where we actually make a recommendation. Daniel asked what we'd recommend for this implementation. You've laid out the ESP32-S3 path. Is that your recommendation, or are there other options worth considering?
Herman
I think there are three viable paths, and the right one depends on how much time Daniel wants to invest versus how much money he wants to spend.
Corn
Give me the three paths.
Herman
Path one is the ready-made route: buy Home Assistant Voice Preview Edition devices. At about sixty dollars each, they work out of the box and have a dual-microphone array for better voice pickup. For three or four rooms, that's roughly one-eighty to two-forty dollars total. Zero assembly required. This is the path if time is scarcer than money.
Herman
Path two is the DIY ESP32 route. M5Stack Atom Echo or Seeed Studio boards, about thirteen to twenty dollars each, plus a bit of configuration in ESPHome. Total cost around forty to eighty dollars for three or four satellites. This is the path if you're comfortable with basic YAML configuration and want to save money. It's also more customizable — you can add physical mute switches, choose different wake words per room, that kind of thing.
Herman
Path three is the hybrid approach. One Voice Preview Edition, about sixty dollars, goes in the main living area for the best audio quality and voice pickup, and ESP32 satellites go in the secondary rooms where you don't need premium audio. Best of both worlds: good coverage everywhere, a premium experience where you spend the most time, and a total cost of roughly ninety to a hundred twenty dollars.
Corn
I think path three makes a lot of sense for Daniel's situation. The living room is where he and Hannah spend most of their waking hours with Ezra. That's where you want the best voice pickup and audio quality. The bedroom and nursery can get by with the cheaper satellites.
Herman
The Voice Preview Edition has a better speaker than the little satellites, and it also has a 3.5 millimeter audio output. So if they want to play music or podcasts in the living room through the voice system, they can plug in a decent speaker and it'll actually sound good. The ESP satellites in the other rooms are fine for voice responses and basic audio, but you wouldn't want to listen to a podcast on them for an hour.
Corn
The hybrid approach also means Daniel only needs to buy one Voice Preview Edition to start. He can add the ESP satellites over time. It's not an all-at-once investment.
Herman
Start with one good device in the main room, see how it works, then expand. The incremental cost per room is low enough that you can add satellites gradually without feeling the financial hit.
Corn
Let's talk about something that I think gets glossed over in a lot of these discussions: the spouse acceptance factor. Hannah's going to be using this system too. If it's finicky or unreliable, she's going to hate it, and it'll end up unused.
Herman
This is maybe the most important consideration and almost nobody talks about it. The system has to work for everyone in the household, not just the person who set it up. If Hannah says "turn off the air conditioning" and nothing happens, or worse, the wrong thing happens, she's not going to use it again. Trust is hard to build and easy to lose with voice interfaces.
Corn
Reliability is the top priority. Not features, not customizability, not cool factor. Does it work every time?
Herman
And this is where I think the local processing approach actually helps. Cloud-based voice assistants have variable latency depending on internet conditions. Local processing is consistent. The wake word detection happens on-device, so there's no network dependency for the initial trigger. The speech-to-text happens on your local server, so no cloud latency. The whole pipeline is under your control.
Corn
If something breaks, you can debug it. With a cloud system, you're just staring at a blinking light hoping Amazon's servers come back.
Herman
Home Assistant's voice pipeline has debugging tools built in. You can see exactly what the satellite heard, what Whisper transcribed, what intent was matched, and what action was taken. If something goes wrong, you can trace the whole chain.
Corn
For Daniel specifically — he works in AI and automation. He's going to appreciate having that level of visibility into the system. It's not a black box.
Herman
And he can tune it over time. If certain commands consistently get mis-transcribed, he can add custom intent handling. If the wake word false-triggers during Ezra's particular screech frequency, he can retrain the model with that sound in the background. The system gets better the more you tune it, which is the opposite of most consumer tech that gets worse over time.
Corn
Let me ask about power. These satellites need to be plugged in, right? They're not battery-powered?
Herman
They can be either. The ESP32 can last a long time on a small battery when it spends most of its time in deep sleep. But a voice satellite is always listening for a wake word, so you probably want it plugged in. The always-on microphone, Wi-Fi, and wake word processing draw power continuously. It's not a lot, maybe half a watt, but that will drain a small battery in well under a day.
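The battery claim is easy to check with a rough energy calculation. Stored energy in watt-hours is capacity times nominal voltage; multiply by conversion efficiency and divide by the load. The half-watt draw comes from the conversation. The 3.7 V nominal cell voltage and the 85% efficiency are assumptions, and the function name is illustrative.

```python
def runtime_hours(capacity_mah, nominal_v, draw_w, efficiency=0.85):
    """Estimate hours a battery can feed a constant load.

    efficiency accounts for conversion losses (e.g. a 3.7 V cell boosted
    to 5 V USB); 0.85 is an assumed figure, not a measured one.
    """
    energy_wh = capacity_mah / 1000 * nominal_v
    return energy_wh * efficiency / draw_w


if __name__ == "__main__":
    draw = 0.5  # watts, the always-listening estimate from the conversation
    for label, mah in [("small 2000 mAh cell", 2000), ("10000 mAh power bank", 10000)]:
        hours = runtime_hours(mah, 3.7, draw)
        print(f"{label}: about {hours:.0f} hours")
```

Under these assumptions a 2000 mAh cell lasts roughly half a day, and even a 10000 mAh power bank lasts under three days. That is why mains power over USB is the practical choice.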
Corn
You're trading one problem for another. No wall mounting, but you do need to run USB cables to wherever the satellites sit.
Herman
Yes, and that's a genuine consideration. A small box on a shelf is great, but it needs a USB cable trailing to an outlet. In a small apartment, that might mean visible cables running along baseboards or behind furniture. It's not a dealbreaker, but it's worth thinking about placement.
Corn
Flat USB cables exist, and you can tuck them under rugs or along the edges of rooms. It's not ideal, but it's workable. And compared to drilling holes in walls and running in-wall power, it's vastly more renter-friendly. A trailing USB cable is removable in thirty seconds with no trace.
Herman
Corn
I want to go back to the centralized processing idea for a moment. Daniel said "centralize them" in his prompt. I think he's thinking about a single server handling all the voice processing. What does that server actually need to be?
Herman
If he's already running Home Assistant — which I believe he is — he just needs that server to have enough horsepower for speech-to-text. A Raspberry Pi 5 with eight gigabytes of RAM can handle Whisper fine. An old Intel NUC or a used mini PC is even better. If he's running Home Assistant on a virtual machine or a Docker container on a more powerful machine, he's golden.
Corn
The satellites just need Wi-Fi? Nothing special about the network?
Herman
Just the same local network, ideally the same subnet as the server. Home Assistant discovers the satellites through mDNS, multicast DNS. That's basically automatic, though it doesn't cross subnets or VLANs without extra setup. You plug them in, they show up in Home Assistant, you assign them to a voice pipeline, done.
Corn
That's the kind of setup experience that doesn't make you want to throw things across the room.
Herman
It's gotten so much better. Two years ago, setting up a local voice satellite meant compiling firmware, fighting with audio drivers, debugging I2S configurations, and spending hours in documentation. Now it's mostly copy-paste YAML and a few clicks in the Home Assistant interface.
Corn
The open-source community has done incredible work here. Home Assistant, ESPHome, MicroWakeWord, Whisper — all of these are community-driven projects that have matured to the point where they're usable by normal humans.
Herman
They're all free. No subscriptions, no licensing fees, no per-device costs. Daniel can set up a five-satellite voice system for the cost of the hardware and that's it. No monthly bill, no "premium features" locked behind a paywall.
Corn
That's a stark contrast with the Amazon and Google ecosystems, where you're paying for the hardware and then paying again with your data.
Herman
The data stays in your house. Every voice command Daniel speaks stays on his local server. It's not being shipped off to a data center for analysis, not being used to train advertising models, not being listened to by contractors for "quality assurance." For a family with a baby, that privacy matters. You don't want recordings of your home life sitting on a server somewhere.
Corn
Especially not when those recordings include a baby crying or parents having exhausted conversations at three in the morning. That's intimate data.
Herman
It really is. And I think people underestimate how much audio data reveals about your life. Your schedule, your relationships, your emotional state, your health. All of that is encoded in voice commands and ambient audio. Keeping it local isn't paranoia — it's basic privacy hygiene.
Corn
All right, let me try to summarize what we're actually recommending. Daniel, if you're listening (and I know you are), here's the plan. Start with one Home Assistant Voice Preview Edition in the main living area. That's about sixty dollars, works out of the box, and gives you the best voice pickup and audio quality. Then, if you want to expand, add ESP32-based satellites in the other rooms, something like the M5Stack Atom Echo at thirteen dollars each. Configure them through ESPHome, add them to your Home Assistant server, and you've got whole-apartment voice control for under a hundred fifty dollars total.
Herman
If he wants to go fully DIY from the start, the all-ESP32 route comes in under eighty dollars for four rooms. That's affordable for what you're getting.
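The budget figures for the two plans can be checked with a few lines of arithmetic. This is a minimal sketch using the prices quoted in the episode ($50 for the Voice Preview Edition, $13 per Atom Echo). The three-satellite count for the hybrid plan is an assumption for illustration, and street prices vary.

```python
# Prices quoted in the episode (USD); real prices vary by retailer and region.
VOICE_PREVIEW_EDITION = 50
ATOM_ECHO = 13


def hybrid_cost(satellites: int) -> int:
    """One Voice Preview Edition in the living area plus ESP32-S3 satellites."""
    return VOICE_PREVIEW_EDITION + satellites * ATOM_ECHO


def all_esp32_cost(rooms: int) -> int:
    """One ESP32-S3 satellite per room, no Voice Preview Edition."""
    return rooms * ATOM_ECHO


print(hybrid_cost(3))     # 89  (50 + 3 * 13)
print(all_esp32_cost(4))  # 52  (4 * 13)
```

At the $20 upper price mentioned for the XIAO ESP32-S3 Sense, four rooms would come to exactly $80, so the "under eighty dollars" figure assumes the cheaper boards.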
Corn
For the air conditioning control, add a Broadlink RM4 Mini or similar IR blaster per room with an AC unit. Twenty-five dollars each. Not strictly necessary if you're only doing voice control for music and basic smart home stuff, but if Daniel specifically wants to adjust the AC by voice, that's the piece.
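Behind the scenes, a command like "set the bedroom AC to cool" ends up as a call to Home Assistant's `remote.send_command` service on the Broadlink remote entity, replaying an IR code learned from the original AC remote beforehand (for example with `remote.learn_command`). The following is a minimal sketch of that service call through Home Assistant's REST API, assuming the command has already been learned. The host name, token, entity ID (`remote.bedroom_ir`), device name (`air_conditioner`), and command name (`cool_24`) are all hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical values: replace with your own server address, a long-lived
# access token from your Home Assistant profile, and your entity/device names.
HA_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"


def build_ac_request(entity_id: str, device: str, command: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the remote.send_command service."""
    payload = {"entity_id": entity_id, "device": device, "command": command}
    return urllib.request.Request(
        url=f"{HA_URL}/api/services/remote/send_command",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_ac_request("remote.bedroom_ir", "air_conditioner", "cool_24")
print(req.full_url)
# To actually send it against a real server:
# urllib.request.urlopen(req, timeout=10)
```

In a normal setup the voice pipeline makes this call for you; the sketch just shows what the IR blaster piece is doing.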
Herman
For the wake word, use Home Assistant's MicroWakeWord with a custom model trained on Daniel and Hannah's voices plus some ambient apartment noise. That'll give the best accuracy and minimize false triggers from Ezra's baby sounds.
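Part of why a well-tuned wake word setup shrugs off short noises is that detection usually isn't decided on a single audio frame. As we understand it, microWakeWord models are configured with a probability cutoff and a sliding window size, and a detection fires only when the averaged score over the window crosses the cutoff. The following is a simplified illustration of that idea, not microWakeWord's actual code. The cutoff of 0.9, the window of 5 frames, and the score sequences are made-up values.

```python
from collections import deque


def detect(scores, cutoff=0.9, window=5):
    """Return frame indices where the mean of the last `window` scores exceeds `cutoff`.

    Simplified illustration only: the cutoff and window size are hypothetical.
    """
    recent = deque(maxlen=window)
    hits = []
    for i, score in enumerate(scores):
        recent.append(score)
        if len(recent) == window and sum(recent) / window > cutoff:
            hits.append(i)
            recent.clear()  # start fresh so one utterance triggers only once
    return hits


baby_squeal = [0.1, 0.2, 0.98, 0.97, 0.3, 0.1, 0.1]          # brief spike
wake_word = [0.1, 0.6, 0.93, 0.96, 0.98, 0.97, 0.95, 0.4]    # sustained high scores

print(detect(baby_squeal))  # []
print(detect(wake_word))    # [6]
```

The two-frame spike never lifts the five-frame average above the cutoff, while the sustained run of high scores does. Training on household audio aims to keep the model's raw scores low for sounds like these in the first place.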
Corn
The whole thing sits on shelves, bookcases, maybe the top of the refrigerator. No wall mounting, no adhesive, no holes, no landlord complaints. When they move out, everything gets unplugged and packed in a box.
Herman
That's the renter's smart home. Fully functional, fully removable. And the best part is, the system gets better over time. New wake word models, faster speech-to-text engines, better intent handling — all of that arrives through software updates without changing any hardware.
Corn
I do want to mention one thing we haven't touched on, which is that Google just announced they're putting Gemini into millions of vehicles. The voice assistant space is moving fast, and the big tech companies are pouring billions into this. But the local, privacy-respecting approach that Home Assistant represents is actually pulling ahead in the home, not because it has more features, but because it respects the user.
Herman
It's modular. Daniel's not locked into anyone's ecosystem. If a better wake word model comes out next year, he can swap it in. If a new speech-to-text engine is faster, he can switch. The components are interchangeable because they all speak the same open protocols.
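The "swap in a better engine" point comes down to coding against an interface rather than a specific product. This toy sketch is not Home Assistant's code; in the real system, components such as Whisper are connected through Home Assistant's integrations (for example via the Wyoming protocol). It only shows the design idea: the pipeline depends on a `transcribe` method, so any engine that provides one can be dropped in. All class names are hypothetical.

```python
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class IntentHandler(Protocol):
    def handle(self, text: str) -> str: ...


class WhisperLikeEngine:
    """Hypothetical stand-in for a speech-to-text engine; returns a fixed transcript."""

    def transcribe(self, audio: bytes) -> str:
        return "turn on the lights"


class NewerFasterEngine:
    """Hypothetical replacement engine exposing the same interface."""

    def transcribe(self, audio: bytes) -> str:
        return "turn on the lights"


class SimpleIntents:
    """Hypothetical intent handler that just acknowledges the command."""

    def handle(self, text: str) -> str:
        return f"OK: {text}"


class VoicePipeline:
    def __init__(self, stt: SpeechToText, intents: IntentHandler) -> None:
        self.stt = stt
        self.intents = intents

    def run(self, audio: bytes) -> str:
        return self.intents.handle(self.stt.transcribe(audio))


pipeline = VoicePipeline(WhisperLikeEngine(), SimpleIntents())
print(pipeline.run(b"..."))  # OK: turn on the lights

pipeline.stt = NewerFasterEngine()  # swap engines; nothing else changes
print(pipeline.run(b"..."))  # OK: turn on the lights
```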
Corn
That's the real advantage of the open-source approach. You're not betting on a single company's roadmap. You're betting on a community that's constantly improving things.
Herman
The community is huge. Home Assistant has over three hundred thousand active installations, and ESPHome has a large, active contributor base. When something breaks, there's a good chance someone has already fixed it and posted the solution.
Corn
Let me ask one more question, and this is the one I think Daniel might be wondering but didn't explicitly ask. How much time is this going to take? He's a new parent with a demanding job. He can't spend a weekend buried in YAML files.
Herman
Realistically, for the hybrid approach we recommended (one Voice Preview Edition plus two or three ESP satellites), I'd say three to four hours of setup time, spread over a weekend. The Voice Preview Edition takes maybe thirty minutes to unbox and configure. Each ESP satellite takes about an hour if you're being careful and methodical. The infrared blaster for the AC takes another thirty minutes. It's not nothing, but it's not a massive project either.
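The three-to-four-hour estimate is just the sum of the per-device figures. Here is a quick check using those numbers, assuming a single IR blaster (add thirty minutes for each additional one).

```python
# Time estimates from the episode, in hours.
VOICE_PREVIEW_SETUP = 0.5
PER_SATELLITE = 1.0
IR_BLASTER_SETUP = 0.5  # assumes one IR blaster


def setup_hours(satellites: int) -> float:
    """Total setup time for the hybrid plan with the given number of satellites."""
    return VOICE_PREVIEW_SETUP + satellites * PER_SATELLITE + IR_BLASTER_SETUP


for n in (2, 3):
    print(n, setup_hours(n))  # prints "2 3.0" then "3 4.0"
```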
Corn
Once it's set up, the maintenance burden is low. Firmware updates happen through the Home Assistant interface. You don't need to physically access the devices.
Herman
The initial investment of time pays off in daily convenience. Being able to adjust the air conditioning while holding Ezra, or start a podcast while your hands are full with baby stuff — that's the kind of small quality-of-life improvement that adds up.
Corn
Now: Hilbert's daily fun fact.

Hilbert: The national animal of Scotland is the unicorn. It has been since the twelve hundreds, when it was used on the Scottish royal coat of arms. Scotland is one of the few countries in the world whose national animal is a mythological creature.
Corn
...right.
Corn
To wrap this up — the voice-controlled apartment Daniel's imagining is not only possible, it's achievable on a budget that won't break the bank. The technology has matured to the point where local, private, reliable voice control is accessible. And the renter-friendly approach we've outlined means no wall damage, no landlord conflicts, and a system that can move with them when they eventually leave that sixty-square-meter Jerusalem apartment.
Herman
The one thing I'd add is that voice control isn't just a convenience for parents. It's an accessibility feature. When your hands are full, when you're exhausted, when you can't get to a switch or a screen — voice is the interface that works. And building that interface on open, local, private foundations means it'll keep working for years without anyone trying to monetize your family's audio data.
Corn
This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop. If you want more episodes like this one, head to myweirdprompts dot com or find us on Spotify. We'll be back with another prompt soon.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.