Is "WiFi Sensing" the future of smart homes, or just a laboratory curiosity? I spent weeks building a human detection array using the ESP32 to find out if we can finally ditch expensive sensors.
For the demo video, please head over to Reddit.
Introduction: The Rise of the "Sixth Sense" for IoT

In the world of home automation and IoT, we are constantly hunting for the perfect presence sensor. PIR sensors are cheap but blind to stationary targets. Millimeter-wave (mmWave) radar is accurate but expensive and complex to integrate. Cameras are powerful but come with massive privacy concerns.
But recently, a "fourth option" has been gaining serious traction in the open-source community: WiFi Sensing.
It started when I noticed a flurry of activity from Espressif Systems engineers on a GitHub repository called esp-csi. This wasn't just a minor update; it was a full-blown push to unlock Channel State Information (CSI) on standard ESP32 chips. Almost immediately, the community responded. I saw projects like espectre appearing, which uses CSI for spectral analysis. I saw researchers publishing code for 2D Indoor Localization using Deep Learning.
The claims were bordering on science fiction. Espressif’s documentation even included a demo called esp-crab where they claimed to achieve precise finger gesture tracking using nothing but WiFi signals.
This piqued my curiosity. If this technology is real, it could democratize spatial sensing. We wouldn't need $30 radar modules anymore; a $5 microcontroller could do the job. But is it robust? Is it replicable? Or is it just a fragile demo that only works in a shielded RF chamber?
To answer this, I gathered a handful of Seeed Studio XIAO ESP32 boards from my workshop and decided to build a room-scale sensing array from scratch.
The Theory: What Exactly Is CSI?

Before diving into the build, we need to understand what we are actually looking at. Most of us are familiar with RSSI (Received Signal Strength Indicator). RSSI is like measuring the "volume" of a sound; it tells you how loud the signal is, but not much else. It fluctuates wildly and provides very little data about the environment.
CSI (Channel State Information) is different. If RSSI is the "volume," CSI is the "texture" of the sound.
In modern WiFi (OFDM), data is sent over multiple sub-carriers (different frequencies) simultaneously. As these radio waves travel from a transmitter (TX) to a receiver (RX), they bounce off walls, furniture, and people. These reflections cause the waves to arrive at the receiver at slightly different times, creating interference patterns.
CSI captures the amplitude and phase information of each of these sub-carriers. When a human body moves through the room, it changes how these waves reflect and scatter. By analyzing the "shape" of these changes across the frequency spectrum, we can theoretically detect presence, movement, and even specific gestures.
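The raw CSI buffer on the ESP32 is just a stream of signed bytes holding the complex channel response of each sub-carrier. As a minimal sketch, here is how the amplitude and phase could be recovered per sub-carrier; the byte ordering (imaginary part before real part, one pair per sub-carrier) matches the esp-csi examples, but verify it against your firmware version before trusting the output.

```python
import math

def csi_to_amplitude_phase(raw_bytes):
    """Convert interleaved signed I/Q bytes (one pair per sub-carrier)
    into per-sub-carrier amplitude and phase.

    Assumes the buffer is [imag0, real0, imag1, real1, ...]; swap the
    pair if your firmware orders them differently.
    """
    amplitudes, phases = [], []
    for i in range(0, len(raw_bytes) - 1, 2):
        imag, real = raw_bytes[i], raw_bytes[i + 1]
        amplitudes.append(math.hypot(real, imag))  # |H| = sqrt(I^2 + Q^2)
        phases.append(math.atan2(imag, real))      # arg(H)
    return amplitudes, phases

# A sub-carrier with real=3, imag=4 has amplitude 5
amps, phs = csi_to_amplitude_phase([4, 3])
```

It is these amplitude (and, with more care, phase) traces across all sub-carriers that form the "shape" we analyze below.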
Phase 1: The "Hello World" and the Initial Frustration

My journey began with the simplest possible goal: visualize the CSI data.
I selected the Seeed Studio XIAO ESP32S3 for this task. I love these boards because they are incredibly small, powerful enough to handle the math, and easy to hide in a real-world deployment.
I flashed the csi_send example to one XIAO and csi_recv to another. I kept the setup minimal: I stuck the standard FPC (flexible sticker) antennas onto the boards and laid them flat on my desk, about 50 cm apart. I then ran the Python script csi_data_read_parse.py to render the data stream on my laptop.
The result was disheartening.
I was expecting to see a clean sine wave that reacted to my hand movements. Instead, I saw a chaotic mess of jagged lines. The graph looked like static noise. I waved my hand over the boards—nothing changed. I jumped up and down—the graph remained a messy blur.
I spent two days thinking I had flashed the firmware wrong or that the XIAO hardware wasn't compatible. I tweaked parameters, changed WiFi channels, and rewrote the Python parser. Nothing worked. I was essentially looking at radio static.
Phase 2: It’s All About Physics

After re-reading the documentation and diving into RF theory, I realized my mistake wasn't software—it was physics.
- The Antenna Issue: The FPC antennas included with most dev boards are omnidirectional but have low gain. Laying them flat on a table created massive reflections from the desk surface immediately next to the antenna. I was measuring the desk, not the room.
- The Fresnel Zone: Radio waves don't travel in a laser-thin line; they travel in a football-shaped zone between the two antennas called the Fresnel Zone. By having my devices on the desk, the bottom half of that zone was blocked by the table.
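How big is that zone? The standard first-Fresnel-zone formula gives a concrete number; the sketch below assumes WiFi channel 6 (2.437 GHz) and the 2 m link spacing used later in this build.

```python
import math

C = 299_792_458  # speed of light, m/s

def fresnel_radius(freq_hz, d1_m, d2_m, n=1):
    """Radius of the n-th Fresnel zone at a point d1 meters from the TX
    and d2 meters from the RX: r = sqrt(n * wavelength * d1 * d2 / (d1 + d2))."""
    wavelength = C / freq_hz
    return math.sqrt(n * wavelength * d1_m * d2_m / (d1_m + d2_m))

# Midpoint of a 2 m link on WiFi channel 6 (2.437 GHz):
r = fresnel_radius(2.437e9, 1.0, 1.0)
print(f"First Fresnel zone radius: {r:.2f} m")  # ~0.25 m
```

A quarter-meter bulge below the line of sight is exactly what a desk surface truncates—and why elevating the boards well above any surface matters.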
The Fix:
I constructed a proper test rig. I mounted the XIAO boards on tripods, elevating them 1.4 meters off the ground. Crucially, I disconnected the FPC antennas and attached external dipole (rod) antennas.
I turned the script back on. The difference was night and day.
The graph settled into a consistent rhythm. When I stood still, the lines were flat. The moment I took a step, the waveform erupted into a beautiful, distinct pattern. I could finally see "me" in the data. This was the turning point—the technology actually worked.
Phase 3: From Data to Detection (The "Tripwire")

Now that I had clean data, I wanted to do something useful with it. I switched to the esp-radar/console_test example. This firmware includes Espressif's proprietary algorithms to filter the noise and make a judgment call: "Someone is here" or "The room is empty."
I set up the two tripods 2 meters apart in my living room.
The performance was surprisingly sharp. It acted like an invisible tripwire. When I walked directly between the two tripods, the serial console screamed "Movement Detected" instantly. It felt magical—a sensor with no lens, no moving parts, just air.
However, the limitations became obvious quickly. If I stood two meters behind the receiver, it ignored me. If I sat on the floor, it missed me. A single pair of devices creates a "line" of detection, not a "field." To monitor a real room, I needed more coverage.
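Espressif's judgment algorithm is proprietary, but the core idea can be approximated with basic statistics: a static channel has low short-term variance in sub-carrier amplitude, while a body crossing the Fresnel zone spikes it. This is a toy stand-in, not the actual firmware logic, and the window size and threshold are assumptions you would tune per room.

```python
from collections import deque
from statistics import pstdev

class MotionDetector:
    """Toy stand-in for the esp-radar judgment: flag motion when the
    short-term variability of mean sub-carrier amplitude exceeds a
    threshold. The real Espressif algorithm is more sophisticated."""

    def __init__(self, window=50, threshold=1.5):
        self.history = deque(maxlen=window)  # sliding window of recent frames
        self.threshold = threshold           # tune per room; an assumption here

    def update(self, amplitudes):
        mean_amp = sum(amplitudes) / len(amplitudes)
        self.history.append(mean_amp)
        if len(self.history) < self.history.maxlen:
            return "calibrating"
        return "movement" if pstdev(self.history) > self.threshold else "clear"

det = MotionDetector(window=5, threshold=0.5)
for frame in ([10.0] * 4, [10.1] * 4, [10.0] * 4, [10.1] * 4, [10.0] * 4):
    status = det.update(frame)  # settles to "clear" on a static channel
```

The "tripwire" behavior falls out naturally: only motion that actually perturbs the TX–RX channel moves the variance, which is also why standing behind the receiver goes unnoticed.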
Phase 4: Building the "Sensing Array"

To solve the blind spot issue, I decided to scale up. I formulated a plan for a multi-device array using four Seeed Studio XIAO ESP32 boards.
I utilized a modified codebase optimized for the XIAO ecosystem. This code allows multiple receivers to report back to a central node.
The Layout Strategy:
I arranged the devices to create a mesh of sensing zones:
- Transmitter (TX): Located in the North-East corner of the room.
- Receiver 1 (RX1): Located in the South-West corner (diagonally opposite). This covers the main walking path.
- Receivers 2 & 3 (RX2, RX3): Placed at the midpoints of the adjacent walls.
This layout ensures that no matter where I stand in the room, I am disrupting the Fresnel zone of at least one pair of devices.
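With multiple receivers reporting in, the central node needs a rule for merging their individual judgments into one room status. The fusion rule below is a hypothetical sketch of how I'd combine them, not the actual firmware logic: since each pair only covers a line, any single link seeing motion should trip the whole room.

```python
def fuse_links(link_states):
    """Combine per-receiver judgments ('motion'/'presence'/'clear') into
    one room-level status. A single TX-RX pair only covers a line, so a
    disturbance on ANY link counts; 'presence' outranks 'clear'.
    (Hypothetical fusion rule, not the actual firmware logic.)"""
    if "motion" in link_states.values():
        return "Motion"
    if "presence" in link_states.values():
        return "Presence"
    return "Clear"

# Someone lingering in the corner only disturbs the RX2 link:
status = fuse_links({"RX1": "clear", "RX2": "presence", "RX3": "clear"})
# -> "Presence"
```

An OR-style rule like this maximizes coverage at the cost of false positives: one misbehaving link can hold the whole room in "Presence," which is worth remembering when tuning thresholds.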
The User Interface:
The project includes a simple Web Server hosted on the main ESP32. I opened the IP address on my phone. The interface was simple: a status indicator showing "Presence", "Clear" or "Motion" and a scrolling graph of signal variance.
I walked into the room. The status flipped to "Motion" immediately.
I walked to the corner, usually a blind spot. The status stayed "Presence".
I sat on the couch and stayed relatively still. The system hesitated for a second, but kept the status as "Presence".
It worked. I had successfully turned my entire living room into a sensor.
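The "hesitation" on the couch suggests the status logic uses hysteresis: motion promotes the state immediately, but the fall back to "Clear" waits out a quiet timeout. Here is a hypothetical reconstruction of that state machine—the actual firmware may work differently, and the timeout value is an assumption.

```python
class PresenceStateMachine:
    """Hypothetical reconstruction of the web UI's status logic: movement
    jumps straight to 'Motion'; when movement stops, the state decays to
    'Presence' and only falls to 'Clear' after a quiet timeout. This
    hold-off explains the brief hesitation when someone sits still."""

    def __init__(self, clear_after=30):
        self.clear_after = clear_after  # quiet seconds before 'Clear' (assumed)
        self.quiet_time = 0
        self.state = "Clear"

    def tick(self, moving, dt=1):
        if moving:
            self.quiet_time = 0
            self.state = "Motion"
        else:
            self.quiet_time += dt
            if self.state == "Motion":
                self.state = "Presence"
            if self.quiet_time >= self.clear_after:
                self.state = "Clear"
        return self.state

sm = PresenceStateMachine(clear_after=3)
states = [sm.tick(m) for m in (True, False, False, False)]
# -> ['Motion', 'Presence', 'Presence', 'Clear']
```

The `clear_after` window is the knob that trades responsiveness against false "Clear" events for a still occupant.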
The Verdict: A Week with the WiFi Array

After living with this "WiFi Array" for a week, I have a clear picture of the pros and cons. While the "cool factor" is off the charts, the reality is nuanced.
1. The Power Consumption Problem
This is the biggest hurdle. Unlike Zigbee or LoRa sensors which sleep 99% of the time, CSI sensing requires the WiFi radio to be fully active, blasting packets hundreds of times per second.
The XIAO ESP32s run hot. You cannot run these on batteries; they would drain a coin cell in minutes. You need to run USB-C cables to every corner of your room, which ruins the "invisible" aesthetic unless you have outlets perfectly placed.
2. The "Calibration Hell"
This system is environmentally sensitive. If I moved the coffee table, the reflection patterns changed. If I opened the window, the patterns changed.
Getting the system to work requires a "Calibration" phase where the room must be perfectly empty. If you add a new piece of furniture, you often have to recalibrate the baseline. Compared to a PIR sensor which just works, this is a high-maintenance relationship.
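What does that calibration phase actually do? In my understanding it boils down to recording a per-sub-carrier statistical fingerprint of the empty room, then scoring later frames by how far they drift from it. The sketch below illustrates the idea; it is my simplified model, not Espressif's actual calibration routine.

```python
from statistics import mean, pstdev

def calibrate(frames):
    """Record an empty-room baseline: per-sub-carrier mean and spread.
    `frames` is a list of amplitude lists captured while the room is empty."""
    per_sc = list(zip(*frames))  # group samples by sub-carrier
    return [(mean(sc), pstdev(sc)) for sc in per_sc]

def deviation_score(amplitudes, baseline, floor=0.1):
    """Average number of 'sigmas' the current frame sits from the baseline.
    Moving furniture shifts the baseline means, so this score drifts
    upward until you recalibrate."""
    score = 0.0
    for amp, (mu, sigma) in zip(amplitudes, baseline):
        score += abs(amp - mu) / max(sigma, floor)
    return score / len(baseline)

baseline = calibrate([[10.0, 20.0], [10.2, 19.8], [9.8, 20.2]])
empty_score = deviation_score([10.0, 20.0], baseline)   # near 0: matches baseline
person_score = deviation_score([14.0, 25.0], baseline)  # large: room changed
```

This also makes the maintenance burden obvious: a moved coffee table changes the means permanently, so the "empty room" baseline is simply wrong until you capture a new one.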
3. Complexity vs. Accuracy
My 4-device array achieved about 80% accuracy compared to my commercial 24GHz mmWave radar.
The irony is that to get better accuracy, you need more devices (more RX nodes). But adding more devices increases the network congestion and makes the calibration process even more tedious. It’s a classic law of diminishing returns.
4. The Untapped Potential: Deep Learning
The setup I built uses basic statistical logic (if variance > threshold, then human). The real future lies in Deep Learning.
Ideally, we would feed this CSI data into a Raspberry Pi or NVIDIA Jetson running a neural network. With enough training data, the AI could learn that "Waveform A" means "Walking" and "Waveform B" means "Sitting on the Couch." However, this drastically increases the cost and technical skill required, moving it out of the realm of simple DIY home automation.
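To make the jump from thresholds to learned classes concrete, here is a deliberately tiny, stdlib-only illustration of the idea: summarize a window of CSI activity into features, average them per labelled activity, and classify new windows by nearest centroid. A real system would train a neural network on raw per-sub-carrier CSI; this toy only exists to show the walking-vs-sitting intuition.

```python
from statistics import mean, pstdev

def features(window):
    """Collapse a window of mean-amplitude samples into two crude features:
    overall level and variability. A neural network would instead learn
    its own features from the raw per-sub-carrier CSI."""
    return (mean(window), pstdev(window))

def train_centroids(labelled_windows):
    """Average the feature vectors per label ('walking', 'sitting', ...)."""
    sums = {}
    for label, window in labelled_windows:
        f = features(window)
        level, var, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (level + f[0], var + f[1], n + 1)
    return {lbl: (s0 / n, s1 / n) for lbl, (s0, s1, n) in sums.items()}

def classify(window, centroids):
    """Assign the label whose centroid is closest in feature space."""
    f = features(window)
    return min(centroids, key=lambda lbl: (f[0] - centroids[lbl][0]) ** 2
                                        + (f[1] - centroids[lbl][1]) ** 2)

centroids = train_centroids([
    ("walking", [5.0, 15.0, 5.0, 15.0]),   # big swings while moving
    ("walking", [6.0, 14.0, 6.0, 14.0]),
    ("sitting", [10.0, 10.2, 10.0, 10.2]), # nearly flat while still
    ("sitting", [10.1, 10.0, 10.1, 10.0]),
])
label = classify([4.0, 16.0, 4.0, 16.0], centroids)  # -> "walking"
```

Even this two-feature toy hints at the appeal: once you have labelled waveforms, the classifier generalizes in a way hand-tuned thresholds never will—at the cost of collecting that training data in the first place.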
Conclusion

So, is ESP-CSI a replacement for radar? Not yet.
If you want a reliable sensor to turn on your bathroom lights, buy a $20 mmWave sensor. It’s easier, lower power, and more accurate.
However, if you are a hacker, a maker, or an engineer who wants to explore the bleeding edge of RF technology, this is an incredible project. The fact that we can extract this level of data from a $5 chip like the Seeed Studio XIAO is a testament to the incredible engineering at Espressif and the open-source community. It’s not a product yet—it’s a superpower waiting to be tamed.