Software apps and online services
Hand tools and fabrication machines
For the tinyML Vision Challenge I am doing a proof of concept project to identify people approaching the front doorway of my house. I've been impressed by the capabilities of the Luxonis DepthAI cameras and wanted to try using their LUX-ESP32 which integrates an ESP32 for host processing and communication. Using a high capability device like a DepthAI camera requires careful management of power consumption. The majority of the power is consumed during inferencing, so I've decided to add an additional microwave sensor to only turn on inferencing when an object has entered the detection boundary.
I won't have time to implement a totally packaged and integrated solution, but I hope to demonstrate how to implement the various project elements and their intercommunication.Microwave Detector
The Microwave detector is the RCWL-0516 which is inexpensive and simple to integrate. It uses the Doppler frequency shift of a 3GHz signal to detect a moving object. The unit has a large 4-28V supply range which is regulated to 3.3V. It provides a digital 3.3V detection output for anything in the detection range. The nominal detection range is 7 meters. Sensitivity is best orthogonal to the front side of the board.
My requirement is to detect within 5 meters of the entry door. I decided for prototyping and testing that I would make the unit portable. I mounted the RCWL-0516 on an M5StickC with an 18650 rechargeable battery pack. The User LED will light when movement is detected. The detection event and battery voltage are also published over MQTT. Connection schematic and code are attached.
Here is a picture of the approach to the front door. The first step is 5 meters from the door. In testing the unit would trigger when I was just before that step. Unfortunately, I could not get the exposure on the phone camera to capture a demo video (could not see the LED light at this resolution) although it was clearly visible.
This is a video taken indoors. I am starting in a room 6 meters away. You'll need to look carefully, but the LED turns on just as I start to walk and remains triggered as long as there is movement. It turns off when I stop and triggers again when I step back. (Probably need to view full screen)Microwave Detector MQTT Dashboard
This is the dashboard showing the status of the Microwave detector module. The Node-Red dashboard is running on a Raspberry Pi 4 that I am using as the MQTT server/broker to handle messaging between devices. Status will be either Secure or Motion and Doppler shows a history of detections.
The LUX-ESP32 (OAK-D-IoT-40) is an Embedded/IoT version of the OAK-D. As shown below, it has Stereo OV9282 (1280x800) cameras with synchronized global shutter to implement Spatial AI. Plus a high resolution IMX378 (12MP, 4056x3040) Color camera. The board has an Intel Movidius Myriad X VPU on a SOM with heatsink on the back of the board. The 3 cameras and VPU are used to implement Luxonis DepthAI. In addition, there is an ESP32 that communicates with the VPU via SPI and provides Bluetooth and WiFi connectivity.
I 3D printed a tripod mount for the LUX-ESP32 that I could use in the prototyping phase. It is a completely open frame to allow access to the board if required and the SOM heatsink on the back is fully exposed to improve thermal performance.
Below, the unit is mounted on a short tripod for program development and testing. The top blue microUSB cable is used to program and communicate with the ESP32. The middle red USB-C cable is used to program and communicate with the VPU. The bottom black power cable provides 5V @ 3A (if there is a USB-C power source capable of 3A, this is not required). For my proof of concept, I will probably stay in this configuration although I will try to get to a standalone configuration where the VPU boots with a model running. In that case, only the power cable is required.
Here is the back of the board with the SOM heatsink fully exposed. For initial testing I expect to have extended periods where inferencing is running continuously. I've noticed that the heatsink gets quite hot to the touch when running inferencing. As part of this project I plan to do power and thermal measurements in different operational modes. In final deployment I will minimize the inferencing duty cycle to reduce overall power. Worst case, I may need to add a fan for the enclosure (or try a larger heatsink).
Intel DevCloud for the Edge provides a sandbox for prototyping and experimenting with AI inference workloads on Intel hardware specialized for deep learning. You can optimize your deep learning models with the tools built into the Intel® Distribution of OpenVINO™ toolkit, and then test their performance on combinations of CPUs, GPUs, and VPUs (Vision Processing Units).
I've used OpenVino software quite a while ago with the NCS and NCS2 and I was hopeful that I would be able to do all of my model optimizing and testing in the DevCloud so that I would not have to do an upgrade or reinstall of OpenVino locally. Unfortunately, I found when trying to run the tutorials that the VPU resource was not available. This appears to be a bug in the tutorials, but the moderator pointed me to a post (Tiny YOLO V3 Object Detection Sample) that allowed me to compare that model running on the different hardware devices (CPU, GPU, VPU, etc).
For my proof of concept, I don't think that I have the time to do any model development or play with Model Optimizer settings. The good news is that the Intel Open Model Zoo repository has a number of pretrained models that would be appropriate for my use case. Hopefully, the bug will be fixed for the MYRIAD device in the DevCloud Notebooks and I can try testing different Model configurations later.
Here is the basic flow for use with the Lux-ESP32
For this project I'll start with an existing model (possibly several as the processing pipeline allows that). The key part of the flow that I need to figure out is how to convert the Intermediate Representation (IR) files (bin and .xml) to the .blob file that is needed to deploy the model on the MyriadX.
During the "tinyML Vision Challenge - Intel-Luxonis Vision Platform webinar"
Erik from Luxonis showed how to add the file conversion code to the end of the Object Detection Tutorial Notebook in the DevCloud. The process is straightforward - install the blobconverter python module, run the blobconverter with the path to the IR files and output file and specify the number of shaves (cores) to use. Then use the FileLinks module to download the blob file to the host computer.
Here is the output from the notebook:
A very short clip to verify that the blob file was generated and downloaded properly. Using the depthai_demo script with the .blob file copied into the depthai/resources/nn/mobilenet-ssd folder.
>python depthai_demo.py -cnn mobilenet-ssd
I discovered later that Luxonis also provides an online blob converter that provides an easy access to converting OpenVino models for different versions and shave counts: http://blobconverter.luxonis.com/ .
The inferencing is entirely run on the Myriad X. In prototyping and development the Myriad X is operated by a host processor via the USB-C interface. In standalone deployment, however, the Myriad X operates in an independent bootable configuration. In this case, communication of the inference results are handled by the ESP32 which is communicating with the Myriad X via SPI. The next step is to verify the programming and operation of this interface.
Luxonis has an example programs that run on the ESP32 and interface the VPU via spi-messages. I tried the esp32-spi-message-demo which receives tracklets from the people tracker pipeline running on the VPU. The tracklet provides information as to the tracking status ("New", "Tracked", "Lost", "Removed") and the X,Y coordinates of the detected bounding box. I experienced some latency issues as seen below - the serial stream from the ESP32 lags the video output.
I also had some instability with periodic crashes of the interface. Luxonis is currently refactoring the spi-api to improve the spi throughput and hopefully that will also fix the instability. I am going to defer working on this interface until the update is released. In principle, in my final solution the ESP32 will process detection and recognition data from the VPU and send MQTT information to the broker. I will also use the MQTT information from the Microwave detector to trigger the VPU inferencing.Power Management
One thing that I have observed when running inferencing continuously for long periods of time is that the heatsink on the VPU gets very hot with passive cooling. It would help if I could change the orientation of the fins on the heatsink so they were vertical to allow better convection cooling. I may try to use a different heatsink later or install a fan for active cooling. In any case I need to look at quantifying the amount of power being used and try to reduce it. This would allow me to run on battery backup during power outages and possibly stay with passive cooling. Luxonis indicated that the max operating die temperature of the VPU is 105 C.
The DepthAI API allows logging VPU performance info such as cpu and memory usage and chip temperature. To get a baseline I used the mobilenet-ssd example and plotted it over a 30 minute period.
Using the following command to log temperature to a file:
>python depthai_demo.py -cnn mobilenet-ssd --report temp --report_file mobilenet_templog30min.csv
mobilenet-ssd temperature measurement test conditions:
Normally, I would not expect for someone/something to be in the detection zone for more than 5 minutes, so with respect to die temperature I should be okay (there is margin for continuous operation unless the ambient temperature is high or I use a high power pipeline).
However, there is still a concern about total power. A quick way to reduce power by reducing the inferencing duty cycle is to reduce the frame rate (FPS) of the camera. Below is a quick test of Power vs Frame Rate for the Face Detection/Face Recognition Pipeline NNs that I plan to use in this project.
I used the power at 1 FPS as my baseline and plotted the power difference as I increased the framerate up to 30 FPS. The baseline power was 2.1 W.
The power increase vs framerate is fairly linear starting at 5 FPS. Responsiveness to movement seemed a bit too slow at 5 FPS, so I'm going to see if 10 FPS is a good compromise.Project Pipeline
DepthAI has the concept of a processing pipeline that is run on the VPU. It is constructed with a collection of nodes and the links between them. It is a similar concept to Gstreamer pipelines that I've used before with OpenCV.
Below is a simplified pipeline diagram for my project. It uses two sets of Neural network nodes. The first set will do person detection and tracking when the detected object is beyond 2 meters and the second will do face detection and recognition when the object is within 2 meters (the threshold value may change). The stereo cameras provide the distance measurement from a Spatial location calculator and that is passed to a Script node that controls which Neural network receives the RGB camera output for inferencing. The RGB camera image and the NN inference results are passed to the host using Xout nodes. Here is a link to the DepthAI Pipeline documentation:
I created a "simulation" below of the operation of the processing pipeline that I'd like to achieve. It is a simulation in the sense that it is operating the Neural networks sequentially on an input video rather than switching the camera input.. As you can see from the video, the separate Person Detection and Image Recognition network pipelines are each individually functioning. I am still working on getting the network switching to work with the camera.Steering image flow in the pipeline
I struggled for a bit understanding how to steer the image flow using a script node in the pipeline.
I finally created a simple example to test that I was doing it correctly. I started with the "ImageManip Tiling" example that splits the camera preview frame in half horizontally and displays the two halves in side-by-side frames. I then added the Spatial-Location-Calculator and a script node to swap the tile images if the average depth in the ROI is greater than 2 meters. You can see it working below. The measured depth changes as I move into and out of the region of interest (ROI) positioned in the center of the frame which causes the images in the tiles to swap..
Now I can adapt that script node with the Spatial-Location-Calculator to steer the NN inputs.Switching NN outputs used in the pipeline
I did a short demo of switching between the "mobilenet-ssd" and "face-detection-retail-0004" neural networks based on the object distance based on spatial depth calculation. I need to incorporate this functionality into my end application with the more complex pipelines that I'm using.Project Summary
I was not able to complete a deployed standalone project in the timeframe of the tinyML Vision Challenge, but I've been able to demonstrate all of the elements that I am integrating.
I would like to acknowledge Luxonis and Intel for the great products that they've developed to enable Vision AI at the Edge. And special thanks to the Luxonis team for their timely support on their Discord channel.
I look forward to completing this project and many others using DepthAI and OpenVino. These tools actually make development a lot of fun as you get more capable with them.