Machine learning at the edge is extremely useful for creating devices that can accomplish "intelligent" tasks with far less explicit programming and branching logic than traditional code requires. That's why I wanted to incorporate at-the-edge keyword detection that can recognize certain words and then perform a task based on what was said.

Hardware
This project just has one component: an Arduino Nano 33 BLE Sense. The actual magic happens in the machine learning model. The Arduino Nano 33 BLE Sense is full of sensors, including a microphone, 9-axis IMU, environmental sensor, and a gesture/proximity/color/ambient light sensor (APDS-9960). The microcontroller on it is an nRF52840 that runs at 64MHz and contains 1MB of flash memory and 256KB of RAM. This project also uses its onboard RGB LED to display the current color.
I began by creating a new project on Edge Impulse and then installed the Edge Impulse CLI tool. For more instructions on how to do that, visit the installation instruction page. This lets the Arduino Nano communicate with the cloud service to receive commands and send sensor data automatically. I downloaded the most recent Edge Impulse firmware and flashed it to the board by double-clicking the reset button to make it enter bootloader mode. Then I ran flash_windows.bat to transfer it.
Over in the command prompt, I ran edge-impulse-daemon and followed the wizard to set it up. The Nano now shows up in the project's device list, which lets samples be collected and uploaded as part of the training/testing dataset.
Training a machine learning model requires data, and quite a bit of it. I wanted to have the following modes for the RGB LED strip:
I recorded about 1 minute of sound for each mode, repeating the keyword at 1-2 second intervals, then split each recording into individual samples.
But just having these samples isn't enough, since background noise and other words will give a false reading. Thankfully, Edge Impulse already provides a pre-built dataset for noise and 'unknown' words, so I used their "Upload Existing Data" tool to upload these audio files into the training data.
Finally, I rebalanced the dataset to have the recommended 80-20 split for training and testing data, respectively.
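Edge Impulse handles the rebalancing itself, but the idea behind the 80-20 split can be sketched in a few lines of Python (the function and file names here are hypothetical, just for illustration):

```python
# Illustrative 80/20 train/test split of a labeled sample list.
import random

def train_test_split(samples, train_fraction=0.8, seed=42):
    """Shuffle the samples and split them into train/test lists."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

samples = [f"clip_{i}.wav" for i in range(60)]  # e.g. 60 one-second clips
train, test = train_test_split(samples)
print(len(train), len(test))  # 48 12
```

Shuffling before splitting matters: without it, the test set could end up containing only the last-recorded keyword.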
Now armed with an hour of training data and plenty of labels, it was time to train a model. The impulse I designed takes in audio as time-series data with a window size of 1 second and a window increase of 500ms. It then passes through an MFCC block into a Keras neural network block.
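To make the windowing concrete, here is a small sketch of how a 1-second window with a 500ms increase slides over an audio clip. The 16kHz sample rate is an assumption (it matches what Edge Impulse typically uses for the Nano's microphone); the function name is mine:

```python
# How a 1 s window with a 500 ms window increase slides across an audio buffer.
def window_offsets(num_samples, sample_rate=16000, window_ms=1000, increase_ms=500):
    """Return (start, end) sample indices for each full analysis window."""
    window = sample_rate * window_ms // 1000   # samples per window
    step = sample_rate * increase_ms // 1000   # samples per increase
    offsets = []
    start = 0
    while start + window <= num_samples:
        offsets.append((start, start + window))
        start += step
    return offsets

# A 2-second clip yields three overlapping 1-second windows:
print(window_offsets(32000))  # [(0, 16000), (8000, 24000), (16000, 32000)]
```

The 50% overlap means a keyword that straddles one window boundary is still fully contained in a neighboring window.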
The MFCC block allows you to configure how the audio will be processed, along with a spectrogram showing the frequencies in a visual way.
I left the neural network settings mostly at their defaults, but made a few modifications. First, I changed the minimum confidence threshold from 0.80 to 0.70, and added a bit of data augmentation in the form of additional noise and masked time bands. This helps the network avoid overfitting, since it has more diverse data to work with.
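Edge Impulse applies the time-band masking internally during training, but the idea can be sketched as follows: a random band of time frames in the feature matrix is zeroed out, so the network can't latch onto any single slice of the keyword. This is my own illustrative version, not Edge Impulse's actual implementation:

```python
# Illustrative time-band masking for data augmentation: zero a random run of
# time frames in a per-frame feature matrix (e.g. MFCC frames).
import random

def mask_time_band(features, max_band=5, seed=None):
    """Return a copy of `features` (a list of per-frame feature vectors)
    with a random band of 1..max_band consecutive frames zeroed out."""
    rng = random.Random(seed)
    band = rng.randint(1, max_band)
    start = rng.randint(0, len(features) - band)
    masked = [row[:] for row in features]  # deep-enough copy; input untouched
    for t in range(start, start + band):
        masked[t] = [0.0] * len(masked[t])
    return masked
```

Adding background noise works the same way in spirit: the model sees many slightly corrupted variants of each sample instead of the same clean recording over and over.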
The Arduino Nano 33 BLE Sense acts as an always-on microphone that continuously samples the audio and detects whether one of the keywords has been spoken. Once one is found, the keyword is converted into an index that is used to decode the desired color. For the on or off keyword, the LED is set to either black or a light gray.
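The decode step can be sketched like this. The label list and color keywords below are hypothetical stand-ins (the source doesn't list the exact modes, and the real label order comes from the trained model); the 0.70 threshold matches the setting above. On the device this logic runs in C++, but the control flow is the same:

```python
# Hypothetical mapping from the classifier's top-scoring label to an RGB value.
COLORS = {
    "red":   (255, 0, 0),
    "green": (0, 255, 0),
    "blue":  (0, 0, 255),
    "on":    (200, 200, 200),  # light gray
    "off":   (0, 0, 0),        # black
}
# Assumed model output order (alphabetical, including the built-in classes):
LABELS = ["blue", "green", "noise", "off", "on", "red", "unknown"]

def decode_color(scores, threshold=0.70):
    """Return the RGB tuple for the highest-scoring keyword above the
    confidence threshold, or None (noise/unknown/low confidence)."""
    idx = max(range(len(scores)), key=lambda i: scores[i])
    label = LABELS[idx]
    if scores[idx] < threshold or label not in COLORS:
        return None
    return COLORS[label]

print(decode_color([0.02, 0.9, 0.02, 0.02, 0.01, 0.02, 0.01]))  # (0, 255, 0)
```

Returning None for noise/unknown (or a low-confidence result) simply leaves the LED in its current state rather than flickering on every stray sound.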
I downloaded the model as a library and added it to the Arduino IDE, then compiled and flashed the code to the Nano.