This tutorial will guide you through the process of building a vacuum cleaner sound recognizer with Edge Impulse and deploying it to the Microchip Curiosity Ultra development board. This post includes the data, trained neural network model, and deployment code so you can get up and running quickly, but it will also explain the development process step by step so that you may learn how to develop your own sound recognizer.
The steps in this guide are summarized by the points below:
- Set up the SAME54 Curiosity Ultra board plus WM8904 daughter board
- Review the operation of the pre-built sound classifier firmware
- Set up the custom processing block server for LogMFE feature extraction
- Clone and review the vacuum-recognition-demo Edge Impulse project
- Modify the Edge Impulse deployment code to support LogMFE features
Before we get started, you'll need to install and set up the required software as detailed in the steps below.
1. Install the MPLAB IPE tool included in the MPLAB X installer in order to flash the pre-built firmware file.
2. Install the MPLAB X IDE and XC32 compiler. These are required to load the sound recognition project and to program the SAME54 board. You can use the default, free license for the XC32 compiler as we won't need any of the pro functionality here.
3. Create a free account with Edge Impulse if you haven’t already. We'll use this to process our sensor data and generate the sound classifier library. The Edge Impulse Studio is an entirely web-based UI so no need to download anything locally.
4. Finally, head over to GitHub and download the latest release of this project, which includes the source code and data required for this tutorial.
Configuring the Hardware

To enable audio collection for the SAME54, we first need to install the WM8904 daughterboard and configure the board's jumpers appropriately. We'll use Figure 2 (taken from the User's Guide) as a map for the different components on the board.
1. Connect the WM8904 daughterboard to the X32 audio interface (labeled 5 above) making sure to orient the board so that the 3.5mm audio connectors face the edge of the board.
2. Set the jumpers on the audio codec board to match Figure 3; each jumper should be connected to the two left-most pins on the board when viewed from the perspective of the audio jacks.
3. Set the CLK SELECT jumper (labeled 10 above) so that it connects the MCLK and PA17 pins on the board as shown in Figure 4. This pin configuration lets the WM8904 act as the clock master to the SAME54’s I2S peripheral.
4. Connect your microphone to the audio daughterboard’s MIC IN jack.
5. Connect your PC’s USB connection to the Curiosity board’s EDBG micro USB connector (labeled 2 above).
Great! The hardware is now configured for the demo project. If you have MPLAB X open, it should automatically detect the Curiosity board.
For more detailed information about the Curiosity Ultra board including schematics, consult the user guide.
Sound Recognition Firmware Overview

Before jumping into the steps to develop a sound recognizer from scratch, let's quickly cover the pre-compiled firmware for vacuum cleaner detection accompanying this post. Go ahead and program your device with the firmware.hex file from the latest release using the MPLAB IPE tool before moving ahead.
With the firmware loaded, try turning on a nearby vacuum cleaner; after a short delay the firmware will strobe the onboard LED1 located at the top left of the development board near the barrel connector; see Figure 5 for reference.
In addition, the firmware also prints the confidence numbers for classification over the UART port. To read the UART port use a terminal emulator of your choice (e.g., PuTTY for Windows) with the following settings:
- Baudrate 115200
- Data bits 8
- Stop bits 1
- Parity None
For reference, an example of the output is shown in Figure 6. Notice that the confidence numbers range between 0 (no confidence) and 1 (certainty), and that the class confidences roughly sum to 1 (small numerical errors make the sum slightly less than 1).
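If you want to post-process a captured UART log, the confidence lines can be parsed with a few lines of Python. The line format used below is a hypothetical stand-in (check Figure 6 for the firmware's actual output), so adjust the pattern accordingly:

```python
# Sketch: parse classifier confidence values from a captured UART log line.
# The 'label: score' format here is an assumption, not the firmware's
# documented output; adapt the regular expression to match your log.
import re

def parse_confidences(line):
    """Extract 'label: score' pairs from a log line, e.g.
    'vacuum: 0.91234  noise: 0.08531' (hypothetical format)."""
    return {label: float(score)
            for label, score in re.findall(r"(\w+):\s*([01]\.\d+)", line)}

scores = parse_confidences("vacuum: 0.91234  noise: 0.08531")
# The class confidences should sum to roughly 1
assert 0.95 < sum(scores.values()) <= 1.0
print(scores)
```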
That covers the operation of the firmware; let's move on to the steps needed to reproduce it from scratch.
Data Collection

As always with machine learning projects, the first thing we need is data. A dataset for vacuum cleaner detection has already been compiled for this project from publicly available datasets (namely MS-SNSD and DEMAND). The dataset targets detecting a vacuum cleaner in a domestic environment while remaining robust to common household noise: it includes several scenarios of a vacuum cleaner running indoors, plus a mix of background noise covering speech, air conditioning, and common domestic acoustic activity such as dishwashing, laundry, and music playback. The vacuum cleaner data is included with the vacuum-recognition-demo Edge Impulse project (covered later on), but can also be downloaded separately from the GitHub repository.
If you plan on collecting a dataset for your own application, make sure to introduce enough variation into your data so that your final model will generalize well to different unseen scenarios. You'll also want to make sure to collect enough data for each audio class; a good starting point is 5-10 minutes per class, but it will depend on the audio class and the quality of the data collected; for example, if your data is inherently noisy (i.e. containing a lot of non-salient information) more data may be required to learn the acoustic activity of interest.
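When auditing your own dataset, a quick tally of per-class audio duration helps check it against the 5-10 minute guideline. A minimal sketch, assuming one folder per class containing .wav files (the folder layout is an assumption for illustration, not part of this project's dataset):

```python
# Sketch: tally per-class audio duration to sanity-check dataset coverage.
# Assumes a layout of one sub-folder per audio class holding .wav files.
import wave
from pathlib import Path

def wav_duration_s(path):
    """Return the duration of a .wav file in seconds."""
    with wave.open(str(path), "rb") as f:
        return f.getnframes() / f.getframerate()

def class_durations(dataset_dir):
    """Map each class sub-folder name to its total audio duration in seconds."""
    return {d.name: sum(wav_duration_s(p) for p in d.glob("*.wav"))
            for d in Path(dataset_dir).iterdir() if d.is_dir()}
```

Running `class_durations("my-dataset")` then lets you spot under-represented classes at a glance.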
Custom Features with Edge Impulse

The project accompanying this guide has been developed and optimized to use a feature type that is not built into Edge Impulse Studio, so before we can move ahead you'll need to set up your own custom processing block server for the feature extraction step.
Custom processing blocks are an Edge Impulse feature that lets users plug in their own feature extraction blocks in a generic way via an HTTP interface. This functionality can be used to add support for additional feature types, allow more advanced feature configurability, and even allow for customized data visualizations inside Edge Impulse Studio. Here we use this functionality to add the LogMFE feature type to Edge Impulse Studio.
If you'd prefer to skip these extra steps, you can try using the built-in MFCC or MFE feature blocks instead; however, your end application performance may differ significantly from the results published here.
Log Mel-Frequency Energy Features

For this project we use the logarithm of the Mel-frequency energy (LogMFE), a feature set widely used in machine learning for audio tasks, especially where speech content is not the primary interest. A visualization of the LogMFE and MFCC features for one of the dataset samples is shown in the figure below. It illustrates the LogMFE features' greater sensitivity to the vacuum cleaner activity: in particular, the sustained tonal content produced by the vacuum (the horizontal lines in the plot) is more easily distinguishable in the LogMFE spectrogram than in the MFCC features.
Besides this suggestive visual evidence, the neural network developed on LogMFE features showed improved performance over the MFCC and MFE variants of this project, at least for the particular dataset and configuration parameters explored, which is why LogMFE was selected.
LogMFE Computation

The pseudo-code below summarizes the LogMFE feature extraction process.
# x     <- segment of input time series signal (one 'frame')
# w     <- frequency analysis window
# H_mel <- the Mel filterbank matrix
# N_FFT <- FFT length in samples
# Window multiply and apply Real Fast Fourier Transform (RFFT)
X = rfft(x * w)
# Compute normalized power spectrum
X_pow = abs(X)^2 / N_FFT
# Apply filterbank to get Mel-frequency energy bins (matrix multiply)
X_mfe = X_pow @ H_mel
# Apply log function to get the final LogMFE
X_logmfe = log(X_mfe)
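The pseudo-code can be turned into a runnable NumPy sketch. Note that the Mel filterbank construction below is a generic triangular design that may differ in detail from the custom processing block's implementation, and the small log offset is added here only to avoid log(0):

```python
# Runnable sketch of the LogMFE pseudo-code using NumPy.
# The filterbank design is a standard triangular Mel bank, not necessarily
# identical to the custom processing block; names follow the pseudo-code.
import numpy as np

def mel_filterbank(n_mels, n_fft, fs):
    """Triangular Mel filterbank, shape (n_fft // 2 + 1, n_mels)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv_mel(np.linspace(mel(0.0), mel(fs / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    H = np.zeros((n_fft // 2 + 1, n_mels))
    for m in range(n_mels):
        lo, ctr, hi = bins[m], bins[m + 1], bins[m + 2]
        H[lo:ctr, m] = (np.arange(lo, ctr) - lo) / max(ctr - lo, 1)
        H[ctr:hi, m] = (hi - np.arange(ctr, hi)) / max(hi - ctr, 1)
    return H

def logmfe_frame(x, w, H_mel, n_fft):
    """LogMFE features for one frame, following the pseudo-code."""
    X = np.fft.rfft(x * w, n=n_fft)      # window multiply + RFFT
    X_pow = np.abs(X) ** 2 / n_fft       # normalized power spectrum
    X_mfe = X_pow @ H_mel                # Mel-frequency energy bins
    return np.log(X_mfe + 1e-10)         # log (small offset avoids log(0))

# Example: one 32 ms frame (512 samples @ 16 kHz) of a 1 kHz tone
fs, n_fft = 16000, 512
x = np.sin(2 * np.pi * 1000 * np.arange(n_fft) / fs)
w = np.sqrt(np.hanning(n_fft + 1))[1:]   # same window as the deployment step
feats = logmfe_frame(x, w, mel_filterbank(32, n_fft, fs), n_fft)
print(feats.shape)  # (32,)
```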
Setting Up the LogMFE Feature Block

In order to generate the LogMFE features for Edge Impulse, you'll need to set up an HTTP server that accepts the raw audio data and sends back the processed features. The server URL can then be plugged into our Impulse, as we will cover in the next section.
1. The code that implements this part of the project is included in the GitHub repository under the custom-processing-blocks/logmfe folder, so if you haven't already downloaded the source code, clone the repository using the command below:
git clone https://github.com/MicrochipTech/ml-same54-cult-wm8904-edgeimpulse-sed-demo
2. Follow Section 1 of the Edge Impulse custom processing block tutorial using the custom-processing-blocks/logmfe code from the repository in place of the example code referenced in the guide. The Edge Impulse guide will cover the steps needed to bring up the HTTP server and expose access to it with a public URL.
Impulse Creation

With the LogMFE feature server set up, we can now define and train our Impulse.
1. Log in to your Edge Impulse account and clone the vacuum-recognition-demo project. This may take a few minutes.
2. If you haven't already, start your LogMFE custom processing block HTTP server and take note of the generated URL.
3. Once the vacuum-recognition-demo project has finished copying, navigate to the Create Impulse tab to set the overall Impulse configuration:
4. Click the edit button on the LogMFE audio block (shown as a pencil icon). In the resulting pop-up dialog, enter the public URL for your HTTP server that was generated previously, then click Confirm URL.
5. In the Time series data block, set the window size to 2048 ms and the window increase to 512 ms. Note that the power-of-two window values result from choosing parameters that line up with the LogMFE feature extraction parameters (defined in the next step), avoiding any truncation or padding of the input window.
Window size affects how much context is given to the classifier per input; choosing too small a window can make it difficult for the algorithm to differentiate sounds well whereas too large a window will incur a large latency penalty.
Window increase determines the amount of overlap between classifier inputs; overlapping inputs can both augment the available data and build some time invariance into the learning process, although there is little benefit to going beyond 75% overlap.
6. The final impulse configuration should match the figure below. Click Save Impulse to save updates.
7. Navigate to the LogMFE tab to set up the LogMFE feature configuration. Note that if you choose to use the built-in MFCC or MFE features instead, you can use a similar configuration to the one shown here, but there are additional parameters you may need to choose that won't be covered in this guide. Configure the LogMFE parameters according to the image below, then switch to the Generate Features tab and click the Generate Features button.
These parameters were chosen to minimize the RAM and processing cost of the feature extraction process while still maintaining good model performance; they may not work well for all applications.
A frame length of 20-25 ms is common for general audio classification tasks, but since temporal resolution is not critical for this application, it's preferable to match the frame length to the FFT length of 32 ms (512 samples @ 16 kHz) for computational efficiency.
8. Navigate to the NN Classifier tab and configure the neural network as shown in the image below. Click Start Training and let the training run to completion.
The use of convolutional layers helps keep the number of parameters in the network small thanks to parameter re-use. The use of 1-D convolutions (convolution over time) also helps minimize the RAM and processing requirements of the model.
9. Navigate to the Model testing tab to check neural network performance. Click the Classify All button to evaluate the model on the test data set. The figure below shows the result for the vacuum cleaner test dataset.
At this point the model is trained and tested, and we can move on to deploying to our hardware.
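As a sanity check, the windowing arithmetic behind the chosen parameters (all values taken from the Create Impulse step above; nothing here is firmware-specific) works out as follows:

```python
# Quick arithmetic check of the windowing parameters chosen earlier.
fs = 16000                      # sample rate in Hz
window_ms, increase_ms = 2048, 512
frame_ms = 32                   # LogMFE frame length

window_samples = fs * window_ms // 1000      # samples per classifier input
frames_per_window = window_ms // frame_ms    # non-overlapping LogMFE frames
overlap = 1 - increase_ms / window_ms        # overlap between inputs

print(window_samples, frames_per_window, overlap)  # 32768 64 0.75
```

So each classifier input covers 32768 samples, split into 64 whole frames with no truncation or padding, and consecutive inputs overlap by 75%.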
Deploying Your Impulse

Follow the steps below to deploy your Impulse and integrate it into your existing MPLAB X project.
Use the MPLAB X project that accompanies this guide as a starting point for your own project - this will save you the trouble of doing the hardware and project configuration yourself.
1. Switch to the Deployment tab in Edge Impulse Studio and select the C/C++ deployment option.
2. Click the Build button that appears at the bottom of the page, leaving the default options selected.
3. Unzip the contents of the Edge Impulse ZIP file into your MPLAB X project’s src/ folder so that they overwrite the original Edge Impulse SDK files.
4. Rename all the .cc files from the Edge Impulse library to have a .cpp suffix. You can do this in one shot with the following commands:
On Windows: ren *.cc *.cpp
On Mac®/Linux: find . -name "*.cc" -exec sh -c 'mv "$1" "${1%.cc}.cpp"' _ {} \;
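If you prefer a cross-platform approach, the same rename can be sketched in Python with pathlib (illustrative only; the shell one-liners above do the same job):

```python
# Cross-platform alternative to the shell commands above: rename every
# .cc file under the given directory tree to .cpp using pathlib.
from pathlib import Path

def rename_cc_to_cpp(root="."):
    """Rename all *.cc files under root to *.cpp; returns the new paths."""
    renamed = []
    for p in Path(root).rglob("*.cc"):
        target = p.with_suffix(".cpp")
        p.rename(target)
        renamed.append(target)
    return renamed
```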
Adding LogMFE to the Edge Impulse Inferencing SDK

At this point, the Edge Impulse Inferencing SDK should be fully integrated into your project. However, we still need to add support for the custom LogMFE audio feature to the deployed SDK. Luckily, we can implement this with minimal modifications by directly modifying the MFE feature code, as detailed in the steps below.
If you have doubts about any of the steps below, take a look at the firmware source code accompanying this post where these changes have already been implemented.
1. Using the tool of your choice, generate a square-root Hann window that matches the Frame Length parameter from the LogMFE block. This is the windowing function applied to the input signal before the Fourier transform; the Hann (a.k.a. Hanning) window is common in audio applications, and its square root is what this project uses. Note that other window types are possible, but the window must match what was used in the model development step. Below is a code snippet that generates the window using Python and the NumPy library.
import numpy as np
# Sampling frequency
Fs = 16000
# Frame length in seconds (must match the Frame Length parameter of the LogMFE block)
Frame_length = 0.032
L = int(Fs * Frame_length)
# Generate the window
w = np.sqrt(np.hanning(L+1))[1:]
# Print the window coefficients to terminal
print(w)
2. Open src/edge-impulse-sdk/dsp/speechpy/feature.hpp and define a new array named window of type float at the beginning of the speechpy namespace; initialize it with the coefficients of the window generated in the previous step.
3. In the same file, locate the mfe() function. Inside the for loop, apply the window multiply after the call to signal->get_data() and before the call to processing::power_spectrum() as shown below.
4. Still in feature.hpp, at the end of the mfe() function and just before the return statement, apply a logarithm to the MFE output features as shown in the figure below.
5. Open src/edge-impulse-sdk/classifier/ei_run_classifier.h and locate the calc_cepstral_mean_and_var_normalization_mfe() function. Comment out the line calling cmvnw() and add a line calling the numpy::normalize() function as shown in the figure below. This will disable the mean subtraction step, while still applying the minmax normalization.
6. Open src/model-parameters/model_metadata.h.
7. Near the bottom of the file you'll find two definitions of ei_dsp_config_mfe_t; delete the one that does not include the win_size parameter.
8. Also near the bottom of model_metadata.h, find the instantiation of ei_dsp_config_mfe_t and add a comma and a 0 to the end of the initializer as shown in the figure below; this will set the win_size to 0.
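To bridge steps 1 and 2 above, the window coefficients can be printed as a C-style array initializer ready to paste into feature.hpp. The array name window matches step 2; the exact formatting below is just one reasonable choice:

```python
# Sketch: print the square-root Hann window from step 1 as a C array
# initializer for the 'window' array defined in step 2.
import numpy as np

Fs, frame_length = 16000, 0.032
L = int(Fs * frame_length)
w = np.sqrt(np.hanning(L + 1))[1:]

# Format 8 coefficients per line for readability
body = ",\n    ".join(", ".join(f"{v:.8f}f" for v in w[i:i + 8])
                      for i in range(0, len(w), 8))
print(f"static const float window[{len(w)}] = {{\n    {body}\n}};")
```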
Okay, the LogMFE feature should now be integrated into your source code and you should be ready to compile. Go ahead and click the Make and Program Device button in the toolbar to compile and flash your firmware to the SAME54 MCU.
Final Remarks

That's it! You should now have a basic understanding of developing a sound recognition application with Edge Impulse and Microchip hardware.
For more details about integrating your Impulse with an existing MPLAB X project, check out our "Integrating the Edge Impulse Inferencing SDK" article.
To learn more about Edge Impulse Studio, including tutorials for other machine learning applications, go to the Edge Impulse Docs.