This Toolkit Lets Users Easily Capture Audio Data for Building TinyML Models

See how this project incorporates a more streamlined way to accurately capture, process, and predict data using tinyML and a microphone.

What Is TinyML?

Machine learning is everywhere these days, and it keeps getting more efficient and powerful as the technology matures. TinyML is a phrase that describes running memory- and processing-optimized models on devices that lack large amounts of RAM and clock cycles, such as microcontrollers. A community has grown around tinyML over the past few years, with people cramming increasingly large and complex models onto these devices to make inferences and predictions about everything from sound to images and other patterns of data. It does, however, still face some hurdles that keep it from gaining wider adoption among hobbyists and in industry.

Some Challenges

ML models need ample amounts of labeled data, and gathering and labeling that data can be a tedious task. Add the signal and matrix processing needed to turn raw recordings into usable features, and things get even more complicated. This is why effective, automated tools are so important to the field of machine learning.

How the Toolkit Works

A toolkit is a collection of tools that a developer can use, hence the name. Over on GitHub, the startup IQT Labs released an Audio Sensor Toolkit that greatly streamlines the process of gathering and labeling data, training models, and deploying them to remote devices for over-the-air (OTA) streaming of alerts and sensor readings.

Building Datasets and Models

For the demonstration project, creator Luke Berndt started with an audio dataset from Microsoft called the Microsoft Scalable Noisy Speech Dataset and extracted just the files containing either sirens or no sirens. Everything was recorded at 16 kHz, which sets the sample rate for the microphones once the model is eventually deployed. An additional dataset, the Sounds of New York City Dataset, also contained siren sounds. Generally in machine learning, the more data there is to train on, the more accurate and flexible the model will be.
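
Curating the siren/no-siren subset mostly comes down to sorting clips into two labeled folders. The toolkit has its own scripts for this; the Python sketch below is only a rough, hypothetical illustration of that step, and the CSV name, column names, and folder layout are assumptions rather than the project's actual files.

```python
import csv
import shutil
from pathlib import Path

SOURCE_DIR = Path("raw_audio")    # hypothetical folder of 16 kHz .wav clips
LABELS_CSV = Path("labels.csv")   # hypothetical CSV with columns: filename, label
DEST_DIR = Path("dataset")        # output: dataset/siren/ and dataset/no_siren/

def sort_clips():
    """Copy each clip into a folder named after its label (siren or no_siren)."""
    with LABELS_CSV.open(newline="") as f:
        for row in csv.DictReader(f):
            src = SOURCE_DIR / row["filename"]
            dst_dir = DEST_DIR / row["label"]
            dst_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst_dir / src.name)

if __name__ == "__main__":
    sort_clips()
```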

After the data was in place and labeled, it was time to process it into features and train a model. Berndt opted to use Edge Impulse due to its powerful features and simplicity. The data first goes through a windowing processing block that splits each sample into chunks, or "windows." Each window then enters a Mel Frequency Cepstral Coefficients (MFCC) block that generates its features. Some additional processing takes place before the MFCC algorithm is run to filter out and isolate particular frequencies.
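
Edge Impulse performs the windowing and MFCC extraction inside its processing blocks, but the same idea can be sketched in plain Python with librosa. The window length and coefficient count below are illustrative defaults, not the project's exact settings.

```python
import librosa
import numpy as np

def extract_mfcc_windows(wav_path, sample_rate=16000, window_s=1.0, n_mfcc=13):
    """Split a clip into fixed-length windows and compute MFCC features for each one."""
    audio, sr = librosa.load(wav_path, sr=sample_rate)       # load (and resample) at 16 kHz
    samples_per_window = int(window_s * sr)
    features = []
    for start in range(0, len(audio) - samples_per_window + 1, samples_per_window):
        window = audio[start:start + samples_per_window]
        mfcc = librosa.feature.mfcc(y=window, sr=sr, n_mfcc=n_mfcc)
        features.append(mfcc.flatten())                       # one flat feature vector per window
    return np.array(features)
```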

With the features created, a model was trained on them. Initially, the model overfit the training data and performed poorly in testing, so a few Dropout layers were added to help it generalize. Once 40 training cycles had elapsed, the model was ready to deploy.
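
Edge Impulse generates and trains the network itself, so the snippet below is only a rough Keras equivalent of the fix described here: a small dense classifier with Dropout layers added to curb overfitting, trained for 40 cycles. The layer sizes and dropout rates are assumptions.

```python
from tensorflow.keras import layers, models

def build_classifier(num_features):
    """Small dense classifier; the Dropout layers help prevent overfitting."""
    model = models.Sequential([
        layers.Input(shape=(num_features,)),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.25),                   # randomly zero 25% of activations during training
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.25),
        layers.Dense(2, activation="softmax"),  # two classes: siren / no siren
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training for the 40 cycles mentioned above would look roughly like:
# model = build_classifier(num_features=X_train.shape[1])
# model.fit(X_train, y_train, epochs=40, validation_data=(X_val, y_val))
```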

Choosing a Board

Building devices that can run ML models on battery power is tough due to the increased power consumption of inference. The Arduino Nano 33 BLE Sense is a good candidate, except that it lacks a robust ecosystem for expanding its functionality. In its place, Berndt opted for an Adafruit Feather nRF52840 Sense due to its processing power, ample storage and RAM, and onboard microphone.

Transferring Data

What good is a board running a tinyML model if it does nothing with its data or inferences? Because this project is meant to be used outdoors, it carries an AdaLogger FeatherWing to log data to a microSD card and run a real-time clock (RTC) that keeps the current time accurate. There is also a LoRa Radio FeatherWing that lets the device send information wirelessly over long distances.

Testing It All Out

The software running on the Feather captures 16,000 audio samples per second with the PDM (pulse-density modulation) subsystem, leaving the CPU free to do other tasks. Once the audio buffer is full, it's converted into PCM (pulse-code modulation) and sent both to the SD card for storage as a .wav file and to the model for inference. A single buffer might cause the device to miss events, so a double buffer lets one buffer fill with new audio while the other is read and processed.
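
The firmware itself runs on the Feather, but the double-buffering pattern and .wav logging are easy to illustrate with a short Python sketch. The capture source here is simulated; on the real device the PDM hardware fills one buffer concurrently while the other is being processed.

```python
import wave
import numpy as np

SAMPLE_RATE = 16000            # 16,000 samples per second, matching the training data
BUFFER_SECONDS = 2             # each buffer holds one two-second sample
BUFFER_LEN = SAMPLE_RATE * BUFFER_SECONDS

def write_wav(path, pcm_samples):
    """Store a buffer of 16-bit PCM samples as a mono .wav file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)                # two bytes per sample (16-bit PCM)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm_samples.astype(np.int16).tobytes())

def capture_loop(next_block, run_inference, num_blocks=3):
    """Double buffer: one buffer fills with audio while the other is logged and classified."""
    filling = np.zeros(BUFFER_LEN, dtype=np.int16)
    processing = np.zeros(BUFFER_LEN, dtype=np.int16)
    for i in range(num_blocks):
        filling[:] = next_block()                      # stand-in for the PDM capture
        filling, processing = processing, filling      # swap buffers once capture completes
        write_wav(f"sample_{i:05d}.wav", processing)   # log the raw audio to storage
        run_inference(processing)                      # hand the same buffer to the model

if __name__ == "__main__":
    # Fake audio source and no-op model, just to exercise the buffer swap.
    fake_source = lambda: np.random.randint(-32768, 32767, BUFFER_LEN, dtype=np.int16)
    capture_loop(fake_source, run_inference=lambda buf: None)
```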

To test the system, the device was left outside for 24 hours, generating 43,200 two-second samples. After the ground truth labels were added, the model distinguished sirens from non-sirens with around 97% accuracy.

This project is a great demonstration of what tinyML is capable of and how easy it can be to get started. For more information, you can visit the project's GitHub repository.

Evan Rust
IoT, web, and embedded systems enthusiast. Contact me for product reviews or custom project requests.