We followed the instructions in Chapter 8 of the TinyML book to build a voice recognition model that records one-second audio clips in a loop and classifies each clip into a word label. After implementing the model, we converted it to TFLite format and deployed it on an Arduino Nano 33 BLE Sense board.
Data Processing
For the wake word detection task, we use the Speech Commands dataset, which has 65,000 data points across 30 word classes. Because the board's LED output can only signal a few states, we preprocess the dataset to keep only two words plus an 'unknown' class. Here we distinguish between 'yes' (green light) and 'no' (red light).
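The relabeling step can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the script from the book: the function names and the file paths in the usage note are hypothetical, and we assume each sample is a (path, word) pair.

```python
# Minimal sketch: collapse the 30 word classes into 'yes', 'no', and 'unknown'.
# WANTED_WORDS matches the two words we chose; everything else is an assumption.
WANTED_WORDS = ("yes", "no")
UNKNOWN_LABEL = "unknown"


def relabel(word, wanted=WANTED_WORDS):
    """Keep the word if it is one we want to detect, otherwise map it to 'unknown'."""
    return word if word in wanted else UNKNOWN_LABEL


def relabel_dataset(samples, wanted=WANTED_WORDS):
    """Relabel a list of (path, word) pairs, leaving the paths untouched."""
    return [(path, relabel(word, wanted)) for path, word in samples]
```

For example, `relabel_dataset([("a.wav", "no"), ("b.wav", "seven")])` keeps the first label and turns the second into 'unknown'.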
Training
We use TensorBoard to monitor the training process. It shows two graphs, accuracy and cross-entropy, which let us assess and improve the model. The model is deliberately small so that it can run on low-power hardware, comprising a convolutional layer, a fully connected layer, and a softmax output layer. After training, we convert it to TFLite and deploy it to the hardware!
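To make the role of the softmax output layer concrete, here is a small sketch in plain Python (no TensorFlow) of how the final scores become class probabilities and a predicted word. The label order and the example scores are made up for illustration and are not taken from our trained model.

```python
import math

# Assumed label order, for illustration only.
LABELS = ["unknown", "yes", "no"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, labels=LABELS):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

Calling `top_label([0.1, 2.0, -1.0])` picks 'yes', since the second score is the largest. On the board, the same argmax decision is what selects which LED to light.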
Result
The result is quite satisfying. Most of the time, our model accurately outputs the word we said. However, sometimes the model outputs 'unknown' no matter what we say. We suspect this happens because the polling-like recording can split a single utterance across two different time windows.
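The splitting hypothesis can be illustrated with a small timing sketch. It measures how much of an utterance falls inside the best one-second window, first with back-to-back windows and then with overlapping ones. The timings, the function names, and the overlapping-window variant are all assumptions for illustration; we did not measure this on the board.

```python
def coverage(utt_start, utt_end, win_start, win_len=1.0):
    """Fraction of the utterance that falls inside one recording window."""
    win_end = win_start + win_len
    overlap = min(utt_end, win_end) - max(utt_start, win_start)
    return max(0.0, overlap) / (utt_end - utt_start)


def best_coverage(utt_start, utt_end, stride, win_len=1.0, total=3.0):
    """Best coverage over all windows that start every `stride` seconds."""
    starts = []
    t = 0.0
    while t + win_len <= total:
        starts.append(t)
        t += stride
    return max(coverage(utt_start, utt_end, s, win_len) for s in starts)


# A half-second word spoken from 0.75 s to 1.25 s straddles the 1.0 s boundary.
split = best_coverage(0.75, 1.25, stride=1.0)   # back-to-back windows
whole = best_coverage(0.75, 1.25, stride=0.5)   # windows overlapping by half
```

With back-to-back windows, no single window sees more than half of the word, which would be consistent with the model falling back to 'unknown'. With a half-second stride, one window contains the whole word.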







