Team members:
Yuting Chen (yc119)
Bangju Wang (bw27)
Wake word detection has been widely used in voice assistants such as Apple Siri, Microsoft Cortana, and Google Assistant. To gain a better understanding of this technique, we built this project to develop a wake word detection application on resource-limited IoT devices; in our case, an Arduino board.
The model takes audio data as input. As we’ll see, the raw audio requires heavy preprocessing before it can be fed into the model. The model is a classifier that outputs class probabilities, which we then have to parse and interpret.
The model recognizes the words 'yes' and 'no', while any other word is detected as 'unknown'. Specifically, after the board detects 'yes', the light turns on for 3 seconds and the recognition result is shown on the screen. After 'no' is detected, the light does not change, while the recognition result is still shown on the screen. When any other word is detected, the light again does not change and 'unknown' is shown on the screen. A minimal sketch of this response logic follows.
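The sketch below captures this behavior. It is a minimal sketch assuming the board's built-in LED and the Arduino serial monitor as the "screen"; the function is modeled loosely on the command responder hook in the micro_speech example, whose exact signature varies between library versions.

    #include <Arduino.h>
    #include <string.h>

    // Minimal response logic, called once per newly recognized command.
    // Assumes setup() has run pinMode(LED_BUILTIN, OUTPUT) and Serial.begin().
    // (The blocking delay keeps the sketch short; the real example tracks
    // timestamps so the main loop keeps running while the LED is lit.)
    void RespondToCommand(const char* found_command, bool is_new_command) {
      if (!is_new_command) return;
      Serial.println(found_command);      // show the result in the serial monitor
      if (strcmp(found_command, "yes") == 0) {
        digitalWrite(LED_BUILTIN, HIGH);  // light on for 3 seconds after "yes"
        delay(3000);
        digitalWrite(LED_BUILTIN, LOW);
      }
      // "no" and "unknown" leave the LED unchanged.
    }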
We will use an existing dataset, train our model with TensorFlow, convert it into a TensorFlow Lite model, and deploy the model to the Arduino. We then observe the LED and read the Arduino IDE's serial messages to get output information from the board.
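On the deployment side, the converted .tflite file is not loaded from storage at runtime; it is dumped into a C array (for example with xxd -i) and compiled straight into the sketch. The excerpt below is illustrative: the array name follows the example's tiny_conv_micro_features_model_data.cc, but the bytes and length are abbreviated stand-ins.

    // Illustrative excerpt of the generated model file; the real contents
    // are several kilobytes of flatbuffer data ("TFL3" appears near the start).
    alignas(8) const unsigned char g_tiny_conv_micro_features_model_data[] = {
        0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, /* ...truncated... */
    };
    const int g_tiny_conv_micro_features_model_data_len =
        sizeof(g_tiny_conv_micro_features_model_data);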
As shown in the diagram, the program contains the following components:
Main loop: Like the “hello world” example, our application runs in a continuous loop. All of the subsequent processes are contained within it, and they execute continually, as fast as the microcontroller can run them, which is multiple times per second. (A condensed sketch of one pass through this loop appears after this list.)
Audio provider: This component captures raw audio data from the microphone. Since the methods for capturing audio vary from device to device, this component can be overridden and customized.
Feature provider: The feature provider converts raw audio data into the spectrogram format that our model requires. It does so on a rolling basis as part of the main loop, providing the interpreter with a sequence of overlapping one-second windows.
TF Lite interpreter: The interpreter runs the TensorFlow Lite model, transforming the input spectrogram into a set of probabilities. (A setup sketch also appears after this list.)
Model: The model is included as a data array and run by the interpreter. The array is located in tiny_conv_micro_features_model_data.cc.
Recognize commands: Since inference is run multiple times per second, the RecognizeCommands class aggregates the results and determines if, on average, a known word was heard. (A toy averaging sketch appears after this list.)
Command responder: If a command was heard, the command responder uses the device's output capabilities to let the user know. Depending on the device, this could mean flashing an LED or showing data on an LCD display. It can be overridden for different device types. (A sketch of our responder logic appears earlier, after the behavior description.)
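To make the data flow concrete, here is a condensed, illustrative pass through the main loop, with each component reduced to a stub declaration. The names and the 49 x 40 spectrogram size are assumptions modeled on the micro_speech example, not a drop-in implementation.

    #include <cstdint>

    // Stubs standing in for the components described above.
    bool PopulateFeatureData(int8_t* spectrogram);           // feature provider
    void InvokeInterpreter(const int8_t* in, uint8_t* out);  // TF Lite interpreter
    int  UpdateAndRecognize(const uint8_t* scores);          // recognize commands
    void RespondToCommand(int command);                      // command responder

    constexpr int kFeatureElementCount = 49 * 40;  // 49 time slices x 40 channels
    int8_t spectrogram[kFeatureElementCount];
    uint8_t scores[4];  // silence, unknown, "yes", "no"

    // One pass of the continuous loop; the Arduino core calls this repeatedly.
    void loop() {
      if (!PopulateFeatureData(spectrogram)) return;  // wait for fresh audio
      InvokeInterpreter(spectrogram, scores);
      int command = UpdateAndRecognize(scores);
      if (command >= 0) RespondToCommand(command);
    }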
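The interpreter and model wiring look roughly like the sketch below. It assumes the newer tensorflow/lite/micro include paths and the AllOpsResolver/MicroInterpreter class names; older releases keep the same pieces under tensorflow/lite/experimental/micro, so treat the exact paths and the arena size as assumptions.

    #include "tensorflow/lite/micro/all_ops_resolver.h"
    #include "tensorflow/lite/micro/micro_error_reporter.h"
    #include "tensorflow/lite/micro/micro_interpreter.h"
    #include "tensorflow/lite/schema/schema_generated.h"
    #include "micro_features/tiny_conv_micro_features_model_data.h"

    namespace {
    tflite::MicroErrorReporter micro_error_reporter;
    constexpr int kTensorArenaSize = 10 * 1024;  // working memory for tensors
    alignas(16) uint8_t tensor_arena[kTensorArenaSize];
    }  // namespace

    void SetupInterpreter() {
      // Map the byte array from tiny_conv_micro_features_model_data.cc into
      // a model object the interpreter can execute.
      const tflite::Model* model =
          tflite::GetModel(g_tiny_conv_micro_features_model_data);
      static tflite::AllOpsResolver resolver;  // registers the built-in ops
      static tflite::MicroInterpreter interpreter(
          model, resolver, tensor_arena, kTensorArenaSize, &micro_error_reporter);
      interpreter.AllocateTensors();
      // interpreter.input(0) receives the spectrogram; after Invoke(),
      // interpreter.output(0) holds four scores: silence, unknown, "yes", "no".
    }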
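The recognize-commands step can be pictured as the toy averager below: keep a short history of per-class scores, average each class over that history, and report the top class only when its average clears a threshold. The history length and threshold are illustrative values, not the example's tuned ones.

    #include <cstdint>
    #include <cstring>

    constexpr int kNumClasses = 4;       // silence, unknown, "yes", "no"
    constexpr int kHistoryLength = 10;   // recent inference results to average
    constexpr int kThreshold = 200;      // out of 255, for quantized scores

    uint8_t history[kHistoryLength][kNumClasses];  // zero-initialized ring buffer
    int history_pos = 0;

    // Push the latest scores; return the winning class index, or -1 if no
    // class's running average has crossed the threshold yet.
    int UpdateAndRecognize(const uint8_t* scores) {
      memcpy(history[history_pos], scores, kNumClasses);
      history_pos = (history_pos + 1) % kHistoryLength;

      int best = -1, best_avg = 0;
      for (int c = 0; c < kNumClasses; ++c) {
        int sum = 0;
        for (int i = 0; i < kHistoryLength; ++i) sum += history[i][c];
        int avg = sum / kHistoryLength;
        if (avg > best_avg) { best_avg = avg; best = c; }
      }
      return (best_avg >= kThreshold) ? best : -1;
    }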