I received the device on 29th April and had around three weeks for development. I couldn't complete the project in time, so I'll document what I have done so far.
In real life, the "Ok, Google" is one of the excellent examples of a two-stage TinyML implementation. In the first stage, the smartphone continuously detects the wake-up word, "Ok, Google", without the network connection. Once the user detects the wake-up word, the Google Assistant will be turned on and run the speech recognition to detect the words of speech. The Google Assistant acts as the second stage classifier, which is more accurate and complex in the algorithm. This arrangement saves power consumption because the smartphone does not necessarily recognize the speech all time but a simpler algorithm for the wake-up-word.
Models Training

Two separate models are required for the first and second stages. The first stage implements a wake-up word detector, with the phrase "Marvin" chosen as the wake-up word. For the second stage, four direction commands are chosen: "Up", "Down", "Left", and "Right". All of the datasets are obtained from the speech dataset provided by Google.
Edge Impulse is used as the model training platform. It simplifies the training process and provides deployment options for local devices, including Arduino. In this case, the two models are trained separately in different projects. For the first-stage wake-up word detection model, besides the "Marvin" class, there are "Noise" and "Unknown" classes, which represent background noise and unknown commands, respectively. The "Unknown" dataset is built from various other commands such as "Yes", "No", "Stop", etc., so the model can distinguish "Marvin" from other words. The second-stage direction commands model is also trained with background noise. Both models use the default architectures provided by Edge Impulse, shown in Table 1.
Once the models are trained, we can export each one as an Arduino library and deploy them to the Arduino GIGA R1 WiFi, which is compatible with the Arduino IDE.
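For reference, a minimal sketch along the following lines can run one of the exported libraries. The header name "marvin_inferencing.h" is a placeholder for whatever Edge Impulse generates from the actual project name, and the audio buffer is assumed to be filled by the microphone code described later.

// Minimal sketch (assumed header and buffer names) showing how an exported
// Edge Impulse Arduino library is invoked on a captured audio window.
#include <marvin_inferencing.h>

static int16_t audio_buf[EI_CLASSIFIER_RAW_SAMPLE_COUNT];  // filled by the microphone code

// Hands slices of the captured audio to the classifier as floats
static int get_audio_data(size_t offset, size_t length, float *out_ptr) {
  ei::numpy::int16_to_float(&audio_buf[offset], out_ptr, length);
  return 0;
}

void classify_window() {
  signal_t signal;
  signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT;
  signal.get_data = &get_audio_data;

  ei_impulse_result_t result = { 0 };
  if (run_classifier(&signal, &result, false) == EI_IMPULSE_OK) {
    // Print the score for each class ("Marvin", "Noise", "Unknown", ...)
    for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
      Serial.print(result.classification[i].label);
      Serial.print(": ");
      Serial.println(result.classification[i].value, 3);
    }
  }
}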
Models Implementation

Since we have to upload the models onto separate cores, we can choose the target core before uploading, as shown in Figure 1. Either the M7 core or the M4 core can be selected as the implementation platform.
The GIGA Display Shield has an embedded MEMS microphone (MP34DT06JTR) that, when combined with the visual element of the GIGA Display Screen, can be used in a number of ways.
Microphone Interfacing

Figure 2 shows the digital MEMS microphone on the GIGA Display Shield. The PDM.read() function is used to read audio from the digital microphone, and the audio capture takes place in the pdm_data_ready_inference_callback() function. The original code has been modified to work with the digital MEMS microphone rather than an analog microphone; the PDM library allows us to capture digital audio data efficiently.
Unlike the analog approach, which required a loop of repeated analogRead() calls, the PDM interface handles audio capture more efficiently. The PDM library works in 16-bit mode by default, providing higher-resolution audio samples suitable for keyword detection applications.
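A sketch of this capture path is shown below, assuming a 16 kHz mono stream; the buffer sizes and the step that copies samples into the model's inference window are illustrative.

#include <PDM.h>

static const int SAMPLE_RATE = 16000;     // sampling rate assumed for the keyword models
static short sample_buffer[512];          // temporary buffer filled by the PDM library
static volatile int samples_read = 0;

// Called by the PDM library whenever new audio data is available
static void pdm_data_ready_inference_callback(void) {
  int bytes_available = PDM.available();
  PDM.read(sample_buffer, bytes_available);
  samples_read = bytes_available / 2;     // 16-bit samples -> 2 bytes each
  // ...copy sample_buffer into the model's inference window here...
}

void setup_microphone() {
  PDM.onReceive(pdm_data_ready_inference_callback);
  if (!PDM.begin(1, SAMPLE_RATE)) {       // 1 channel (mono) at 16 kHz
    Serial.println("Failed to start the PDM microphone!");
    while (1) ;
  }
}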
Two-stage Classification

We needed a core-to-core communication mechanism to support the two-stage classification, so the remote procedure call (RPC) mechanism is introduced. RPC is a request-response protocol: the client initiates an RPC by sending a request message to a known remote server, instructing it to run a specific procedure with the supplied parameters; the server sends back a response, and the client program continues its work. Unless the client submits an asynchronous request, it is blocked while the server processes the call (it waits until the server has finished before continuing execution). The M4 core is in charge of wake-up word detection, while the M7 core is in charge of direction command recognition. The RPC mechanism is only used to pass data from the M4 core to the M7 core; an external pin is required for the M7 core to pass a variable back to the M4 core. Figure 3 shows the core-to-core communication mechanism for this two-stage classification.
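The snippet below illustrates this M4-to-M7 handoff with the Arduino RPC library; the procedure name "set_pass" and the flag handling are assumptions, not the project's exact code.

#include <RPC.h>

// ---- M7 sketch: expose a procedure the M4 core can call ----
volatile bool pass = false;               // set once "Marvin" has been detected on the M4

int set_pass(int value) {                 // runs on the M7 when the M4 calls it
  pass = (value != 0);
  return 0;
}

void setup() {
  RPC.begin();                            // start inter-core communication
  RPC.bind("set_pass", set_pass);         // register the remote procedure
}

// ---- M4 sketch: report a detection to the M7 ----
// (after the wake-up word classifier fires; RPC.begin() was called in setup())
//   RPC.call("set_pass", 1);             // blocking request-response call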
Referring to Figure 3, the M4 core first runs the wake-up word recognition and keeps sending the variable "Pass" to the M7 core via RPC. "Pass" is set once the wake-up word "Marvin" is detected. The M7 core starts recognizing the direction commands once it sees that "Pass" is set. Similarly, when one of the direction commands is detected, the M7 core sets the variable "Turn" and transfers it to the M4 core through an external pin. The value of "Turn" is continuously written to digital output pin D5: the M7 core writes "Turn" onto D5 while the M4 core reads it through D4. A 100 kΩ resistor limits the current flow; since the variable is represented by the voltage level rather than the current, the exact current does not affect the data transmission.
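A minimal sketch of this return path, using the pin numbers from Figure 4 (the surrounding "Turn" logic is simplified):

// M7 side: drive the "Turn" flag on digital pin D5
void setup_turn_output() { pinMode(5, OUTPUT); }
void write_turn(bool turn) { digitalWrite(5, turn ? HIGH : LOW); }

// M4 side: read the flag back on digital pin D4
void setup_turn_input() { pinMode(4, INPUT); }
bool read_turn() { return digitalRead(4) == HIGH; }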
From Figure 4, we can see that digital pins D4 and D5 are connected to each other through a 100 kΩ resistor for the data transmission. The microphone is connected to analog input pin A6 for audio capturing. The LEDs simply indicate the current stage and the command that has been called out. In the first stage, no LEDs are lit. Once "Marvin" is detected, LED1 lights up and the device waits for a direction command. When one of the commands is recognized, its corresponding LED also lights up and the system returns to the first stage. Besides passing the variable from the M4 core to the M7 core, the RPC mechanism is also used for printing: all of the results are shown on the serial monitor in the Arduino IDE, but the M4 core cannot print directly to the serial monitor. Thus, with the aid of RPC, the M7 core displays the M4 core's results on the serial monitor.
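One way to realize this printing path, sketched under the assumption that the M4 writes through the RPC stream and the M7 relays it, is shown below.

#include <RPC.h>

// M4 side: print through the RPC link instead of Serial
//   RPC.println("Marvin detected");

// M7 side: forward anything received over RPC to the USB serial monitor
void forward_m4_output() {
  while (RPC.available()) {
    Serial.write(RPC.read());
  }
}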