Published March 17, 2021 © Apache-2.0

Pico Wake Word

The TinyML Wake Word demo on the Raspberry Pi Pico. A less than $10 wake word!

Intermediate | Full instructions provided | 30 minutes | 17,391

Things used in this project

Hardware components

Raspberry Pi Pico

Adafruit Electret Microphone Amplifier - MAX4466 with Adjustable Gain

Software apps and online services

TensorFlow

Story

TinyML Wake-Word Detection on Raspberry Pi Pico

This application implements the wake word example from TensorFlow Lite for Microcontrollers on the Raspberry Pi Pico.

The wake word example shows how to run a 20 kB neural network that can detect two keywords, "yes" and "no". More information about this example is available in the TensorFlow Lite Micro examples folder.

We use an electret microphone as input to detect the words "yes" or "no" and turn the on-device LED on and off in response.

Demo

Because of my slight northern English accent, I struggle to get the demo working consistently with my own voice (even the high-quality OS X version). Here is a clip of me placing my microphone next to my speaker so that somebody else can say "yes".

Demonstration of Wake Word on Raspberry Pi Pico

Overview

The micro_speech app for the Raspberry Pi Pico is adapted from the "Wake-Word" example in TensorFlow Lite for Microcontrollers. Pete Warden and Daniel Situnayake's TinyML book gives an in-depth look at how this model works and how to train your own. This repository ports the example to work on the Pico.

The application works by listening to the microphone and processing the data before sending it to the model to be analyzed. The application takes advantage of the Pico's ADC and DMA to collect samples, freeing the CPU to perform the complex analysis.

The Pico does not come with an onboard microphone. For this application, we use the Adafruit Electret Microphone Amplifier breakout.

Before You Begin

We will now go through the setup of the project. This section contains two sub-sections, hardware setup and software setup.

Hardware Setup

Assembly

Solder headers onto your Raspberry Pi Pico
Solder headers onto your Adafruit Electret Microphone

Wiring

The electret microphone breakout produces an analog output, meaning we can connect it to one of the ADC pins on the Raspberry Pi Pico. Make the following connections:

(in the format Microphone -> Pico)

Out -> ADC0 - Pin31
GND -> Any ground pin
VCC -> 3V3(OUT) - Pin36

Fritzing diagram of the setup.

Software Setup

The final step before using this application is to set up the software stack (CMake and compilers). The easiest way to do this is to follow the steps on the Raspberry Pi Pico SDK repository. Once done you can test your toolchain setup by running some of the examples found in the Pico examples repository.

Alternatively, you can use the provided Dockerfile if you would prefer to build your application in an isolated environment.

You can now clone the application repository:

git clone https://github.com/henriwoodcock/pico-wake-word.git

Build the Application

With the Pico SDK set up on your machine, building the application is the same as building any other Pico application.

Change directory into this repository

cd pico-wake-word

Make a build directory

mkdir build

Generate the Makefiles

cd build
cmake ..

Finally, run make

make -j8

Once done, your micro_speech.uf2 file is located in build/micro_speech.

Code Deep Dive

The Raspberry Pi Pico is the latest product by the Raspberry Pi Foundation. It is a low-cost microcontroller board (<£4) which features the new RP2040 chip by Raspberry Pi.

The RP2040 is built on a "high-clocked" dual-core Cortex-M0+ processor, making it a "remarkably" good platform for endpoint AI. Find out more about the Pico and the RP2040 from James Adams, COO, Raspberry Pi, on arm.com.

We will now go through the changes made to the TensorFlow micro_speech example to allow it to work with the Pico. The TensorFlow team have already done a port of TensorFlow Lite Micro for the Pico.

The two files that you need to edit for the micro_speech application are audio_provider.cc and command_responder.cc. The audio_provider.cc connects a device's microphone hardware to the application, and the command_responder.cc takes the model output and produces an output to react to the spoken word.

But before that, let's introduce a couple of concepts required to make this work.

ADC

ADC stands for analog-to-digital converter. It converts an analog signal, such as the one from our microphone, into a digital signal. The RP2040 has a single 12-bit ADC shared between five inputs: four GPIO pins (three of which are available on the Pico's header) and the built-in temperature sensor. Luckily for us, 12 bits of resolution is more than enough for our microphone.
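To make the 12-bit figure concrete, here is a minimal sketch of how a raw reading maps to a voltage. It assumes the Pico's default 3.3 V ADC reference; the constant and function names are illustrative and are not part of the project code.

```cpp
#include <cstdint>

// Assumptions for illustration: 12-bit conversions and a 3.3 V reference.
constexpr int kAdcBits = 12;
constexpr float kAdcVref = 3.3f;

// Convert a raw ADC reading (0..4095) to the voltage it represents.
constexpr float AdcCountsToVolts(uint16_t raw) {
  return raw * kAdcVref / (1 << kAdcBits);
}

// Compile-time checks: zero counts is 0 V, mid-scale is about half the reference.
static_assert(AdcCountsToVolts(0) == 0.0f, "zero counts is zero volts");
static_assert(AdcCountsToVolts(2048) > 1.64f && AdcCountsToVolts(2048) < 1.66f,
              "mid-scale is about 1.65 V");
```

A silent microphone sitting near mid-scale is why the full listing at the end of this article subtracts 2048 from every sample before applying gain.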

DMA

DMA stands for Direct Memory Access. A DMA controller can write and read data directly into the main system memory without requiring the CPU.

As the RP2040 Datasheet puts it: "This leaves processors free to attend to other tasks, or enter low-power sleep states."

Once the DMA transfer is complete, it can create an interrupt request, allowing the CPU to process the transferred data.

RingBuffer

A ring buffer (AKA circular buffer) is a fixed-size array that behaves as if its memory were joined end to end in a ring: as the array fills up, new data wraps around to the start and overwrites the oldest entries. When using a ring buffer, it is important to keep track of the current write index. This way, you can always access the latest data.
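As a rough illustration of the idea, here is a small generic ring buffer. This is a sketch only: the class and method names are hypothetical, and the project itself uses a plain array plus a timestamp rather than a class like this.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative fixed-size ring buffer; names are hypothetical.
template <std::size_t N>
class RingBuffer {
 public:
  // Store one sample, overwriting the oldest once the buffer is full.
  void Push(int16_t sample) {
    data_[head_] = sample;
    head_ = (head_ + 1) % N;  // wrap back to the start
    if (count_ < N) ++count_;
  }

  // Return the sample written `age` pushes ago (0 = most recent).
  // The caller must ensure age < Size().
  int16_t Latest(std::size_t age = 0) const {
    return data_[(head_ + N - 1 - age) % N];
  }

  std::size_t Size() const { return count_; }

 private:
  std::array<int16_t, N> data_{};
  std::size_t head_ = 0;   // index of the next write
  std::size_t count_ = 0;  // number of valid samples
};
```

Pushing the values 1 to 6 into a RingBuffer<4> leaves 3, 4, 5 and 6 in the buffer: Latest() returns 6 and Latest(3) returns 3, because 1 and 2 have been overwritten.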

We are now ready to dive into the code.

AudioProvider

The audio_provider.cc works by continuously collecting data from the microphone and saving it into an array. Data collection happens while other parts of the code are running, so the current audio can be analysed while new audio is collected. We need to implement two functions for this to work with the rest of the application: GetAudioSamples() and LatestAudioTimestamp().

How this works:

DMA to collect data from the microphone
Interrupt function to clean up the collected data and put it into a ring buffer

To do this, we first write a function, setup(), which initializes the ADC, the DMA and an interrupt. For more examples of the Pico's DMA and ADC, please take a look at the Pico-Examples repository.

Let's break down this function. The first step is setting up the ADC:

#define CLOCK_DIV 3000
adc_gpio_init(26 + CAPTURE_CHANNEL);
adc_init();
adc_select_input(CAPTURE_CHANNEL);
adc_fifo_setup(
  true,    // Write each completed conversion to the sample FIFO
  true,    // Enable DMA data request (DREQ)
  1,       // DREQ (and IRQ) asserted when at least 1 sample present
  false,   // Do not include the ERR bit in the FIFO samples
  false    // Do not shift samples to 8 bits; keep the full 12-bit value
  );
// set sample rate
adc_set_clkdiv(CLOCK_DIV);

The final line sets the rate at which data is collected from the ADC into the FIFO. This is based on the 48 MHz ADC clock. Because the micro_speech model expects 16 kHz input audio, we must sample at that rate.

For example, a CLOCK_DIV of 3000 means a sample is taken every (1 + 3000) cycles, which gives a sample rate of 48,000,000 Hz / 3001 ≈ 15,995 Hz, or approximately 16 kHz. (A CLOCK_DIV of 2999 would give exactly 16 kHz.)
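The divider arithmetic can be checked with a short host-side sketch. It assumes the (1 + divider) relationship described above and only applies to dividers larger than the 96 clock cycles a single conversion takes; the function name is illustrative and is not part of the Pico SDK.

```cpp
#include <cstdint>

constexpr double kAdcClockHz = 48000000.0;  // 48 MHz ADC clock

// One sample is taken every (1 + clkdiv) ADC clock cycles.
constexpr double AdcSampleRateHz(uint32_t clkdiv) {
  return kAdcClockHz / (1.0 + clkdiv);
}

// Compile-time checks of the two dividers discussed in the text.
static_assert(AdcSampleRateHz(2999) == 16000.0, "2999 gives exactly 16 kHz");
static_assert(AdcSampleRateHz(3000) > 15994.0 && AdcSampleRateHz(3000) < 15995.0,
              "3000 gives roughly 15,994.7 Hz");
```

The difference between the two dividers is about 0.03%, which is why a value of 3000 still works in practice.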

With the ADC set up, we can now configure the DMA to transfer the data from the ADC FIFO into an array. To do this, we claim a DMA channel and configure it to read a fixed number of samples, NSAMP, from the ADC FIFO before signalling completion. We do not set a write location at this step, as we do this in the callback.

#define NSAMP 1024
// dma_chan and cfg are declared at file scope so the interrupt handler can use them
dma_chan = dma_claim_unused_channel(true);
cfg = dma_channel_get_default_config(dma_chan);
// Reading from a constant address, writing to incrementing 16-bit addresses
channel_config_set_transfer_data_size(&cfg, DMA_SIZE_16);
channel_config_set_read_increment(&cfg, false);
channel_config_set_write_increment(&cfg, true);
// Pace transfers based on availability of ADC samples
channel_config_set_dreq(&cfg, DREQ_ADC);
dma_channel_configure(dma_chan, &cfg,
                      NULL,           // dst (set later, in the callback)
                      &adc_hw->fifo,  // src
                      NSAMP,          // transfer count
                      false           // do not start immediately
                      );

The last step is to create the interrupt callback and set it to trigger when the DMA transfer is complete. To do this, we first create the callback function, CaptureSamples(). A lot of this function is copied from the TensorFlow micro_speech examples, so it is helpful to get a brief understanding of those before reading this function. We start by defining a large array (the ring buffer), a smaller array (the buffer the DMA will write to) and a timestamp (used to calculate the index):

uint16_t g_audio_sample_buffer[NSAMP]; // the dma write location
constexpr int kAudioCaptureBufferSize = NSAMP * 16;
int16_t g_audio_capture_buffer[kAudioCaptureBufferSize]; // the ring buffer
volatile int32_t g_latest_audio_timestamp = 0;

We can now define the interrupt function. When the DMA is complete, we want to calculate the index of where to store the new data in the ring buffer. This is done by converting the current timestamp into an index and using memcpy to transfer the bytes to that location.

void CaptureSamples() {
  // data processing
  const int number_of_samples = NSAMP;
  // Calculate what timestamp the last audio sample represents
  const int32_t time_in_ms = g_latest_audio_timestamp + (number_of_samples / (kAudioSampleFrequency / 1000));
  // Determine the index, in the history of all samples, of the first new sample
  const int32_t start_sample_offset = g_latest_audio_timestamp * (kAudioSampleFrequency / 1000);
  // Determine the index of this sample in our ring buffer
  const int capture_index = start_sample_offset % kAudioCaptureBufferSize;
  // Copy the data to the correct place in our buffer
  memcpy(g_audio_capture_buffer + capture_index, (void *)g_audio_sample_buffer, sizeof(int16_t)*number_of_samples);
  // Clear the interrupt request.
  dma_hw->ints0 = 1u << dma_chan;
  // Point the channel back at the start of the sample buffer and re-trigger it
  dma_channel_set_write_addr(dma_chan, g_audio_sample_buffer, true);
  g_latest_audio_timestamp = time_in_ms;
}
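To see how the timestamp turns into a ring buffer position, the index arithmetic can be pulled out into a small host-side sketch. The constants mirror the values used in this article (1024-sample blocks, a ring buffer of 16 blocks, 16 kHz audio), but the names here are hypothetical stand-ins for NSAMP, kAudioCaptureBufferSize and kAudioSampleFrequency.

```cpp
#include <cstdint>

// Stand-ins for the article's values; the names are illustrative only.
constexpr int kBlockSamples = 1024;               // NSAMP
constexpr int kRingSamples = kBlockSamples * 16;  // ring buffer size
constexpr int kSampleRateHz = 16000;              // audio sample rate
constexpr int kSamplesPerMs = kSampleRateHz / 1000;

// Milliseconds of audio carried by one DMA block.
constexpr int32_t BlockDurationMs() { return kBlockSamples / kSamplesPerMs; }

// Ring buffer index at which the block starting at `timestamp_ms` is stored.
constexpr int CaptureIndex(int32_t timestamp_ms) {
  return (timestamp_ms * kSamplesPerMs) % kRingSamples;
}

static_assert(BlockDurationMs() == 64, "each block holds 64 ms of audio");
static_assert(CaptureIndex(0) == 0, "first block starts at index 0");
static_assert(CaptureIndex(64) == 1024, "second block follows the first");
static_assert(CaptureIndex(16 * 64) == 0, "the 17th block wraps to the start");
```

Because each block is exactly one sixteenth of the ring buffer, a block never straddles the end of the array, so a single memcpy per interrupt is enough.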

We now attach the interrupt callback to the DMA channel.

dma_channel_set_irq0_enabled(dma_chan, true);
// Configure the processor to run CaptureSamples() when DMA IRQ 0 is asserted
irq_set_exclusive_handler(DMA_IRQ_0, CaptureSamples);
irq_set_enabled(DMA_IRQ_0, true);

Finally, with all this complete, we can now start the ADC and kick off the first DMA transfer by manually calling the CaptureSamples() function.

adc_run(true); //start running the adc
CaptureSamples();

CommandResponder

The command_responder.cc implements one function, RespondToCommand(). In this implementation, we will turn the onboard LED on when "yes" is said and turn the onboard LED off when "no" is said. The first part of this is initializing the onboard LED:

// led settings
static bool is_initialized = false;
const uint LED_PIN = 25;
// if not initialized, setup
if(!is_initialized) {
  gpio_init(LED_PIN);
  gpio_set_dir(LED_PIN, GPIO_OUT);
  is_initialized = true;
}

With the onboard LED initialized, we can now use the input variables to handle the output. This has two steps: the first step is to log the output into the error_reporter:

if (is_new_command) {
  TF_LITE_REPORT_ERROR(error_reporter, "Heard %s (%d) @%dms", found_command,
                       score, current_time);
}

Step two is to turn the LED on or off based on the heard command:

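if (is_new_command) {
  // found_command is a C string, so compare its contents with strcmp()
  // (from <string.h>) rather than comparing pointers with ==
  if (strcmp(found_command, "yes") == 0) {
    // turn led on
    gpio_put(LED_PIN, 1);
  }
  else if (strcmp(found_command, "no") == 0) {
    // turn led off
    gpio_put(LED_PIN, 0);
  }
}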

Putting this all together, we get the following function:

void RespondToCommand(tflite::ErrorReporter* error_reporter, int32_t current_time, const char* found_command, uint8_t score, bool is_new_command) {
  // led settings
  static bool is_initialized = false;
  const uint LED_PIN = 25;
  // if not initialized, setup
  if (!is_initialized) {
    gpio_init(LED_PIN);
    gpio_set_dir(LED_PIN, GPIO_OUT);
    is_initialized = true;
  }
  if (is_new_command) {
    TF_LITE_REPORT_ERROR(error_reporter, "Heard %s (%d) @%dms", found_command,
                         score, current_time);
    // found_command is a C string, so compare its contents with strcmp()
    // (from <string.h>) rather than comparing pointers with ==
    if (strcmp(found_command, "yes") == 0) {
      // turn led on
      gpio_put(LED_PIN, 1);
    }
    else if (strcmp(found_command, "no") == 0) {
      // turn led off
      gpio_put(LED_PIN, 0);
    }
  }
}
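The decision logic can also be tested on a desktop machine by separating it from the GPIO calls. The helper below is a hypothetical sketch, not part of the project. It mirrors the full listing at the end of this article, where only "yes" and "no" change the LED, and it compares string contents with strcmp() because found_command is a C string pointer.

```cpp
#include <cstring>

// Hypothetical helper: returns the LED state after hearing `command`,
// given the current state.
inline bool NextLedState(const char* command, bool current) {
  if (std::strcmp(command, "yes") == 0) return true;   // "yes" turns it on
  if (std::strcmp(command, "no") == 0) return false;   // "no" turns it off
  return current;  // any other label leaves the LED unchanged
}
```

With this split, RespondToCommand() would only need to call gpio_put(LED_PIN, NextLedState(found_command, led_is_on)), where led_is_on is an assumed variable tracking the current state. Other labels from the upstream example, such as "silence" and "unknown", fall through and leave the LED as it was.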

Conclusion

This project has shown how you can implement the "Wake Word" application on your Raspberry Pi Pico. The application is not perfect, and it may take some attempts for it to recognise your voice. The full code can be found on GitHub. The GitHub repository has steps on how to install the app and make changes to the app.

If you have any improvements, feel free to make a pull request!

Code

audio_provider.cc

#include "audio_provider.h"
#include "micro_features/micro_model_settings.h"

#include <string.h>
#include "pico/stdlib.h"
#include "hardware/adc.h"
#include "hardware/dma.h"
#include "hardware/irq.h"

#include <limits>

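// set this to determine sample rate (values are approximate: one sample is
// taken every (1 + CLOCK_DIV) cycles of the 48 MHz ADC clock)
// 0     = 500,000 Hz
// 960   = 50,000 Hz
// 9600  = 5,000 Hz
// -> 3000 = 16,000 Hz
#define CLOCK_DIV 3000 // ~16 kHz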

// Channel 0 is GPIO26
#define ADC_PIN 26
#define CAPTURE_CHANNEL 0

#define NSAMP 1024
#define BUFFER_SIZE 16000

namespace {
// dma settings
dma_channel_config cfg;
uint dma_chan;

uint16_t g_audio_sample_buffer[NSAMP];
// tflite micro settings
bool g_is_audio_initialized = false;
// An internal buffer able to fit 16x our sample size
constexpr int kAudioCaptureBufferSize = NSAMP * 16;
int16_t g_audio_capture_buffer[kAudioCaptureBufferSize];
// A buffer that holds our output
int16_t g_audio_output_buffer[kMaxAudioSampleSize];
// Mark as volatile so we can check in a while loop to see if
// any samples have arrived yet.
volatile int32_t g_latest_audio_timestamp = 0;

} // namespace

// this next function is the DMA interrupt handler
void CaptureSamples() {
  // data processing
  const int number_of_samples = NSAMP;
  // Calculate what timestamp the last audio sample represents
  const int32_t time_in_ms = g_latest_audio_timestamp + (number_of_samples / (kAudioSampleFrequency / 1000));
  // Determine the index, in the history of all samples, of the first new sample
  const int32_t start_sample_offset = g_latest_audio_timestamp * (kAudioSampleFrequency / 1000);
  // Determine the index of this sample in our ring buffer
  const int capture_index = start_sample_offset % kAudioCaptureBufferSize;
  // Read the data to the correct place in our buffer
  memcpy(g_audio_capture_buffer + capture_index, (void *)g_audio_sample_buffer, sizeof(int16_t)*number_of_samples);

  // Clear the interrupt request.
  dma_hw->ints0 = 1u << dma_chan;
  // Point the channel back at the start of the sample buffer and re-trigger it
  dma_channel_set_write_addr(dma_chan, g_audio_sample_buffer, true);

  g_latest_audio_timestamp = time_in_ms;
}

void setup() {
  adc_gpio_init(ADC_PIN + CAPTURE_CHANNEL);

  adc_init();
  adc_select_input(CAPTURE_CHANNEL);
  adc_fifo_setup(
		 true,    // Write each completed conversion to the sample FIFO
		 true,    // Enable DMA data request (DREQ)
		 1,       // DREQ (and IRQ) asserted when at least 1 sample present
		 false,   // Do not include the ERR bit in the FIFO samples
		 false    // Do not shift samples to 8 bits; keep the full 12-bit value
		 );

  // set sample rate
  adc_set_clkdiv(CLOCK_DIV);

  sleep_ms(1000);
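  // Set up the DMA to start transferring data as soon as it appears in FIFO.
  // Assign to the file-scope dma_chan: declaring a new local here would shadow
  // it and leave CaptureSamples() using the wrong channel number.
  dma_chan = dma_claim_unused_channel(true);
  cfg = dma_channel_get_default_config(dma_chan);

  // Reading from a constant address, writing to incrementing 16-bit addresses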
  channel_config_set_transfer_data_size(&cfg, DMA_SIZE_16);
  channel_config_set_read_increment(&cfg, false);
  channel_config_set_write_increment(&cfg, true);

  // Pace transfers based on availability of ADC samples
  channel_config_set_dreq(&cfg, DREQ_ADC);

  dma_channel_configure(dma_chan, &cfg,
			NULL,    // dst
			&adc_hw->fifo,  // src
			NSAMP,          // transfer count
			false            // do not start immediately
	);

  // Tell the DMA to raise IRQ line 0 when the channel finishes a block
  dma_channel_set_irq0_enabled(dma_chan, true);
  // Configure the processor to run CaptureSamples() when DMA IRQ 0 is asserted
  irq_set_exclusive_handler(DMA_IRQ_0, CaptureSamples);
  irq_set_enabled(DMA_IRQ_0, true);

  adc_run(true); //start running the adc
}

TfLiteStatus InitAudioRecording(tflite::ErrorReporter* error_reporter) {
  // Hook up the callback that will be called with each sample
  setup();
  // Manually call the handler once, to trigger the first transfer
  CaptureSamples();
  // let first samples roll in
  sleep_ms(1000);
  // Block until we have our first audio sample
  while (!g_latest_audio_timestamp) {
  }

  return kTfLiteOk;
}

TfLiteStatus GetAudioSamples(tflite::ErrorReporter* error_reporter,
                             int start_ms, int duration_ms,
                             int* audio_samples_size, int16_t** audio_samples) {
  // Set everything up to start receiving audio
  if (!g_is_audio_initialized) {
    TfLiteStatus init_status = InitAudioRecording(error_reporter);
    if (init_status != kTfLiteOk) {
      return init_status;
    }
    g_is_audio_initialized = true;
  }
  // This next part should only be called when the main thread notices that the
  // latest audio sample data timestamp has changed, so that there's new data
  // in the capture ring buffer. The ring buffer will eventually wrap around and
  // overwrite the data, but the assumption is that the main thread is checking
  // often enough and the buffer is large enough that this call will be made
  // before that happens.

  const int16_t kAdcSampleDC = 2048;
  const int16_t kAdcSampleGain = 20;

  // Determine the index, in the history of all samples, of the first
  // sample we want
  //printf("getting audio samples\n");
  const int start_offset = start_ms * (kAudioSampleFrequency / 1000);
  // Determine how many samples we want in total
  const int duration_sample_count = duration_ms * (kAudioSampleFrequency / 1000);
  for (int i = 0; i < duration_sample_count; ++i) {
    // For each sample, transform its index in the history of all samples into
    // its index in g_audio_capture_buffer
    const int capture_index = (start_offset + i) % kAudioCaptureBufferSize;
    const int32_t capture_value = g_audio_capture_buffer[capture_index];
    // Remove the DC offset (half of the 12-bit ADC range)
    int32_t output_value = capture_value - kAdcSampleDC;
    // Apply a fixed gain so the signal uses more of the int16_t range
    output_value *= kAdcSampleGain;
    if (output_value < std::numeric_limits<int16_t>::min()) {
      output_value = std::numeric_limits<int16_t>::min();
    }
    if (output_value > std::numeric_limits<int16_t>::max()) {
      output_value = std::numeric_limits<int16_t>::max();
    }
    // Write the sample to the output buffer
    g_audio_output_buffer[i] = output_value;
    //printf("%d\n", output_value);
    //g_audio_output_buffer[i] = capture_value;
  }
  // Set pointers to provide access to the audio
  *audio_samples_size = kMaxAudioSampleSize;
  *audio_samples = g_audio_output_buffer;

  return kTfLiteOk;
}

int32_t LatestAudioTimestamp() { return g_latest_audio_timestamp; }

command_responder.cc

#include "command_responder.h"

#include <stdio.h>
#include <string.h>
#include "pico/stdlib.h"

#define LED_PIN 25

// The default implementation writes out the name of the recognized command
// to the error console. Real applications will want to take some custom
// action instead, and should implement their own versions of this function.
void RespondToCommand(tflite::ErrorReporter* error_reporter,
                      int32_t current_time, const char* found_command,
                      uint8_t score, bool is_new_command) {

  // led settings
  static bool is_initialized = false;
  // if not initialized, setup
  if (!is_initialized) {
    gpio_init(LED_PIN);
    gpio_set_dir(LED_PIN, GPIO_OUT);
    is_initialized = true;
  }

  if (is_new_command) {
    TF_LITE_REPORT_ERROR(error_reporter, "Heard %s (%d) @%dms", found_command,
                         score, current_time);

    // found_command is a C string, so compare its contents with strcmp()
    // rather than comparing pointers with ==
    if (strcmp(found_command, "yes") == 0) {
      // turn led on
      gpio_put(LED_PIN, 1);
    }
    else if (strcmp(found_command, "no") == 0) {
      // turn led off
      gpio_put(LED_PIN, 0);
    }
  }
}

Credits

Henri

Technical AI Evangelist @ Arm. Data Scientist converted to the embedded world. TinyML.
