With the release of Sparkfun's MicroMod ecosystem, assembling prototypes with swappable hardware has become easier than ever. To demonstrate the MicroMod Teensy Processor and Machine Learning Carrier boards, I wanted to use a microphone that can detect certain keywords and display them on a screen.
The hardware for this project consists of just three components. First is the MicroMod Machine Learning Carrier Board which contains a 3-axis accelerometer, MicroSD card slot, 24-pin CMOS camera connector, ample GPIO breakout pins, and two microphones (PDM and I2S), although only the I2S microphone and pin headers will be used. Next is a MicroMod Teensy Processor board that contains the same powerful NXP iMXRT1062 chip as the Teensy 4.x lineup of development kits. And finally, the last component is a generic 2.8" SPI LCD with a resolution of 320x240 pixels. I selected the ILI9341-driven TFT due to its strong support from the Teensyduino Arduino IDE add-on.
I connected the LCD to the Carrier Board with a series of jumper wires that ran between various SPI and GPIO pins. For more detailed information on wiring, you can view the associated schematic. The Teensy Processor module got slotted into the M.2 connector and evenly seated to ensure good connectivity. Because there is no sleek way to keep the components from simply tipping over, I designed a pair of custom brackets that hold the TFT and Carrier Board upright and can also be joined together into a single unit.
By default, the pulse-density modulation (PDM) microphone is enabled on the Carrier Board, but because the Teensy audio library doesn't support the pin it's connected to, I was forced to cut the EN1 jumper and solder the EN2 jumper closed for the I2S microphone.
Before any machine learning models could be created, I first had to gather some audio samples. Keyword spotting generally works by taking an audio waveform, transforming it into a spectrogram or some other set of features, and then feeding the resulting features to a neural network. Thankfully, both Microsoft and Google have some very thorough datasets that cover many of the most commonly needed keywords such as "yes", "no", various numbers, simple directions, and background noise for when nothing is being said. I uploaded around 30 minutes each of 1-second clips for the following words:
- Yes
- No
- Up
- Down
- Left
- Right
with an additional 20 minutes of "noise" samples for a total of 3 hours, 25 minutes, and 33 seconds of audio data in Edge Impulse's Uploader tool.
My impulse for the Edge Impulse project had an initial time series data block that takes in a single audio axis and splits it into 1000ms windows.
This information is then fed into an Audio (Mel-filterbank energy, or MFE) block, which applies an FFT to each window of audio and creates a spectrogram from it.
Once the Keras neural network had been trained for 100 cycles at a learning rate of 0.005, I used the EON Tuner utility to improve the model further, bringing its accuracy up to an impressive 92.3%.
You can view the public model and dataset here on Edge Impulse.
Now that the model had been fully trained and downloaded as an Arduino library, it was time to figure out how to get the data from a person's voice into the model for inferencing. Thankfully, the Teensy Audio library is extremely simple to integrate into a sketch, so I used it to take the audio from the I2S microphone and send it to both the USB port and a Queue object for grabbing the incoming packets.
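Below is a minimal sketch of that audio routing, assuming the standard `AudioInputI2S`, `AudioOutputUSB`, and `AudioRecordQueue` objects from the Teensy Audio library; the object names and memory size are illustrative rather than copied from my actual sketch.

```cpp
// Minimal sketch of the audio routing, assuming the Teensy Audio library's
// standard I2S input, USB output, and record-queue objects. Object names
// are illustrative. The USB Type must include "Audio" for usb_out to work.
#include <Audio.h>

AudioInputI2S    i2s_mic;   // I2S microphone on the Carrier Board
AudioOutputUSB   usb_out;   // lets a PC monitor the raw audio stream
AudioRecordQueue queue;     // buffers incoming 128-sample blocks

// Route the left I2S channel to both USB channels and to the queue
AudioConnection patch1(i2s_mic, 0, usb_out, 0);
AudioConnection patch2(i2s_mic, 0, usb_out, 1);
AudioConnection patch3(i2s_mic, 0, queue, 0);

void setup() {
  AudioMemory(60);   // reserve audio blocks for buffering
  queue.begin();     // start capturing samples
}

void loop() {
  while (queue.available() > 0) {
    int16_t *block = queue.readBuffer();  // 128 samples at 44.1 kHz
    // ...copy these samples into a raw buffer for downsampling here...
    queue.releaseBuffer();
  }
}
```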
One problem is that the Teensy Audio library samples the microphone at 44.1 kHz, whereas the dataset's samples are at 16 kHz. This meant I had to create a simple downsampling algorithm that, for each of the 16,000 output points, takes the nearest-neighbor sample from the raw audio and averages it with its closest neighbors.
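A rough sketch of that downsampling step is shown below; the buffer names and lengths are assumptions for illustration, not the exact code from the project.

```cpp
// Rough sketch of the downsampling step: for each of the 16,000 output
// points, find the nearest sample in the 44.1 kHz buffer and average it
// with its two immediate neighbors.
#include <stdint.h>
#include <stddef.h>

#define RAW_RATE    44100
#define TARGET_RATE 16000

void downsample(const int16_t *raw, size_t raw_len, int16_t *out, size_t out_len) {
  for (size_t i = 0; i < out_len; i++) {
    // Nearest-neighbor index in the raw buffer for this output sample
    size_t src = (size_t)(((uint64_t)i * RAW_RATE) / TARGET_RATE);
    if (src < 1) src = 1;                     // keep a neighbor on each side
    if (src > raw_len - 2) src = raw_len - 2;
    // Average the nearest sample with the one before and after it
    int32_t sum = raw[src - 1] + raw[src] + raw[src + 1];
    out[i] = (int16_t)(sum / 3);
  }
}
```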
From here, the newly downsampled audio is passed every second to the Edge Impulse model via a `signal` object, and results are received in an `ei_impulse_result_t` object.
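In code, that hand-off looks roughly like the following, assuming the standard `signal_t`/`run_classifier()` interface that Edge Impulse's exported Arduino libraries provide; the header name and buffer are placeholders.

```cpp
// Hedged sketch of the hand-off to the model, using the signal_t /
// run_classifier() interface from an exported Edge Impulse Arduino library.
#include <your_project_inferencing.h>   // actual name depends on the export

static float inference_buffer[EI_CLASSIFIER_RAW_SAMPLE_COUNT];  // 1 s at 16 kHz

// Callback the library uses to pull slices of audio out of our buffer
static int get_audio_data(size_t offset, size_t length, float *out_ptr) {
  memcpy(out_ptr, inference_buffer + offset, length * sizeof(float));
  return 0;
}

void classify() {
  signal_t signal;
  signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT;
  signal.get_data = &get_audio_data;

  ei_impulse_result_t result;
  if (run_classifier(&signal, &result, false) == EI_IMPULSE_OK) {
    // result.classification[i].label / .value now hold each keyword's score
  }
}
```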
With the model now giving a result every second, I wanted to create an interesting way to present what it inferred. Once the result is ready, a `for`-loop goes through each label and prints its name. Additionally, a rectangle is drawn underneath the text that corresponds to that label, and its width is determined by how confident the model is. For instance, a value of 0.9 equals 180 pixels in width, whereas a value of just 0.05 gives a rectangle that's only 10 pixels wide. Finally, the actual value is printed in a column on the right for more detailed information.
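A hedged sketch of that drawing routine is shown below, using the ILI9341_t3 driver bundled with Teensyduino; the pin numbers, layout constants, and colors are illustrative guesses rather than the project's exact values.

```cpp
// Illustrative drawing routine for the result screen, assuming the
// ILI9341_t3 driver bundled with Teensyduino. Pins and layout are guesses.
#include <SPI.h>
#include <ILI9341_t3.h>
#include <your_project_inferencing.h>   // same Edge Impulse export as above

#define TFT_CS 10
#define TFT_DC  9
ILI9341_t3 tft(TFT_CS, TFT_DC);

void setupDisplay() {
  tft.begin();
  tft.setRotation(1);                   // landscape, 320x240
  tft.fillScreen(ILI9341_BLACK);
  tft.setTextColor(ILI9341_WHITE);
}

void showResult(const ei_impulse_result_t &result) {
  tft.fillScreen(ILI9341_BLACK);
  for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
    float value = result.classification[i].value;
    int y = 10 + i * 34;
    tft.setCursor(10, y);
    tft.print(result.classification[i].label);        // keyword name
    // Bar width scales with confidence: 0.9 -> 180 px, 0.05 -> 10 px
    tft.fillRect(10, y + 12, (int)(value * 200), 10, ILI9341_GREEN);
    tft.setCursor(260, y);
    tft.print(value, 2);                              // raw confidence value
  }
}
```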
This simple project is merely a launching point for something greater. Feel free to grab your own MicroMod boards and start experimenting with new and exciting projects of your own that incorporate machine learning.