Team AMY:

Adiel Baja Kelana 76884

•

Mifthahul Maulana Kamal

•

Yabes Henli Salem 90342

•

Hendra Kusumah

Published May 21, 2025

Keyword Spotting on ESP32-S3 with INMP441 and MAX7219

Offline voice keyword detector using ESP32-S3, INMP441 & MAX7219. Powered by Edge Impulse, no cloud needed, runs fully on-device.

Beginner · Protip · 5 hours

Things used in this project

Hardware components

Seeed Studio XIAO ESP32S3 Plus

MB102 Breadboard 830 Point Solderless PCB Bread Project Board

MAX7219 8x32 LED Matrix

INMP441 FRONT MIC

Male/Female Jumper Wires

Software apps and online services

Arduino IDE

Edge Impulse Studio

Story

Offline Voice Keyword Spotting with XIAO ESP32-S3, INMP441 Microphone, and MAX7219 LED Matrix

This project pairs a XIAO ESP32-S3 with an INMP441 I²S microphone to recognize the voice command “halo esp”. When the keyword is detected, “HALO” scrolls across a MAX7219 LED matrix built from four chained 8x8 modules (8x32 in total). The model also distinguishes background noise from unknown sounds; the sketch in the Code section scrolls “NO” in both cases.

The entire system runs offline: a compact model trained in Edge Impulse Studio performs inference on the ESP32-S3 itself, so no internet connection or cloud service is needed.

Hardware

XIAO ESP32S3
MAX7219 LED dot matrix, four 8x8 modules (8x32)
INMP441 I²S microphone
Male-female jumper wires
MB-102 solderless breadboard, 830 points

Software

Arduino IDE
Edge Impulse Studio (for training and exporting your keyword spotting model)

Required Libraries

MD_Parola.h
MD_MAX72xx.h
SPI.h
driver/i2s.h
freertos/FreeRTOS.h

Wiring Diagram

MAX7219 LED dot matrix (four 8x8 modules)

- VCC → 5V

- GND → GND

- DIN → GPIO6

- CS → GPIO5

- CLK → GPIO7

INMP441 I²S microphone

- VCC → 3V3

- GND → GND

- BCLK → GPIO2

- WS → GPIO3

- SD → GPIO1
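
The wiring above can be written down as named constants, which makes it easy to check that no GPIO is used twice before touching the breadboard. The following is a minimal host-compilable sketch and not part of the original project code: the `Pin` struct and the `allPinsUnique` helper are hypothetical names introduced here, and the GPIO numbers are the ones from the wiring lists.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper type: one signal name and the GPIO it is wired to.
struct Pin {
    const char* signal;
    int gpio;
};

// GPIO assignments copied from the wiring lists above.
constexpr std::array<Pin, 6> kPins = {{
    {"MAX7219 DIN", 6},
    {"MAX7219 CS", 5},
    {"MAX7219 CLK", 7},
    {"INMP441 BCLK", 2},
    {"INMP441 WS", 3},
    {"INMP441 SD", 1},
}};

// Returns true when no GPIO is assigned to two different signals.
constexpr bool allPinsUnique() {
    for (std::size_t i = 0; i < kPins.size(); ++i) {
        for (std::size_t j = i + 1; j < kPins.size(); ++j) {
            if (kPins[i].gpio == kPins[j].gpio) return false;
        }
    }
    return true;
}

// Fails at compile time if the table is edited into a conflict.
static_assert(allPinsUnique(), "two signals share the same GPIO");
```

If you move a wire, changing the number in this table first turns a pin conflict into a compile error instead of a silent wiring bug.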

Step-by-Step Guide

1. Install Libraries

Open the Arduino IDE and install the following libraries via the Library Manager:

MD_Parola

MD_MAX72XX

SPI is also required, but it ships with the ESP32 board package and needs no separate installation.

Also, install the Edge Impulse library exported as a .zip file:
Go to Sketch → Include Library → Add .ZIP Library, and select the .zip you downloaded from the Edge Impulse Deployment page (e.g., kamaru123-project-1_inferencing.zip).

Note: Headers such as driver/i2s.h, freertos/FreeRTOS.h, and freertos/task.h come with the ESP32 board package, so they are available once the XIAO ESP32-S3 board is installed and selected.

2. Arduino Code

Use a program that does the following:

1. Initializes the INMP441 microphone via I2S

2. Runs audio inference using the Edge Impulse model

3. Displays "HALO" on the MAX7219 LED matrix when the keyword "halo esp" is detected

4. Displays "NO" if the keyword is not recognized

5. Uses a pin configuration that matches your wiring:

MAX7219: DIN = GPIO6, CLK = GPIO7, CS = GPIO5

INMP441: BCK = GPIO2, WS = GPIO3, SD = GPIO1
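
The decision between "HALO" and "NO" can be tested on a PC before flashing the board. The sketch below is one possible way to write the smoothing step, assuming the threshold of 0.30 and the streak of 3 consecutive windows that the project sketch defines. The `Score` struct and the `StreakDetector` class are hypothetical names introduced for illustration; they are not part of the Edge Impulse SDK.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for one entry of the classifier output:
// a class label and its confidence score in the range 0..1.
struct Score {
    const char* label;
    float value;
};

// Counts consecutive windows in which the keyword score is above the
// threshold and reports a detection once the streak is long enough.
class StreakDetector {
public:
    StreakDetector(const char* keyword, float threshold, int required)
        : keyword_(keyword), threshold_(threshold), required_(required) {}

    // Feed the scores of one inference window.
    // Returns true only on the window that completes the streak.
    bool update(const Score* scores, std::size_t count) {
        bool above = false;
        for (std::size_t i = 0; i < count; ++i) {
            if (std::strcmp(scores[i].label, keyword_) == 0 &&
                scores[i].value > threshold_) {
                above = true;
                break;
            }
        }
        if (!above) {
            streak_ = 0;  // any miss restarts the count
            return false;
        }
        if (++streak_ < required_) return false;
        streak_ = 0;      // re-arm for the next detection
        return true;
    }

    int streak() const { return streak_; }

private:
    const char* keyword_;
    float threshold_;
    int required_;
    int streak_ = 0;
};
```

The streak is cleared only when a window misses the keyword. Clearing it after every window that has not yet completed the streak would keep the count from ever passing 1.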

3. Upload and Run

1. Connect the XIAO ESP32-S3 to your computer using a USB-C cable

2. Open Arduino IDE

3. Go to Tools → Board and select:
✅ XIAO_ESP32S3

4. Go to Tools → Port and select the correct COM port

5. Click the Upload button

6. Open the Serial Monitor (baud rate: 115200)

7. Say "halo esp" near the microphone:

✅ If detected, the LED matrix will scroll "HALO"

❌ If not detected, it will display "NO"

Code

ESP32-S3 Edge Impulse Keyword Spotter with MAX7219 Display

#include <kamaru123-project-1_inferencing.h>
#include <MD_Parola.h>
#include <MD_MAX72xx.h>
#include <SPI.h>
#include "driver/i2s.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

// MAX7219 configuration
#define HARDWARE_TYPE MD_MAX72XX::FC16_HW
#define MAX_DEVICES 4
#define DATA_PIN   6  // DIN MAX7219
#define CLK_PIN    7  // CLK MAX7219
#define CS_PIN     5  // CS MAX7219

MD_Parola mx = MD_Parola(HARDWARE_TYPE, DATA_PIN, CLK_PIN, CS_PIN, MAX_DEVICES);

// Buffer structure for inference
typedef struct {
    int16_t *buffer;
    volatile uint8_t buf_ready;
    volatile uint32_t buf_count;
    uint32_t n_samples;
} inference_t;

static inference_t inference;
static const uint32_t sample_buffer_size = 2048;
static int16_t sampleBuffer[sample_buffer_size];
static volatile bool record_status = false;
static bool debug_nn = false;

// Detection threshold and smoothing
#define DETECTION_THRESHOLD 0.30f
#define DETECTION_STREAK_REQUIRED 3
static int detection_streak = 0;

// Scroll a text across the MAX7219 (blocks until the animation finishes)
void tampilkanTeks(const char* teks) {
    mx.displayClear();
    mx.displayScroll(teks, PA_LEFT, PA_SCROLL_LEFT, 100);
    while (!mx.displayAnimate()) {
        // Wait for the animation to finish
    }
}

// Audio callback: copy samples from sampleBuffer into the inference buffer
static void audio_inference_callback(uint32_t n_bytes) {
    for (uint32_t i = 0; i < (n_bytes >> 1); i++) {
        if (inference.buf_count < inference.n_samples) {
            inference.buffer[inference.buf_count++] = sampleBuffer[i];
        }
        if (inference.buf_count >= inference.n_samples) {
            inference.buf_ready = 1;
            break;
        }
    }
}

// Task that continuously captures samples from I2S
static void capture_samples(void* arg) {
    const int32_t bytes_to_read = (int32_t)arg;
    size_t bytes_read;

    while (record_status) {
        esp_err_t err = i2s_read(I2S_NUM_1, (void*)sampleBuffer, bytes_to_read, &bytes_read, portMAX_DELAY);
        if (err == ESP_OK && bytes_read > 0) {
            // Gain adjustment (multiply by 6), clamped so loud samples saturate instead of wrapping
            for (size_t i = 0; i < bytes_read / 2; i++) {
                int32_t amplified = (int32_t)sampleBuffer[i] * 6;
                if (amplified > INT16_MAX) amplified = INT16_MAX;
                if (amplified < INT16_MIN) amplified = INT16_MIN;
                sampleBuffer[i] = (int16_t)amplified;
            }
            audio_inference_callback(bytes_read);
        }
    }
    vTaskDelete(NULL);
}

// Initialize the microphone for inference
static bool microphone_inference_start(uint32_t n_samples) {
    inference.buffer = (int16_t*)malloc(n_samples * sizeof(int16_t));
    if (!inference.buffer) {
        ei_printf("ERR: Failed to allocate buffer\n");
        return false;
    }

    inference.buf_count = 0;
    inference.n_samples = n_samples;
    inference.buf_ready = 0;

    if (i2s_init(EI_CLASSIFIER_FREQUENCY)) {
        ei_printf("ERR: Failed to init I2S\n");
        free(inference.buffer);
        return false;
    }

    ei_sleep(100);
    record_status = true;
    xTaskCreate(capture_samples, "CaptureSamples", 4096, (void*)sample_buffer_size, 1, NULL);
    return true;
}

// Wait until the buffer is ready (all samples collected)
static bool microphone_inference_record(void) {
    while (!inference.buf_ready) {
        delay(10);
    }
    inference.buf_ready = 0;
    inference.buf_count = 0;
    return true;
}

// Provide float data to Edge Impulse from the int16 buffer
static int microphone_audio_signal_get_data(size_t offset, size_t length, float *out_ptr) {
    numpy::int16_to_float(&inference.buffer[offset], out_ptr, length);
    return 0;
}

// Initialize the ESP32 I2S peripheral for the INMP441
static int i2s_init(uint32_t sampling_rate) {
    i2s_config_t i2s_config = {
        .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
        .sample_rate = sampling_rate,
        .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
        .channel_format = I2S_CHANNEL_FMT_ONLY_RIGHT,
        .communication_format = I2S_COMM_FORMAT_I2S,
        .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
        .dma_buf_count = 8,
        .dma_buf_len = 512,
        .use_apll = false,
        .tx_desc_auto_clear = false,
        .fixed_mclk = 0
    };

    i2s_pin_config_t pin_config = {
        .bck_io_num = 2,    // D1 = GPIO2 (BCLK)
        .ws_io_num = 3,     // D2 = GPIO3 (WS)
        .data_out_num = -1,
        .data_in_num = 1    // D0 = GPIO1 (SD)
    };

    esp_err_t ret;
    ret = i2s_driver_install(I2S_NUM_1, &i2s_config, 0, NULL);
    if (ret != ESP_OK) return 1;
    ret = i2s_set_pin(I2S_NUM_1, &pin_config);
    if (ret != ESP_OK) return 1;
    ret = i2s_zero_dma_buffer(I2S_NUM_1);
    return (ret == ESP_OK) ? 0 : 1;
}

void setup() {
    Serial.begin(115200);
    ei_printf("Edge Impulse Keyword Spotting + MAX7219\n");

    mx.begin();
    mx.setIntensity(3);
    mx.displayClear();
    tampilkanTeks("READY");

    ei_printf("Inferencing settings:\n");
    ei_printf("\tInterval: "); ei_printf_float((float)EI_CLASSIFIER_INTERVAL_MS); ei_printf(" ms.\n");
    ei_printf("\tFrame size: %d\n", EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE);
    ei_printf("\tSample length: %d ms.\n", EI_CLASSIFIER_RAW_SAMPLE_COUNT / 16);
    ei_printf("\tNo. of classes: %d\n", EI_CLASSIFIER_LABEL_COUNT);
    ei_sleep(2000);

    if (!microphone_inference_start(EI_CLASSIFIER_RAW_SAMPLE_COUNT)) {
        ei_printf("ERR: Could not allocate audio buffer.\n");
        while (true) delay(1000);  // Stop program
    }
}

void loop() {
    if (!microphone_inference_record()) {
        ei_printf("ERR: Failed to record audio...\n");
        return;
    }

    signal_t signal;
    signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT;
    signal.get_data = &microphone_audio_signal_get_data;

    ei_impulse_result_t result;
    EI_IMPULSE_ERROR r = run_classifier(&signal, &result, debug_nn);
    if (r != EI_IMPULSE_OK) {
        ei_printf("ERR: Failed to run classifier (%d)\n", r);
        return;
    }

    bool keyword_heard = false;  // keyword score above threshold in this window
    for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
        ei_printf("  %s: ", result.classification[ix].label);
        ei_printf_float(result.classification[ix].value);
        ei_printf("\n");

        if (strcmp(result.classification[ix].label, "halo esp") == 0 && result.classification[ix].value > DETECTION_THRESHOLD) {
            keyword_heard = true;
        }
    }

    if (keyword_heard) {
        // Count consecutive windows above the threshold before reporting a detection
        detection_streak++;
        if (detection_streak >= DETECTION_STREAK_REQUIRED) {
            tampilkanTeks("HALO");
            detection_streak = 0;
        }
    } else {
        // Reset the streak only on a miss, so it can reach the required count
        detection_streak = 0;
        tampilkanTeks("NO");
    }

    delay(100);
}
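
The buffer sizes in the listing interact in a way that is easy to miss. Each `i2s_read` call requests `sample_buffer_size` bytes (2048), which is 1024 16-bit samples, and the callback keeps appending until `n_samples` have been collected. The sketch below works through that arithmetic on the host. The window of 16000 samples at 16000 Hz is an assumption (a common 1-second setup); the real values come from `EI_CLASSIFIER_RAW_SAMPLE_COUNT` and `EI_CLASSIFIER_FREQUENCY` in the exported model. The function names are hypothetical.

```cpp
#include <cstdint>

// Duration of an inference window in milliseconds.
constexpr uint32_t windowMs(uint32_t sampleCount, uint32_t sampleRateHz) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(sampleCount) * 1000u) / sampleRateHz);
}

// Number of 16-bit samples delivered by one read of `bytesPerRead` bytes.
constexpr uint32_t samplesPerRead(uint32_t bytesPerRead) {
    return bytesPerRead / static_cast<uint32_t>(sizeof(int16_t));
}

// Reads needed to fill one window, rounded up.
constexpr uint32_t readsPerWindow(uint32_t sampleCount, uint32_t bytesPerRead) {
    return (sampleCount + samplesPerRead(bytesPerRead) - 1) /
           samplesPerRead(bytesPerRead);
}

// Assumed example values: 16000 samples at 16 kHz, 2048 bytes per read.
static_assert(windowMs(16000, 16000) == 1000, "1 s window");
static_assert(samplesPerRead(2048) == 1024, "1024 samples per read");
static_assert(readsPerWindow(16000, 2048) == 16, "16 reads per window");
```

Under these assumptions, one inference window needs 16 reads. The `EI_CLASSIFIER_RAW_SAMPLE_COUNT / 16` expression printed in `setup()` gives the window length in milliseconds only when the model runs at 16 kHz, whereas `windowMs` takes the sample rate as a parameter.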
