Published April 16, 2021 © MIT

Deep Learning Speech Commands Recognition on ESP32

Train a neural network model in 10 minutes, and use it on an ESP32 with MicroPython to control a light switch. Everything is done in the browser.

Beginner • Full instructions provided • 15 minutes • 13,307

Things used in this project

Hardware components

M5Stack M5StickC ESP32-PICO Mini IoT Development Board

SG90 Micro-servo motor

Software apps and online services

Tinkerdoodle online IDE

Story

Demo

You can train a speech commands model on your own words and have it running in 10 minutes!

Motivation

Inexpensive ESP32 chips are very popular today, and I wanted to see how well one can run a deep neural network. The M5StickC is ESP32-powered and has a built-in microphone, which comes in handy for a speech recognition project.

There are various tutorials on how to train and run a speech commands model on an ESP32. However, most of them train the model on the Google speech commands data set, which is large but covers only 20+ predefined speech commands. The training also has to be done on a powerful machine, which can be a barrier for beginner makers.

Implementation

So I decided to do things differently. The model training is split into two parts: base model training and custom model training. The base model is trained on the full Google speech commands data set and serves as the feature extractor for the custom model. The custom model is trained using TensorFlow.js in the browser. Training a custom model requires far fewer samples than training a base model: you can get pretty good recognition with as few as 50 samples.
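To make the split between base model and custom model concrete, here is a minimal pure-Python sketch of the idea: a frozen feature extractor followed by a small trainable dense + softmax head. It is not the project's actual code. The function base_model_features is a hypothetical stand-in for the real base model, the two-dimensional toy data is made up, and the training loop is plain gradient descent rather than the Adam optimizer used in the TensorFlow.js code.

```python
import math
import random

LABELS = ['dark', 'bright']  # the two words used in the demo video


def base_model_features(sample):
    """Hypothetical stand-in for the frozen base model.

    The real base model is a neural network trained on the Google speech
    commands data set. Here the "features" are simply the input values,
    so the example runs anywhere.
    """
    return list(sample)


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(weights, biases, x):
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def train_head(features, targets, num_labels, epochs=50, lr=0.5):
    """Train one dense layer + softmax with per-sample gradient descent."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_labels)]
    biases = [0.0] * num_labels
    for _ in range(epochs):
        for x, target in zip(features, targets):
            probs = softmax(logits_for(weights, biases, x))
            for k in range(num_labels):
                # Gradient of categorical cross-entropy w.r.t. logit k.
                grad = probs[k] - (1.0 if k == target else 0.0)
                for j in range(dim):
                    weights[k][j] -= lr * grad * x[j]
                biases[k] -= lr * grad
    return weights, biases


def predict(weights, biases, x):
    probs = softmax(logits_for(weights, biases, x))
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# 50 made-up training samples, 25 per label, as two noisy clusters.
random.seed(0)
samples, targets = [], []
for _ in range(25):
    samples.append([1.0 + random.gauss(0, 0.1), random.gauss(0, 0.1)])
    targets.append(0)
    samples.append([random.gauss(0, 0.1), 1.0 + random.gauss(0, 0.1)])
    targets.append(1)

# Only the small head is trained; the base model is never updated.
features = [base_model_features(s) for s in samples]
weights, biases = train_head(features, targets, num_labels=len(LABELS))

index, prob = predict(weights, biases, base_model_features([1.0, 0.0]))
print(LABELS[index], round(prob, 3))
```

Because only the small head is trained, a few dozen samples can be enough, which is why the custom training step fits comfortably in a browser.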

Further, the base model is compiled into a custom MicroPython firmware, and the custom model is loaded dynamically as a Python module. The M5StickC is able to run one model inference in about 220 ms, which is pretty impressive.
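The inference time quoted here lines up with the audio window used by the MicroPython demo program in the Code section, which fills a 7168-byte buffer from the microphone at 16 kHz with 16-bit samples. The short sketch below just reproduces that arithmetic. The constant and function names are my own, and it assumes a single (mono) channel, which matches the ONLY_RIGHT channel format in the demo.

```python
SAMPLE_RATE_HZ = 16000  # samplerate=16000 in the I2S setup
BYTES_PER_SAMPLE = 2    # 16-bit samples (I2S.B16), assuming one channel
BUFFER_BYTES = 7168     # size of the audio buffer in the demo program


def window_seconds(buffer_bytes, sample_rate_hz, bytes_per_sample):
    """Duration of audio that fits in the buffer, in seconds."""
    return buffer_bytes / (sample_rate_hz * bytes_per_sample)


window = window_seconds(BUFFER_BYTES, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE)
samples_per_window = BUFFER_BYTES // BYTES_PER_SAMPLE

print('window: %.3f s' % window)
print('samples per window:', samples_per_window)
```

For the loop to keep up with the microphone, one inference has to finish within one such window.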

Try it out

I've shared the custom model training UI and the custom MicroPython firmware, so you can try it out with minimal coding in the Tinkerdoodle online IDE.

The model training UI is shared at this Tinkerdoodle page. It is written in JavaScript, so you can view the page source to see how it is done if you are interested. In my demo video, I got a pretty good model that recognizes two words ("dark" and "bright") in 5 minutes. You can do it too!

The MicroPython code and instructions on how to flash the custom firmware are available in this Jupyter notebook. Please note that the stock firmware on the M5StickC won't work.

The demo video also serves as a step-by-step tutorial.

Give it a try and let me know what you think!

Code

Model training using TensorFlow.js

// Full code and UI: https://www.tinkerdoodle.cc/user/junfeng/speech-commands.html
// Code is FYI only. You can just use the UI to train your model.
//
const model = tf.sequential();
// Dense layer takes the output from base model as features.
model.add(tf.layers.dense({inputShape: [numFlattenFeatures], units: labels.length}));
// Softmax layer for classification.
model.add(tf.layers.softmax());
model.compile({
    optimizer: tf.train.adam(),
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy']
});
var epochs = 20;
var currentEpoch = 1;
var info = await model.fit(x, y, {
    epochs,
    batchSize: 16,
    callbacks: {
        onBatchEnd: (batch, logs) => {
            if (batch === 0) {
                setMessage(`Epoch ${currentEpoch} out of ${epochs}.`);
                currentEpoch++;
            }
        }
    }
});
var finalAccuracy = info.history.acc[epochs - 1].toFixed(4);
tf.dispose([x, y]);

MicroPython program running on ESP32

# Full code: https://tinkerdoodle.cc/user/_/notebooks/Shared/Junfeng/Speech%20Commands%20Model.ipynb
#
# Demo program to control a servo connected to the M5StickC using speech commands.
# It supports saving samples for model fine-tuning.
# To change the label, press the right side button.
# To save a sample, press the front button.
import gc
import m5stickc_lcd
import speech_model
from machine import I2S, PWM, Pin, reset

# Use the M5StickC built-in microphone.
mic = I2S(I2S.NUM0, ws=Pin(0), sdin=Pin(34), mode=I2S.MASTER_PDW,
    dataformat=I2S.B16, channelformat=I2S.ONLY_RIGHT,
    samplerate=16000, dmacount=16, dmalen=256)
lcd = m5stickc_lcd.ST7735()
# M5StickC is capable of running one model inference every 224ms.
# 7168 / (16000 * 2) = 0.224
buffer = bytearray(7168)
servo = PWM(Pin(26), freq=50, duty=70)
label = ''
label_index = -1

def save_feature(pin):
    if label_index != -1:
        speech_model.save(speech_model.labels[label_index])
    else:
        speech_model.save(label)
    reset()

def select_label(pin):
    global label_index
    label_index = (label_index + 1) % len(speech_model.labels)
    lcd.fill(0)
    lcd.text(speech_model.labels[label_index], 10, 30, 0xffff)
    lcd.show()

# Use the front and right side buttons for sample capture.
Pin(37, Pin.IN).irq(handler=save_feature, trigger=Pin.IRQ_FALLING)
Pin(39, Pin.IN).irq(handler=select_label, trigger=Pin.IRQ_FALLING)
lcd.text('Ready!', 10, 10, 0xffff)
lcd.show()
gc.collect()

try:
    while True:
        mic.readinto(buffer)
        l, prob = speech_model.predict(buffer)
        gc.collect()
        # Skip the catch-all label and low-confidence predictions.
        if l == '[OTHER]' or prob <= 70:
            continue
        label = l
        speech_model.snapshot()
        if label == 'k': # Update to your own label.
            servo.duty(100)
        elif label == 'g': # Update to your own label.
            servo.duty(40)
        lcd.fill(0)
        lcd.text(label, 10, 30, 0xffff)
        lcd.text(str(prob), 10, 50, 0xffff)
        lcd.show()
finally:
    # Runs when the loop is interrupted, e.g. with Ctrl-C from the REPL.
    mic.deinit()
    lcd.fill(0)
    lcd.text('Done', 10, 10, 0xffff)
    lcd.show()
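The decision logic in the loop above can be pulled out into small functions that run on a desktop, which makes it easier to experiment with labels and thresholds before flashing anything. The sketch below is illustrative only: the ACTIONS table (which word maps to which duty) is a hypothetical example using the words from the demo video, and the pulse-width calculation assumes the PWM duty value is a 10-bit number (0 to 1023), as in the stock ESP32 MicroPython port. The custom firmware may differ.

```python
CONFIDENCE_THRESHOLD = 70  # same cut-off as `prob <= 70` in the loop
PWM_FREQ_HZ = 50           # freq=50 in the servo setup
DUTY_RANGE = 1023          # assumption: 10-bit duty, as in stock ESP32 MicroPython

# Hypothetical label-to-duty table. The duty values 100 and 40 come from
# the demo program; which word maps to which value is made up here.
ACTIONS = {'bright': 100, 'dark': 40}


def choose_duty(label, prob, actions=ACTIONS, threshold=CONFIDENCE_THRESHOLD):
    """Return the servo duty for a prediction, or None to ignore it."""
    if label == '[OTHER]' or prob <= threshold:
        return None
    return actions.get(label)


def pulse_width_ms(duty, freq_hz=PWM_FREQ_HZ, duty_range=DUTY_RANGE):
    """High time of one PWM period, in milliseconds."""
    period_ms = 1000.0 / freq_hz
    return period_ms * duty / duty_range


for duty in (40, 70, 100):
    print(duty, '->', round(pulse_width_ms(duty), 2), 'ms')
```

Under that 10-bit assumption, the three duty values used in the program (40, 70 and 100) correspond to pulses of roughly 0.8 ms, 1.4 ms and 2.0 ms in a 20 ms period, which sits in the range hobby servos such as the SG90 typically respond to.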

Credits

Tinkerdoodle DIY

3 projects • 8 followers

Schematics

Connect SG-90 micro servo to M5StickC
