You can train a speech commands model on your own words and have it running in 10 minutes!
Motivation
Inexpensive ESP32 chips are very popular today, and I wanted to see how well one could run a deep neural network. The M5StickC is ESP32-powered and has a built-in microphone, which comes in handy for a speech recognition project.
There are various tutorials on how to train and run a speech commands model on an ESP32. However, most of them train the model on the Google Speech Commands data set, which is large but contains only 20+ predefined speech commands. The training also has to be done on a powerful machine, which can be a barrier for beginner makers.
Implementation
So I decided to do things differently. The model training is split into two parts: base model training and custom model training. The base model is trained on the full Google Speech Commands data set and serves as the feature extractor for the custom model. The custom model is trained with TensorFlow.js in the browser. A custom model needs far fewer samples than a base model: you can get pretty good recognition with as few as 50 samples.
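Splitting training this way is a form of transfer learning: the base model stays frozen and only a small head is fitted to the new words. The sketch below illustrates the idea using only the Python standard library. It is not the project's code. `base_model_embed` is a hypothetical stand-in for the pre-trained base model (a fixed random projection), the clips are synthetic feature vectors, and the custom model is assumed to be a single dense softmax layer. The real custom model is trained with TensorFlow.js in the browser, and its architecture may differ.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# --- Stage 1: frozen base model (hypothetical stand-in) ---------------------
# In the project the base model is a network pre-trained on the Google Speech
# Commands data set. Here it is a fixed random projection followed by tanh.
IN_DIM, EMBED_DIM = 8, 16
BASE_W = [[rng.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(EMBED_DIM)]

def base_model_embed(features):
    """Map raw audio features to an embedding. These weights never change."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in BASE_W]

# --- Stage 2: small custom head trained on the user's own words -------------
def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def head_probs(W, b, x):
    return softmax([sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)])

def train_head(embeddings, labels, n_classes, epochs=200, lr=0.1):
    """Fit a dense + softmax layer with SGD. Only W and b are updated."""
    dim = len(embeddings[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(embeddings)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x, y = embeddings[idx], labels[idx]
            probs = head_probs(W, b, x)
            for k in range(n_classes):
                grad = probs[k] - (1.0 if k == y else 0.0)  # cross-entropy gradient w.r.t. logit k
                b[k] -= lr * grad
                for i in range(dim):
                    W[k][i] -= lr * grad * x[i]
    return W, b

def predict(W, b, x):
    probs = head_probs(W, b, x)
    return max(range(len(probs)), key=lambda k: probs[k])

# --- Demo: two custom "words", 25 clips each (50 samples in total) ----------
# Synthetic data: each word is a cluster of feature vectors around a made-up centre.
centres = [[1.0] * IN_DIM, [-1.0] * IN_DIM]
clips, labels = [], []
for label, centre in enumerate(centres):
    for _ in range(25):
        clips.append([c + rng.gauss(0, 0.2) for c in centre])
        labels.append(label)

embeddings = [base_model_embed(clip) for clip in clips]  # base model runs once per clip
W, b = train_head(embeddings, labels, n_classes=len(centres))
accuracy = sum(predict(W, b, e) == y for e, y in zip(embeddings, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

Because the base model is frozen, only the small head's weights have to be learned. This is the intuition behind needing far fewer samples for the custom model, and it is also why that training is light enough to run in a browser.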
Further, the base model is compiled into a custom MicroPython firmware, while the custom model is loaded dynamically as a Python module. The M5StickC can run one model inference in 220 ms, which is pretty impressive.
Try it out
I've shared the custom model training UI and the custom MicroPython firmware, so you can try it out with minimal coding in the Tinkerdoodle online IDE.
The MicroPython code and the instructions for flashing the custom firmware are available in this Jupyter notebook. Note that the stock M5StickC firmware won't work.
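As a rough picture of the device-side flow, in which the base model lives in the firmware and the custom model is loaded as a Python module, here is a short desktop Python sketch. It assumes, hypothetically, that the custom model is exported as a module named `custom_model` that holds `LABELS`, `W` and `B` for one dense layer applied to the base model's embedding. The file actually produced by the training UI may look quite different. The sketch uses CPython's `importlib` and `time.perf_counter()` so it runs on a PC. On the device, MicroPython code would more typically use `__import__` and `time.ticks_ms()`/`time.ticks_diff()`.

```python
import importlib
import sys
import tempfile
import time
from pathlib import Path

# Hypothetical export format: the trained custom model saved as a plain Python
# module that holds its labels and the weights of one dense layer.
MODULE_SOURCE = """
LABELS = ["on", "off"]
W = [[0.5, -0.2, 0.1], [-0.4, 0.3, 0.2]]
B = [0.0, 0.1]
"""

def load_custom_model(directory, name="custom_model"):
    """Import the model module from `directory` at run time."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()  # make sure a freshly written file is found
    return importlib.import_module(name)

def classify(model, embedding):
    """Apply the custom dense layer to an embedding produced by the base model."""
    logits = [sum(w * x for w, x in zip(row, embedding)) + b
              for row, b in zip(model.W, model.B)]
    best = max(range(len(logits)), key=lambda k: logits[k])
    return model.LABELS[best]

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "custom_model.py").write_text(MODULE_SOURCE)
    model = load_custom_model(tmp)
    embedding = [1.0, 0.0, 0.5]  # hypothetical base-model output
    start = time.perf_counter()
    label = classify(model, embedding)
    elapsed_ms = (time.perf_counter() - start) * 1000
    sys.path.remove(tmp)  # tidy up the import path

print(label, f"({elapsed_ms:.3f} ms)")  # label is "on" for this embedding
```

In this arrangement, changing the vocabulary would only mean replacing the small module rather than reflashing the firmware. For scale, at 220 ms per inference the M5StickC can manage roughly 4.5 inferences per second (1000 / 220 ≈ 4.5).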
The demo video also serves as a step-by-step tutorial.
Give it a try and let me know what you think!