Pseudonymous maker "atomic14" has published a guide to building a "DIY Alexa" voice-activated assistant, dubbed Marvin, around an Espressif ESP32 running TensorFlow Lite for wake word detection.
"Want to build your own Alexa? All you will need is an ESP32 and microphone board," atomic14 explains. "I'm using a microphone breakout board that I've built myself based around the ICS-43434 — but any microphone board will work. The code has been written so that you can either use an I2S microphone or an analogue microphone using the built-in ADC. I would recommend using an I2S microphone if you have one as they have a lot better noise characteristics."
"Wake word detection is carried out using a model trained with TensorFlow and runs on the ESP32 using TensorFlow Lite. A pre-trained model is included in the firmware folder so you can get up and running straight away."
Wake word detection is only half the battle, of course; the other half, recognizing a natural-language spoken command, is asking a little much of an ESP32 microcontroller. The solution: stream the audio out to wit.ai, a free-to-use voice control cloud platform developed by Facebook. "I've included the access token for this in the code," atomic14 mentions, "but will disable it in the next few weeks."
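As a rough illustration of what handing audio to wit.ai involves, the sketch below builds (but does not send) an HTTP request for the service's speech endpoint. The endpoint URL, the bearer-token header, and the raw-audio content type follow the public Wit.ai HTTP API documentation as best understood here, and should be checked against it before use. The 16-bit, 16 kHz, little-endian format and the `build_speech_request` helper are assumptions for illustration, not details taken from atomic14's firmware, which streams the audio as it is captured instead of posting a finished buffer.

```python
import urllib.request

WIT_SPEECH_URL = "https://api.wit.ai/speech"
# Assumed capture format: signed 16-bit little-endian PCM at 16 kHz.
RAW_PCM_TYPE = "audio/raw;encoding=signed-integer;bits=16;rate=16000;endian=little"


def build_speech_request(pcm_bytes: bytes, access_token: str) -> urllib.request.Request:
    """Build a POST request carrying raw PCM audio; nothing is sent here."""
    return urllib.request.Request(
        WIT_SPEECH_URL,
        data=pcm_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": RAW_PCM_TYPE,
        },
    )


# Four silent samples and a dummy token, just to show the shape of the call.
request = build_speech_request(b"\x00\x00" * 4, "YOUR_ACCESS_TOKEN")
```

Passing the request to `urllib.request.urlopen` would actually send it, which needs a valid access token from a wit.ai app of your own.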
"How well does it actually work? It works reasonably well. We have a very lightweight wake word detection system. It runs in around 100 milliseconds and there's still room for lots of optimization. Accuracy on the wake word is okay — we do need more training data to make it really robust."
"The wit.ai system works very well and you can easily add your own intents and traits and build a very powerful system. There are also alternative paid versions which you can use instead: One is available from Microsoft, and Google and Amazon also have similar and equivalent services."