Creating a DIY Voice Assistant with Mycroft AI and a Raspberry Pi 4

Learn how Andy from element14 Presents built his own smart speaker device with only DIY hardware and the Mycroft voice assistant project.

1 year ago • Home Automation / Internet of Things / Machine Learning & AI / Communication / Productivity

The motivation

Voice assistants such as Amazon's Alexa, Google Assistant, Microsoft's Cortana, and Samsung's Bixby all function by listening for a wake word, parsing audio into an intent, and then acting upon that intent. These devices are highly useful for performing hands-free tasks, yet their intrinsic dependence on large companies means they can be modified or shut down with little to no warning. element14 Presents host Andy wanted to prevent his voice assistant from becoming unusable, so he opted to build his own version based on Mycroft AI's engine.

Project overview

Just like other systems, Andy's would activate upon hearing a preset wake word. In this case, the user must say "Hey Mycroft" before the rest of their speech will be processed. Once the microphones have recorded the audio, that data is passed to a speech-to-text engine, which converts it into text. The core of the system is the utterance-to-intent engine, which translates the spoken words into concrete actions: holding a conversation, looking up information, or triggering something to happen either in the real world or virtually.
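The flow described above can be sketched in a few lines of Python. This is only a toy illustration, not Mycroft's actual engine: a real assistant detects the wake word in the audio stream before any speech-to-text happens, whereas here a text transcript stands in for the audio. The intent names, the regular expressions, and the function names are all hypothetical.

```python
import re
from typing import Optional

WAKE_WORD = "hey mycroft"

# Hypothetical intent table: each regex maps to a made-up intent name.
INTENT_PATTERNS = [
    (re.compile(r"\bwhat time\b|\bthe time\b"), "time.query"),
    (re.compile(r"\bset (a )?timer\b"), "timer.set"),
]


def strip_wake_word(transcript: str) -> Optional[str]:
    """Return the text that follows the wake word, or None if it is absent."""
    text = transcript.lower().strip()
    if not text.startswith(WAKE_WORD):
        return None
    return text[len(WAKE_WORD):].strip(" ,.")


def parse_intent(utterance: str) -> str:
    """Map an utterance to the first matching intent name."""
    for pattern, name in INTENT_PATTERNS:
        if pattern.search(utterance):
            return name
    return "fallback.unknown"


def handle(transcript: str) -> Optional[str]:
    """Ignore speech without the wake word; otherwise return an intent name."""
    utterance = strip_wake_word(transcript)
    if utterance is None:
        return None
    return parse_intent(utterance)


if __name__ == "__main__":
    for phrase in ("Hey Mycroft, what time is it?", "what time is it"):
        print(repr(phrase), "->", handle(phrase))
```

Anything said without the wake word is dropped, which mirrors how the assistant stays idle until it hears "Hey Mycroft".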

Gathering the hardware

Because listening for wake words and determining the user's intent is computationally demanding, a small microcontroller was not feasible. Instead, Andy chose the LattePanda single-board computer (SBC) over the Arduino Portenta H7 because it is better suited to running general applications while still offering general-purpose I/O headers. For picking up audio, a USB Seeed microphone array was chosen because its four individual microphone elements can both determine the direction of speech and cancel out unwanted background noise. Finally, a pair of powered speakers with a 3.5 mm jack was used, since they did not require additional amplification hardware.
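As a rough illustration of how a microphone array can estimate the direction of speech, the sketch below uses the textbook far-field model for just two microphones: the sound reaches one element slightly later than the other, and that delay maps to an angle. This is an assumption-laden simplification, not the algorithm the Seeed array actually runs, and the function name and spacing value are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 °C


def arrival_angle_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate the angle of arrival, in degrees from broadside, for two mics.

    delay_s is the difference in arrival time between the two microphones,
    and mic_spacing_m is the distance between them.
    """
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Clamp so that measurement noise cannot push asin() out of its domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    spacing = 0.05  # hypothetical 5 cm between two microphone elements
    delay = 0.5 * spacing / SPEED_OF_SOUND_M_S
    print(f"{arrival_angle_deg(delay, spacing):.1f} degrees")
```

A delay of zero means the speaker is directly in front of the pair; the largest possible delay means the sound is arriving along the line that joins the two microphones.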

A big problem

For the most part, setting up the LattePanda was easy: all Andy had to do was install the operating system, compile Mycroft, and make sure his device settings were correct. However, when it came time to test the setup, wake words were not being detected, largely because the board's Intel x64 processor did not support certain instructions needed to run the model.
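One way to catch this kind of problem before building anything is to check which instruction-set extensions the CPU reports. The sketch below parses text in the format of Linux's /proc/cpuinfo and lists any required flags that are missing. The article does not say which instructions were unsupported, so the `avx` flag used here is purely an assumption for illustration, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect the feature flags listed in /proc/cpuinfo-style text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels label this line "flags"; ARM kernels use "Features".
        if sep and key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def missing_flags(cpuinfo_text: str, required: Set[str]) -> Set[str]:
    """Return the required flags that the CPU does not report."""
    return required - cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        required = {"avx"}  # assumption: the source does not name the flag
        missing = missing_flags(cpuinfo.read_text(), required)
        print("missing flags:", sorted(missing) if missing else "none")
```

The parsing is kept separate from the file read so it can be tried against sample text on any machine.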

Setting up the software

After swapping out the LattePanda for a Raspberry Pi 4 and using the premade PiCroft OS image, Andy discovered that this setup also failed to work correctly. The final, successful attempt used the Pi OS Lite image, with PulseAudio configured to read from the microphone array and send audio to the speakers, and Mycroft compiled and installed from source.

Using the smart speaker

Although it is not nearly as advanced as something like an Amazon Echo or Google Home, this DIY solution still offers most of the basic features. Users can ask it for the time, set a timer, look up information, and more, thanks to a continually growing library of customizable skills. For more information about Andy's project, you can watch his video on the element14 Presents YouTube channel.
