Arpit Sengar's Upcycled PC Speaker Delivers a Vocal LLM Assistant for Under $12 in Parts
Driven by an Espressif ESP32, this smart speaker connects to a Python-powered backend and out to cloud services including Google Gemini.
Maker Arpit Sengar has upcycled an old PC speaker into an ultra-low-cost large language model (LLM) voice assistant, built atop Google's Gemini platform, for less than ₹1,000 (around $12) in parts.
"This project combines embedded systems and AI [Artificial Intelligence] inference to create an end-to-end conversational assistant," Sengar explains of the compact box, which includes an integrated battery for wire-free operation. "The [Espressif] ESP32 handles real-time audio recording and playback, while a Python backend performs: speech-to-text (STT), language understanding, [and] text-to-speech (TTS)."
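The division of labour Sengar describes can be pictured as a three-stage pipeline on the backend. The following is a minimal sketch, not the project's code. The `Assistant` class and its `transcribe`, `generate`, and `synthesize` hooks are hypothetical names, and the stand-in stages in the demo just echo text so the flow runs without any speech or language models installed.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures: real implementations would wrap a speech
# recogniser, an LLM API call and a speech synthesiser respectively.
Transcriber = Callable[[bytes], str]   # recorded PCM audio in, text out
Generator = Callable[[str], str]       # prompt in, reply text out
Synthesizer = Callable[[str], bytes]   # reply text in, PCM audio out


@dataclass
class Assistant:
    transcribe: Transcriber
    generate: Generator
    synthesize: Synthesizer

    def handle_utterance(self, audio: bytes) -> tuple[str, str, bytes]:
        """Run one recorded utterance through STT -> LLM -> TTS."""
        text = self.transcribe(audio)
        if not text.strip():
            # Nothing recognised: skip the LLM and TTS calls entirely.
            return "", "", b""
        reply = self.generate(text)
        return text, reply, self.synthesize(reply)


if __name__ == "__main__":
    # Stand-in stages so the flow can be exercised offline.
    bot = Assistant(
        transcribe=lambda pcm: pcm.decode("utf-8"),
        generate=lambda prompt: f"You said: {prompt}",
        synthesize=lambda reply: reply.encode("utf-8"),
    )
    heard, reply, audio_out = bot.handle_utterance(b"what time is it")
    print(heard, "|", reply, "|", len(audio_out), "bytes of audio")
```

Keeping each stage behind a plain callable means one model can be swapped for another (a local recogniser for a hosted one, say) without touching the rest of the flow.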
The project is based around an old 2" PC speaker, which provides audio output. It sits in a housing that also holds an Espressif ESP32-WROOM-32 microcontroller development board with built-in Wi-Fi connectivity, connected to an LM386 audio amplifier and to a TDK InvenSense INMP441 MEMS microphone for input. There's also a tactile push button on the side, and a Top Power TP4056 charging module to manage the battery.
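The INMP441 is a digital I2S microphone that outputs 24-bit samples, which an ESP32 typically reads as 32-bit words. Before such audio reaches a speech recogniser it usually has to be narrowed to 16-bit PCM. The sketch below shows one way the backend could do this, under stated assumptions that are not taken from the project: the ESP32 forwards raw 32-bit little-endian words with the 24-bit sample left-justified, recorded mono at 16 kHz. The function names are hypothetical.

```python
import io
import struct
import wave

# Assumptions (not from the project): raw 32-bit little-endian I2S words,
# each holding a left-justified 24-bit sample, recorded mono at 16 kHz.
SAMPLE_RATE_HZ = 16_000


def i2s32_to_pcm16(raw: bytes) -> bytes:
    """Keep the top 16 bits of each signed 32-bit I2S word."""
    count = len(raw) // 4  # any trailing partial word is ignored
    words = struct.unpack(f"<{count}i", raw[: count * 4])
    # Arithmetic right shift preserves the sign of negative samples.
    return struct.pack(f"<{count}h", *(w >> 16 for w in words))


def pcm16_to_wav(pcm: bytes, rate: int = SAMPLE_RATE_HZ) -> bytes:
    """Wrap 16-bit mono PCM in an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Doing the conversion on the server keeps the firmware simple; the trade-off is that twice as many bytes cross the Wi-Fi link, so a design short on bandwidth might shift the samples on the ESP32 instead.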
Rather than attempting to run a large language model on-device, Sengar's design uses WebSockets to communicate with a remote server over Wi-Fi. Recorded audio is streamed to a Whisper-based speech recognition model; the resulting transcript is passed to Google's Gemini large language model as a prompt; and Gemini's reply is fed to the Piper speech synthesis model, whose audio is streamed back to the ESP32 for playback. All of this runs in a Python backend.
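Streaming the synthesized reply back, rather than sending it in one piece, lets a memory-constrained ESP32 start playback early and hold only a small buffer. The sketch below illustrates one possible framing scheme; it is an assumption, not the project's protocol. Audio goes out in fixed-size binary chunks followed by a text end marker. With a library such as `websockets`, each `bytes` item would be sent as a binary frame and the `str` marker as a text frame. `CHUNK_BYTES`, `END_OF_REPLY`, and the 16 kHz, 16-bit mono format are all hypothetical choices.

```python
from typing import Iterable, Iterator, Union

# Hypothetical framing parameters (not from the project).
CHUNK_BYTES = 1024              # assumed to fit the ESP32 playback buffer
END_OF_REPLY = "END"            # control message marking the end of a reply
BYTES_PER_SECOND = 16_000 * 2   # assumed 16 kHz, 16-bit mono PCM

Message = Union[bytes, str]


def frame_reply(pcm: bytes, chunk: int = CHUNK_BYTES) -> Iterator[Message]:
    """Yield PCM in fixed-size binary chunks, then a text end marker."""
    # An even chunk size never splits a 16-bit sample across two frames.
    if chunk <= 0 or chunk % 2:
        raise ValueError("chunk must be a positive, even number of bytes")
    for start in range(0, len(pcm), chunk):
        yield pcm[start : start + chunk]
    yield END_OF_REPLY


def reassemble(messages: Iterable[Message]) -> bytes:
    """Client-side view: collect binary frames until the end marker."""
    parts = []
    for msg in messages:
        if isinstance(msg, str) and msg == END_OF_REPLY:
            break
        parts.append(msg)
    return b"".join(parts)


def chunk_duration_ms(chunk: int = CHUNK_BYTES) -> float:
    """Playback time covered by one chunk at the assumed audio format."""
    return 1000 * chunk / BYTES_PER_SECOND
```

At the assumed 32,000 bytes per second, each 1,024-byte chunk covers 32 ms of audio, so the device can begin playing after the first frame instead of waiting for the whole reply.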
"Consider this backend as the brain of the system because this is where all the processing happens," Sengar explains. "For this I would recommend hosting an [Amazon] AWS EC2 instance assigned with a static IP. Alternatively you can run a local server on your laptop and connect your ESP32 through [a] hotspot."
The project is documented in full over on Instructables; source code is available on GitHub under the permissive MIT license.