This AI Assistant Is a Local Hero

Max Headbox is an agentic AI assistant with personality that runs 100% locally on a Raspberry Pi 5.

Max Headbox is ready to do your bidding (📷: Simone Marzulli)

Building your own custom AI assistant powered by a locally-running large language model is easy these days. Techniques like model pruning and quantization have made it possible to run reasonably powerful models on modest computing platforms. Moreover, many hobbyists have already designed their own AI assistants and published their work, giving us all plenty of resources to lean on when building our own projects.

But there are AI assistants and then there are AI assistants. Sure, you could slap together a few components and create something that would look great hidden away in your closet (like this device built by yours truly). But what if you want to have an AI assistant sitting on your desk or bookshelf without scaring away non-techie guests? If you want to step up your hacking game, then Simone Marzulli has got a project that you will want to check out.

A high-level overview of device operation (📷: Simone Marzulli)

Marzulli built an AI assistant, but this one has a nice case and large display with an animated character to make it not only practical, but also presentable. Better still, it is not just a chatbot, but rather it is an AI agent. That means you can give it a list of digital tasks to do, and it will find the best tools to accomplish them, then carry out your requests. Even with these added capabilities, the device, called Max Headbox, still runs everything 100% locally.
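That tool-selection loop can be sketched in a few lines. This is purely illustrative Python, not Marzulli's actual code: the tool names and the `pick_tool` stand-in (which a real agent would replace with the LLM's structured tool-call output) are assumptions.

```python
# Minimal sketch of agentic tool dispatch: the model picks a tool and
# arguments, and a dispatcher runs it. All names here are hypothetical.

def get_time(_args):
    return "12:00"

def set_timer(args):
    return f"timer set for {args['minutes']} min"

# Registry the agent can choose from; a real system would describe these
# tools to the LLM in its prompt.
TOOLS = {"get_time": get_time, "set_timer": set_timer}

def pick_tool(request):
    # Stand-in for the LLM's tool choice: returns (tool_name, args).
    if "timer" in request:
        return "set_timer", {"minutes": 5}
    return "get_time", {}

def run_agent(request):
    name, args = pick_tool(request)
    return TOOLS[name](args)
```

The key idea is that the model decides *which* function to call; the surrounding code only executes its choice.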

Max Headbox is built around a Raspberry Pi 5, which provides just enough horsepower to run small but capable open source AI models. A custom case houses the Pi, cooling fan, and a compact display that shows the assistant’s cheerful green emoji face. A colored ribbon circling the head indicates what the system is doing: blue when it’s listening, red while recording, and rainbow when the model is busy generating a response.
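The state-to-color behavior described above might be modeled like this. The state names and the use of an HSV sweep for the rainbow effect are assumptions for illustration, not details from the project.

```python
import colorsys

# Hypothetical mapping of assistant states to the ring's RGB color.
STATE_COLORS = {
    "listening": (0, 0, 255),   # blue while waiting for the wake word
    "recording": (255, 0, 0),   # red while capturing audio
}

def ring_color(state, tick=0):
    """Return an (R, G, B) tuple for the given state.

    While the model is generating, cycle the hue each tick to
    produce a rainbow effect.
    """
    if state == "generating":
        r, g, b = colorsys.hsv_to_rgb((tick % 60) / 60, 1.0, 1.0)
        return int(r * 255), int(g * 255), int(b * 255)
    return STATE_COLORS.get(state, (0, 0, 0))  # off for unknown states
```

Animating the ring is then just a matter of calling `ring_color` with an incrementing tick from the main loop.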

On the software side, Marzulli leveraged a stack of established open source tools. A React Vite front end communicates with a Sinatra backend, which handles microphone control and recording. Audio input is passed to faster-whisper, a high-performance reimplementation of OpenAI’s Whisper, for speech-to-text transcription. To enable hands-free operation, a wake word detection system built with Vosk listens in standby mode and automatically shuts down when the language model is active to avoid CPU contention.
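The standby coordination is worth a closer look: the Vosk wake-word listener runs only while the language model is idle, so the two never compete for the Pi's CPU. A minimal sketch of that hand-off might look like the following (class and method names are assumptions, not the project's actual API):

```python
# Sketch of wake-word/LLM coordination: only one CPU-heavy component
# is active at a time.

class Coordinator:
    def __init__(self):
        self.wake_word_active = True   # standby: Vosk is listening
        self.llm_busy = False

    def start_generation(self):
        """Called when a transcribed request is handed to the LLM."""
        self.llm_busy = True
        self.wake_word_active = False  # shut down the listener

    def finish_generation(self):
        """Called when the model finishes its response."""
        self.llm_busy = False
        self.wake_word_active = True   # resume standby listening
```

On a four-core board like the Pi 5, this kind of explicit scheduling matters more than it would on a desktop, since a small LLM can easily saturate every core.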

For managing the large language models themselves, Marzulli chose Ollama, a lightweight framework for running open models locally via an API. This may not be a permanent solution, but it provided a quick route to getting the proof-of-concept working. The models he settled on were Qwen3 1.7B and Gemma3 1B. Larger models like Qwen2.5 3B performed better in terms of accuracy, but added too much latency for the Raspberry Pi's modest hardware.
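Talking to Ollama locally is straightforward: it exposes an HTTP API on port 11434, with a documented `/api/generate` route. The sketch below only builds the request payload (the exact model tag is an assumption and may differ from what the project uses):

```python
import json

# Construct a request for Ollama's local HTTP API. The endpoint and
# fields follow Ollama's documented /api/generate route; the model tag
# here is an assumption.

def build_request(prompt, model="qwen3:1.7b"):
    return {
        "url": "http://localhost:11434/api/generate",
        "body": json.dumps({
            "model": model,
            "prompt": prompt,
            "stream": False,  # wait for the full response
        }),
    }
```

Posting that body with any HTTP client returns a JSON object whose `response` field holds the model's text, which keeps the backend decoupled from any particular model runtime.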

The Qwen3 model was used to handle agentic task execution, while Gemma3 was used as a conversational model. Responses from the conversational model include both a response for the user and an associated “feeling” such as happy, interested, or confused. These map to emoji graphics displayed alongside the text, giving the assistant a hint of personality.
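A reply carrying both text and a feeling could be handled with a small parser like this. The JSON shape, field names, and emoji choices are illustrative assumptions, not the project's actual format:

```python
import json

# Hypothetical mapping from the model's reported feeling to an emoji
# shown on the display.
FEELING_EMOJI = {"happy": "😊", "interested": "🤔", "confused": "😕"}

def parse_reply(raw):
    """Split a conversational reply into (text, emoji).

    Falls back to a neutral face if the feeling is missing or unknown.
    """
    data = json.loads(raw)
    return data["response"], FEELING_EMOJI.get(data.get("feeling"), "🙂")
```

Keeping the feeling as a separate structured field, rather than parsing it out of free text, makes the display logic trivial and robust.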

Plenty of details are available in the project’s blog post, so be sure to give it a read if you are thinking about building your own AI-powered assistant any time soon.

nickbild

R&D, creativity, and building the next big thing you never knew you wanted are my specialties.
