Building a Truly Functional Offline AI Assistant
Pocket is a portable AI assistant that uses a network of models to provide quick responses that are actually useful.
Even though there have been some spectacular commercial failures, such as the Humane AI Pin, people still want portable AI assistants that will allow them to take advantage of the latest LLMs on their own terms. The off-the-shelf offerings are still very weak, so many hardware hackers are taking it upon themselves to build their own devices. Some of these are quite interesting, but let’s be honest — they tend to be slow and not especially useful.
The technology behind these AI assistants keeps advancing, and Naz Louis believes that these advances have now gotten us to the point where we can build useful, speedy offline devices. To prove it, he built a Raspberry Pi-powered system that is not only a chatbot, but that also has computer vision capabilities and the ability to pull in real-time information, such as weather or stock prices.
In this system, each request is first evaluated by a router that decides which specialized AI model should handle the task. Simpler requests — like casual conversation or jokes — are handled by a lightweight version of Qwen 2.5 configured for fast responses. More complex questions that require deeper reasoning are routed to a “thinking” variant of the same model. While this version can take significantly longer to produce an answer, it delivers far more accurate results for tasks such as coding or complex calculations.
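The article doesn't describe how the router classifies requests, but the idea can be sketched with a simple heuristic classifier. The keyword lists and model names below are illustrative assumptions, not the project's actual routing logic:

```python
# Hypothetical sketch of Pocket-style request routing: inspect the prompt
# and pick which specialized model should handle it. Keyword heuristics
# and model identifiers here are assumptions for illustration.

REASONING_HINTS = ("calculate", "code", "debug", "prove", "solve")
ACTION_HINTS = ("weather", "search", "scan", "stock")

def route(prompt: str) -> str:
    """Return the name of the model that should handle this request."""
    text = prompt.lower()
    if any(word in text for word in ACTION_HINTS):
        return "function-gemma-270m"   # system actions / tool calls
    if any(word in text for word in REASONING_HINTS):
        return "qwen2.5-thinking"      # slower, deeper reasoning
    return "qwen2.5-fast"              # casual chat, jokes, quick answers
```

In a real build the router might itself be a small model rather than a keyword match, but the control flow is the same: cheap classification first, expensive inference only when needed.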
Another specialized model, Function Gemma, handles system actions. This compact 270-million-parameter model is fine-tuned specifically to trigger functions such as checking the weather, performing web searches, or scanning the local network for suspicious devices.
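A function-calling model like this typically emits a structured tool call that the assistant then maps to a local handler. The JSON shape, function names, and stub handlers below are assumptions sketching that dispatch layer:

```python
import json

# Illustrative dispatch layer for a function-calling model: the model emits
# a JSON tool call, and the assistant runs the matching local handler.
# The JSON format and handler names are assumptions, not Pocket's actual API.

def get_weather(city: str) -> str:
    return f"(stub) forecast for {city}"

def scan_network() -> str:
    return "(stub) 4 devices found, none suspicious"

HANDLERS = {"get_weather": get_weather, "scan_network": scan_network}

def dispatch(model_output: str) -> str:
    """Parse the model's tool call and invoke the matching function."""
    call = json.loads(model_output)
    handler = HANDLERS[call["name"]]
    return handler(**call.get("arguments", {}))
```

Keeping the tool-calling model tiny (270M parameters) makes sense here: it only has to produce a well-formed call, not reason about the answer.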
To handle visual processing without overloading the Pi, the build includes a Raspberry Pi AI HAT+ with a Hailo‑8 AI accelerator. This dedicated chip processes computer vision workloads — like object detection and pose estimation — allowing the assistant to recognize objects in real time using a 12-megapixel Arducam camera.
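The accelerator returns raw detections; a small post-processing step turns them into something the assistant can actually say. The `(label, confidence)` tuple format and the threshold below are assumptions for illustration, not the Hailo SDK's output format:

```python
from collections import Counter

# Hypothetical post-processing of object-detection results into a spoken
# summary. The detection format (label, confidence) and the confidence
# threshold are illustrative assumptions.

def summarize_detections(detections, min_conf=0.5):
    """detections: iterable of (label, confidence) pairs."""
    counts = Counter(label for label, conf in detections if conf >= min_conf)
    if not counts:
        return "I don't see anything I recognize."
    parts = [f"{n} {label}{'s' if n > 1 else ''}" for label, n in counts.items()]
    return "I can see " + ", ".join(parts) + "."
```

Offloading detection to the Hailo-8 means the Pi's CPU only handles this kind of lightweight aggregation, which is why real-time recognition stays responsive.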
Speech capabilities are handled entirely on-device as well. Voice input is processed with Whisper for speech-to-text, while responses are generated using Piper TTS, which provides natural-sounding voice output without requiring cloud services.
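The speech loop can be sketched as Whisper for transcription and Piper invoked as a subprocess for synthesis. The voice file name and output paths below are placeholders; `whisper.load_model`/`transcribe` come from the openai-whisper package, and Piper's CLI reads text on stdin:

```python
import subprocess

# Sketch of the on-device speech loop: Whisper transcribes mic input,
# the LLM produces a reply, and Piper renders it to a wav file.
# The voice model filename and paths are placeholder assumptions.

def transcribe(wav_path: str) -> str:
    import whisper                       # openai-whisper package
    model = whisper.load_model("tiny")   # a small model suits a Raspberry Pi
    return model.transcribe(wav_path)["text"]

def piper_command(voice: str, out_wav: str) -> list:
    """Build the Piper CLI invocation; Piper reads text from stdin."""
    return ["piper", "--model", voice, "--output_file", out_wav]

def speak(text: str, voice="en_US-voice.onnx", out_wav="reply.wav"):
    subprocess.run(piper_command(voice, out_wav),
                   input=text.encode(), check=True)
```

Because both models run locally, nothing the user says ever leaves the device.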
The handheld device also includes a 4.3-inch touchscreen for the interface and a Geekworm X1004 UPS module powered by two 18650 batteries, giving the system roughly one to two hours of portable runtime. All of the hardware is housed inside a custom 3D-printed enclosure designed with extra ventilation to deal with the heat generated by the Pi, the AI accelerator, and the batteries.
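The quoted runtime checks out on a back-of-envelope basis. Assuming typical 3000 mAh cells, a 10-15 W combined draw, and some conversion loss (all assumed values, not measurements from the build):

```python
# Back-of-envelope runtime estimate for a two-cell 18650 UPS.
# Cell capacity, load, and efficiency are assumed values, not measurements.

def runtime_hours(cells=2, mah_per_cell=3000, cell_voltage=3.7,
                  load_watts=12.0, efficiency=0.85):
    energy_wh = cells * (mah_per_cell / 1000) * cell_voltage  # ~22.2 Wh
    return energy_wh * efficiency / load_watts
```

With those figures the estimate lands around 1.6 hours, consistent with the one-to-two-hour range reported for the device.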
It may not replace smartphones anytime soon, but projects like Pocket demonstrate that truly private, offline AI assistants are starting to become practical — especially for people willing to build their own device.