How to Run High-Performance LLMs Locally on the Arduino UNO Q
Unleash local AI on the Arduino UNO Q with this tutorial on running LLMs and VLMs at the edge.
In the first few months since the Arduino UNO Q was introduced, people have formed many different opinions about it. Some love the enhanced computational horsepower and the ability to run Linux, while others find the App Lab environment confusing and restrictive. Whatever side of the fence you find yourself on, one thing is certain — it is very different from the Arduino boards that came before.
Along with the change has come a lot of uncertainty about what this board is really good for. With its STM32H5 coprocessor, it can do all the things an UNO is typically used for. However, given the extra cost and complexity, you probably wouldn’t want to use an UNO Q just to blink some LEDs. If you are going to invest in this new board, you will want to put it to work on more ambitious projects.
More than just blinking LEDs
Along those lines, Edge Impulse's Marc Pous has just demonstrated a very interesting way to use the UNO Q that would have been unthinkable before the addition of the Dragonwing processor. He has written up a brief tutorial explaining how one can run large language models (LLMs) — and even vision language models (VLMs) — locally on the board.
The project is built around yzma, a Go wrapper for llama.cpp created by Ron Evans, well known for projects like Gobot and TinyGo. yzma provides a clean interface that allows developers to integrate high-performance inference into Go applications without wrestling with CGo bindings. This provides a streamlined path to running modern AI models directly on the UNO Q’s Debian-based Linux environment.
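The tutorial does not reproduce yzma's API, but the appeal of a Go-native wrapper is easy to illustrate. The sketch below shows the general shape such a wrapper gives an application: plain Go types in front of the inference engine, no CGo in sight. The `Model` interface, `LoadModel` constructor, and `stubModel` type are all hypothetical names for illustration, not yzma's actual identifiers; the stub stands in for a real GGUF-backed model so the example runs anywhere.

```go
package main

import "fmt"

// Model is a hypothetical interface of the kind a Go wrapper such as
// yzma can expose, hiding llama.cpp's C API behind plain Go types.
type Model interface {
	Generate(prompt string) (string, error)
	Close() error
}

// stubModel stands in for a real GGUF-backed model so this sketch
// runs without llama.cpp installed.
type stubModel struct{ path string }

func (m *stubModel) Generate(prompt string) (string, error) {
	return "stub reply to: " + prompt, nil
}

func (m *stubModel) Close() error { return nil }

// LoadModel is a hypothetical constructor; a real wrapper would open
// the GGUF file and initialize the inference engine here.
func LoadModel(path string) (Model, error) {
	return &stubModel{path: path}, nil
}

func main() {
	m, err := LoadModel("SmolLM2-135M-Instruct.Q4_K_M.gguf")
	if err != nil {
		panic(err)
	}
	defer m.Close()

	out, _ := m.Generate("What is edge AI?")
	fmt.Println(out)
}
```

Keeping the engine behind a small interface like this is also what makes the rest of an application testable on a laptop before deploying to the board.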
AI at the edge
The tutorial walks users through installing Go on the board, setting up yzma, and pulling in compatible GGUF models from Hugging Face. For text-only inference, Pous demonstrates the compact SmolLM2-135M-Instruct model, which weighs in at roughly 135 million parameters. Thanks to quantization and the efficiency of llama.cpp, the model can run locally on the UNO Q’s Arm-based system, enabling fully offline chat interactions.
Even more impressive is the demonstration of a multimodal model: SmolVLM2-500M-Video-Instruct. At around 500 million parameters, it is small by modern AI standards but still capable of processing images and short video inputs alongside text prompts. In Pous’ example, the board analyzes a photo of markers scattered across a desk and generates a detailed description — all without sending data to the cloud.
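Why these models fit on the board at all is mostly arithmetic. The back-of-the-envelope sketch below compares 16-bit weights to roughly 4.5 bits per weight; that 4.5 figure is an assumed average for llama.cpp's Q4_K-style quantization, not a measured number, and actual file sizes vary by format.

```go
package main

import "fmt"

// modelSizeMB gives an approximate model footprint in megabytes for a
// parameter count and an average bits-per-weight figure.
func modelSizeMB(params, bitsPerWeight float64) float64 {
	return params * bitsPerWeight / 8 / 1e6
}

func main() {
	// ~4.5 bits/weight is an assumed average for Q4_K-style
	// quantization; exact figures vary by quantization format.
	const q4 = 4.5

	fmt.Printf("SmolLM2-135M:  F16 %.0f MB -> Q4 ~%.0f MB\n",
		modelSizeMB(135e6, 16), modelSizeMB(135e6, q4))
	fmt.Printf("SmolVLM2-500M: F16 %.0f MB -> Q4 ~%.0f MB\n",
		modelSizeMB(500e6, 16), modelSizeMB(500e6, q4))
}
```

Even the 500-million-parameter VLM lands in the low hundreds of megabytes once quantized, which is why it fits comfortably in the UNO Q's memory alongside Linux.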
Instead of relying on remote APIs, developers can build privacy-conscious edge systems that interpret images, respond to voice commands, or analyze sensor data locally. For robotics and smart home experiments in particular, the ability to combine real-time microcontroller control with Linux-based AI inference opens up new design possibilities. If you build some of your own great ideas with an UNO Q, be sure to let us know.