Gen AI on Your Raspberry Pi: A Hands-On Review of the Raspberry Pi AI HAT+ 2
From computer vision workloads to large language models (LLMs), the Hailo-10H accelerator on Raspberry Pi's latest add-on board offers a claimed 40 TOPS of INT4 compute.
Raspberry Pi is back on the artificial intelligence (AI) bandwagon, announcing the latest entry in its family of accelerators for the Raspberry Pi 5, and this time it's focusing on generative AI, particularly large language models (LLMs).
The Raspberry Pi AI HAT+ 2 is built around the Hailo-10H coprocessor, delivering a claimed 40 tera-operations per second (TOPS) of compute at a reduced INT4 precision. For the first time, it's also joined by 8GB of dedicated LPDDR4 RAM, giving it the grunt required to run LLMs with up to 1.5 billion parameters.
Is this a $130 way to get your foot in the door of the generative AI boom, or just more hot air inflating the AI bubble? Let's find out.
Hardware
- Form factor: Hardware Attached on Top Plus (HAT+)
- NPU: Hailo-10H accelerator, claimed at 40 TOPS (INT4 precision)
- Memory: 8GB LPDDR4 (4GB, 2GB models hinted at but not yet launched)
- Compatibility: Raspberry Pi 5 only
- Interface: PCI Express via Raspberry Pi 5-standard FFC, 40-pin General-Purpose Input/Output Header (GPIO)
- Power draw: 2.5W typical
- Box contents: AI HAT+ 2, heatsink with push-pin mounts, extended GPIO header, mounting pillars and screws, heatsink installation instruction card
- Price: $130
The Raspberry Pi AI HAT+ 2 is, you will be entirely unsurprised to hear, a follow-up to the earlier Raspberry Pi AI HAT+, matching the form factor of its predecessor almost perfectly but swapping out the Hailo-8 or Hailo-8L coprocessor for the newer Hailo-10H. In raw numbers, that means a boost in claimed compute performance from 13 or 26 TOPS to an impressive 40 for the same power draw, but things aren't quite that simple.
The Hailo-8 family of AI-centric coprocessors used in the original AI HAT+ range operated at INT8 precision, but the Hailo-10H runs at INT4. Lowering precision means models can fit in less RAM and run with boosted performance on compatible hardware — but the process of reducing the precision of a model, known as "quantization," can have a measurable impact on the accuracy of the model's responses.
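To see why, consider what quantization actually does to a model's weights. The sketch below is a toy illustration in Python using numpy, not Hailo's actual toolchain: it squeezes floating-point weights into the sixteen values INT4 can represent, then measures the rounding error that introduces.

```python
import numpy as np

def quantize_int4(weights):
    """Symmetric per-tensor quantization to INT4 (integer range -8..7)."""
    scale = np.abs(weights).max() / 7.0   # map the largest weight onto the INT4 maximum
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=1_000).astype(np.float32)  # stand-in weight tensor
q, scale = quantize_int4(w)
print("mean absolute error:", np.abs(w - dequantize(q, scale)).mean())
# Each weight now occupies 4 bits rather than 8 (INT8) or 32 (FP32),
# halving memory versus INT8 at the cost of the error printed above.
```

Production toolchains use per-channel scales and calibration data to minimize that error, but the underlying trade remains the same: fewer bits per weight means less RAM and more throughput, paid for in accuracy.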
There's another change in the new accelerator's design, too: the addition of dedicated RAM. The original AI HAT+ used the Raspberry Pi's own system memory to hold the model and the data on which it was working; the AI HAT+ 2 shifts away from this unified memory model with 8GB of on-board RAM, invisible to the host Raspberry Pi and usable only by the Hailo coprocessor itself.
That raises something of a red flag in terms of future pricing. In recent months the cost of RAM components has skyrocketed, thanks, ironically enough, to insatiable demand from the AI boom. Raspberry Pi has already been forced to increase the price of its single-board computer products, and more price hikes are likely to follow; now it's putting 8GB of increasingly expensive LPDDR4 on the AI HAT+ 2, and hoping its $130 asking price offers enough of a margin to insulate it from further increases in component costs.
There's evidence Raspberry Pi is hedging its bets here, though: the only model of AI HAT+ 2 available at launch has 8GB of RAM, but silkscreen labeling for resistor pads on the board reveals unannounced 4GB and 2GB variants have, at least, been floated as a possibility. Should component prices continue to rise, as they almost certainly will, this gives Raspberry Pi room to both hike the cost of the flagship 8GB model and launch cost-reduced versions to soften the blow — an approach it has already implemented with the Raspberry Pi 5 family of single-board computers.
Combined, the 8GB of dedicated RAM and the shift to INT4-capable hardware deliver one thing in particular: support for generative AI models, particularly LLMs.
Installation
Installing the Raspberry Pi AI HAT+ 2 is as easy as installing its predecessor: it sits above the body of a Raspberry Pi 5 on bundled stand-offs, providing room for an Active Cooler below. Earlier models aren't compatible, as they lack a user-exposed PCI Express lane; neither are the Raspberry Pi 500 and Pi 500+ wedge computers, nor the Raspberry Pi Compute Module 5 unless installed in a Raspberry Pi 5-format carrier board.
A pre-fitted flat flexible circuit (FFC) links the board to the PCI Express lane on the Raspberry Pi 5, a simple case of lifting the connector's flap and pushing the cable and flap home again, while a header connects to the Raspberry Pi's 40-pin general-purpose input/output (GPIO) pins. While the header is, technically, a pass-through if paired with a long enough GPIO extension, the bundled mounting hardware doesn't expose any pins, so you can't use the AI HAT+ 2 alongside any other GPIO-connected hardware.
The final stage is installing the heatsink, which is a slightly hairy experience. While not mandatory, its use is recommended, and installation requires you to carefully peel the protective plastic off the pre-installed thermal pads, then insert two very stiff plastic wing-type push-pins through mounting holes on the top of the board. This requires more force than you'd expect, and care is needed not to crush any of the parts on the board.
The software side of things is a little more thorny — though criticism here should be read with the understanding that this review took place prior to the public launch using pre-release software, so improvements on this front are hopefully already in place. The Hailo-10 has a different architecture to the Hailo-8, meaning separate drivers must be installed; then you need to install the software that lets you actually use the accelerator: Hailo's LLM model zoo.
There were two ways of doing this at the time of review. The first is to register with Hailo, which is free, and download a .deb package from the company's "Developer Zone." Prior to an update released on 8 January, this worked fine; the update, though, broke compatibility with Raspberry Pi OS. The other is to clone the company's GitHub repository and compile the software yourself: a fairly speedy process on a Raspberry Pi 5, but one for which the official instructions are at times incorrect or incomplete, making it a harder process than it needs to be.
There's a second repository, too, separate from the model zoo, which offers a wider range of sample applications covering not only generative AI workloads but computer vision as well. This has only one method of installation: clone the repository and run an included shell script with root permissions, which downloads prerequisites and compiles the necessary source code. Unlike the model zoo, this is a somewhat more memory-hungry affair: attempts to install the software on a Raspberry Pi 5 2GB caused the out-of-memory killer to terminate the process before completion. Switching to a top-end Raspberry Pi 5 16GB fixed this, naturally enough.
Talk to me
Hailo's model zoo comes with five downloadable large language models compatible with the Hailo-10H: qwen2:1.5b, qwen2.5:1.5b, qwen2.5-coder:1.5b, llama3.2:1b, and deepseek_r1:1.5b, each ranging from one to 1.5 billion parameters, which is around the upper limit of where 8GB of RAM will get you. These are handled by a port of ollama that is compatible with the Open WebUI web interface, though you'll need to install Open WebUI in a Docker container, as it doesn't work with the version of Python currently shipping in Raspberry Pi OS "Trixie."
It's possible to interact with any of the five LLMs without Open WebUI, though each still needs to be downloaded individually and takes up a few gigabytes of storage space, but doing so involves awkward HTTP POST requests returning hard-to-read JSON objects. Open WebUI provides a slick web-based interface, exposing the running LLM as a chatbot like a commercial service, and even includes the ability to use voice recognition and text-to-speech for live conversational queries, though this was blocked by browsers in testing, for reasons that will become clear.
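As an illustration, here's a minimal sketch of a direct query in Python, assuming that hailo-ollama, as a port of ollama, exposes the standard ollama REST endpoints on ollama's usual port of 11434; the host, port, and model name may need adjusting for your install.

```python
import requests

# Send a single non-streaming prompt to the ollama-compatible API.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5:1.5b",  # any of the five downloaded models
        "prompt": "How many times does the letter R appear in 'strawberry'?",
        "stream": False,          # one JSON object rather than a token stream
    },
    timeout=120,  # allow for the delay while the model loads onto the accelerator
)
resp.raise_for_status()
print(resp.json()["response"])
```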
Sadly, Open WebUI is a massive security hole, even accounting for its recently fixed remote code execution vulnerability. Once installed, it binds itself to all network interfaces, allowing full access from any other device on the network. The first user to access it in a browser will be prompted to create a password-protected "admin" account, but these credentials, and everything else including your prompts and responses, are sent via an unencrypted HTTP connection.
For those looking to run a local LLM server at home, it gets worse: any prompts sent to the hailo-ollama web API, whether from Open WebUI, other compatible clients, or manual HTTP requests, are echoed to the terminal that launched the hailo-ollama server — along with the LLM's response. For shared use, that's a privacy nightmare.
Putting that aside, Open WebUI works reasonably well. Unfortunately, the same can't be said for the models. While they share their names with popular large language models, they're considerably shrunken in parameter count and quantized down to INT4 precision, and it shows. All models tested failed the "strawberry test," in which the LLM is asked how many times the letter R appears in the word "strawberry," with qwen2:1.5b going so far as to respond that "there is no such thing as a 'strawberry'" after initially outputting that the word contained either 72 Rs or $16.
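For reference, the correct answer is three, something a single line of conventional code gets right deterministically and in microseconds:

```python
print("strawberry".count("r"))  # 3
```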
This is, of course, the key problem with LLMs: they don't "think" or "reason" (though deepseek_r1:1.5b will churn through hundreds of words role-playing both before giving you an answer-shaped response) and work only on the statistical continuation of a stream of tokens. The higher the number of parameters, the better the response, and a billion parameters isn't anywhere near enough to use a model as anything more than a novelty.
Local LLMs for local people
There's no getting around it: an LLM running on a Raspberry Pi with the AI HAT+ 2 will never come close to commercial services like OpenAI's ChatGPT or Google's Gemini. That's not to say those hundreds-of-billions-of-parameter commercial services are good, because they're not, but that the models the AI HAT+ 2 is capable of running are objectively terrible.
There are advantages, though: the Hailo-10H is rated at a claimed 2.5W power draw during active inference, a fraction of the hundreds of watts a GPU-based accelerator needs for the same workload. In testing, this equated to a rise from a 3.4W full-system idle power draw to 5.2W while an LLM was responding in Open WebUI — an impressive feat. Overall power draw is kept down by unloading the accelerator after a brief timeout period, which does mean a 25-40 second delay before the LLM will start responding to an initial prompt.
Another advantage is that all your data is processed locally: nothing you type into the LLM's input prompt goes to any cloud servers, and if you're using it directly on the Raspberry Pi itself, it never even hits your local network. That would be a major advantage for privacy, if the models were of any use for privacy-sensitive queries; sadly, they're not.
The trade-off for all this is that the models are not only "dumber" than their commercial equivalents but less flexible, too. Open WebUI lets you upload images, documents, and videos, but none of the models included in Hailo's model zoo can do anything with inputs that aren't text-based. None of the models has any way to search for more information or update to include data from beyond its "knowledge cut-off" date, either; all are isolated from the web, though some will give you an example of a search term you could copy-and-paste into a search engine yourself. Their context windows are also very narrow, meaning they can't work on long textual inputs.
Outside the model zoo, the sample application repository does include a demonstration of a multimodal model: this can be fed a live input from a Raspberry Pi Camera Module and then given voice- or text-based queries. As with the model zoo LLMs, though, its responses are typically of a very poor quality.
Better results come from the traditional computer vision project examples included in the repository, including pose estimation, depth estimation, and object detection. These run well on the Raspberry Pi AI HAT+ 2, fed with either pre-recorded or live video — but they also ran just fine on the original Raspberry Pi AI HAT+, and the earlier Raspberry Pi AI Kit, and even on the all-in-one Raspberry Pi AI Camera Module where the inference takes place on-camera.
Conclusion
It's hard to come to the conclusion that the Raspberry Pi AI HAT+ 2 is worth buying. It's faster than the Raspberry Pi AI HAT+, sure, but only by dropping from INT8 to INT4 precision. For computer vision tasks the Raspberry Pi AI HAT+ or Raspberry Pi AI Kit will offer comparable performance at a lower cost, and the results the new AI HAT+ 2 delivers on generative AI tasks are universally terrible.
There's not much room for growth, either. The bulk of the capability gains found in leading-edge LLMs comes from an increase in the number of parameters in the model, and 8GB is already well over an order of magnitude too little to run, for example, the full-fat DeepSeek-R1, which activates 37 billion out of a total of 671 billion parameters, compared to the distilled 1.5-billion-parameter version you can run on the AI HAT+ 2.
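Some back-of-the-envelope arithmetic makes the point, counting weight storage alone at four bits (half a byte) per parameter and ignoring the additional memory needed for activations and context:

```python
def int4_weight_gb(params_billions):
    # 4 bits = 0.5 bytes per parameter; weights only, no activations or KV cache
    return params_billions * 1e9 * 0.5 / 1e9

for name, params in [("deepseek_r1:1.5b (distilled)", 1.5),
                     ("DeepSeek-R1, active parameters", 37),
                     ("DeepSeek-R1, total parameters", 671)]:
    print(f"{name}: ~{int4_weight_gb(params):.1f}GB at INT4")
# The distilled model's ~0.8GB fits the AI HAT+ 2's 8GB comfortably;
# the full model's ~335.5GB is roughly 42 times more RAM than the board has.
```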
For those working on computer vision projects, the Raspberry Pi AI Camera Module is a better choice: it's cheaper, at $70 rather than $130 plus the cost of a non-AI Camera Module, and it leaves the Raspberry Pi 5's PCIe lane free for high-speed Non-Volatile Memory Express (NVMe) storage to provide a great boost to overall system performance.
For those who desperately want an energy-efficient locally-hosted LLM, who are willing to overlook the vast computational and environmental resources consumed to train such an LLM and the ethical concerns surrounding how the training data is gathered, and who don't mind the fact that it will output answer-shaped objects that make no sense and code-shaped objects that don't run, there's a competing option on the horizon: Hailo has also partnered with ASUS to put the same Hailo-10H coprocessor and 8GB of RAM into a USB dongle.
Unveiled at the Consumer Electronics Show (CES) in Las Vegas last week, ASUS' UGen300 USB AI Accelerator offers exactly the same functionality as the Raspberry Pi AI HAT+ 2, but boasts compatibility with all 64-bit models of Raspberry Pi, rather than just the Raspberry Pi 5 family, as well as other Arm- and AMD64-based single-board computers, mainstream desktops, laptops, servers, tablets, and even Android smartphones. While no price had been announced at the time of writing, if it comes even close to the $130 of the Raspberry Pi AI HAT+ 2 it'll make for a much more flexible option, and one that won't tie up your only PCIe lane.
As a final demonstration of the limitations of the Raspberry Pi AI HAT+ 2, deepseek_r1:1.5b was prompted to summarize this review: "Conversation context is full. It is adivsable [sic] to clear context as cache size was reached" was the only response.
The Raspberry Pi AI HAT+ 2 is available to order from Raspberry Pi resellers today for $130.
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.