Hackster's FPGAdventures: Machine Learning at the Edge with the Microchip PolarFire SoC Video Kit
Join us as our latest FPGAdventure dives into Microchip's VectorBlox acceleration engine, designed for high-efficiency edge AI workloads.
Now we've had a look at what Microchip's PolarFire SoC Video Kit can do out-of-the-box, it's time our FPGAdventure took us a little deeper into exactly what's possible when you've a low-power application-class RISC-V processor melded together with a highly-flexible FPGA — by deploying some low-power high-performance machine learning (ML) at the edge.
If you're just joining us on our journey, the PolarFire SoC Video Kit is Microchip's latest development board for its PolarFire SoC — a chip which combines performant but low-power 64-bit Linux-capable RISC-V cores with field-programmable gate array (FPGA) capabilities, allowing it to turn its hand to almost any task. It formed the heart of the PolarFire SoC Icicle Kit, a general-purpose development board, and likewise sits at the center of the PolarFire SoC Video Kit as the driving force for a pair of 4k-resolution camera sensors.
Read on to see exactly what is behind Microchip's push to put the PolarFire SoC at the forefront of the edge AI revolution.
Machine learning at the edge
Microchip has three routes for handling machine learning workloads, depending on your target platform. The best known is its MPLAB Machine Learning Development Suite, which uses AutoML to create low-footprint machine learning models for deployment to the company's microcontrollers and microprocessors. Those with existing TensorFlow models can use the MPLAB suite to convert them to TensorFlow Lite for the same purpose.
The company's FPGA products, though, need a different approach: the VectorBlox Accelerator Software Development Kit. This takes models from a variety of sources and adapts them for use on the company's PolarFire and PolarFire SoC devices — though it doesn't include any capabilities for creating the models themselves, only optimization, quantization, and compilation for deployment.
There's a second string to the VectorBlox bow: machine learning acceleration intellectual property (IP) blocks, which can be synthesized in Libero SoC and which use the FPGA fabric to dramatically boost the performance of on-device deep- and machine-learning workloads. An overlay system allows their functionality to be switched between models on demand without resynthesis, offloading the work from the CPU cores and running it more efficiently to keep power usage to a minimum.
To demonstrate the concept, Microchip has developed a multi-model demo — originally created for the earlier PolarFire Video Kit and more recently ported across to the PolarFire SoC Video Kit — which is where our FPGAdventure takes us this month.
Libero SoC
For those who didn't join us in the first series of FPGAdventures, Libero SoC is Microchip's development environment for FPGA products. It's a powerful suite of tools which takes care of the entire workflow: writing hardware description language (HDL) by hand, or building a design from a vast library of IP blocks in a drag-and-drop visual development environment, then synthesizing the design and flashing it to a target device.
It's also not the friendliest of software around. It's an issue of which Microchip is fully aware, and there are signs of progress: since our last experience with Libero SoC the installer has been updated to include the binaries for the licensing manager, avoiding the need to download and install it separately — though, sadly, the manager is still very much necessary, even if the license itself is available free of charge.
For newcomers, the installation, licensing, and configuration of Libero SoC and its licensing manager will seem as off-putting as ever. And if you, like us, installed the software during our original FPGAdventure, you'll need to install a newer version for the demo to work.
Newer, but not newest: our attempts to get the demo up and running were initially stymied by having installed a version of Libero SoC which was too new, 2023.2 instead of 2023.1. Having spoken to Microchip about the problem, we installed 2023.1 alongside the newer release, with each needing around 30GB of hard drive space. This reveals a major shortcoming of the software stack: projects can be, and often are, tied to particular releases of Libero SoC, meaning a long-term user could easily end up with half a dozen copies installed side-by-side on the same system, all eating up storage space.
Each version of Libero SoC installs into its own directory — hence the doubling of the storage space required to have two versions running side-by-side — but they're not truly independent. All installed copies share a "vault," the term given by Microchip to the directory where downloaded IP blocks are kept. Our attempts to run the demo script in the correct version of Libero SoC were then further hampered by an unexpected and unhelpful error message — which, it turned out, was due to version conflicts which required emptying of the "vault" to resolve.
In Microchip's defense, here, its support staff are only an email away — and were more than happy to diagnose the problems and get the demo up-and-running as quickly as possible.
Building the demo
It's technically possible to side-step Libero SoC altogether, if you'd prefer. Microchip distributes the demo in two forms: a TCL script and supporting files, which need to be executed in the specific version of Libero SoC for which they were written; and a "job" file which can simply be flashed straight to the FPGA as-is using the rather more accessible FlashPro Express software.
As with the Icicle Kit before it, there's no need to pick up any additional hardware for the flashing process: the Video Kit comes complete with a FlashPro 6 built into it and accessible on one of the four micro-USB ports scattered around the board. (Two of the remaining ports are USB On-The-Go ports for adding hardware, and the last provides access to the PolarFire SoC's four UART buses.)
There are benefits to going the Libero SoC route, though. The first is that the TCL script effectively automates what you yourself would do if you were building the design by hand; if you sit and watch, it provides an admittedly high-speed insight into the design flow. It also leaves you partway through the process, expecting you to manually kick-off the last few stages ahead of the flashing process — more valuable experience. Finally, it also leaves you with a design you can inspect, tweak, and modify to your heart's content, as a great jumping-off point for building something yourself.
It's not a fast process, though. As with earlier releases — and the latest, Libero SoC 2023.2 — Libero SoC remains solidly single-threaded. Regardless of how many cores your workstation has, you can expect to see only a single one working as the build progresses. Here, single-thread performance is king, as is memory: if you've got less than 16GB of RAM, expect the build to silently fail partway through the process.
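Given that silent failure, a quick pre-flight check before kicking off a build can save a wasted hour. The following is a minimal sketch of our own, not anything shipped with Libero SoC: it assumes a Linux workstation, the function names are hypothetical, and the 10 percent allowance for memory the firmware and kernel reserve is our assumption rather than a Microchip figure.

```python
import os

MIN_BUILD_RAM_GIB = 16      # threshold quoted in the article
RESERVED_ALLOWANCE = 0.10   # assumption: a "16GB" machine reports a little less


def total_ram_bytes() -> int:
    """Return physical RAM in bytes (works on Linux; not available on Windows)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_enough_ram(total_bytes: int, minimum_gib: int = MIN_BUILD_RAM_GIB) -> bool:
    """True if total_bytes meets the minimum, less the reserved-memory allowance."""
    required = minimum_gib * 2**30 * (1 - RESERVED_ALLOWANCE)
    return total_bytes >= required
```

Calling has_enough_ram(total_ram_bytes()) before launching the TCL script gives a simple yes-or-no answer; it says nothing about how long the single-threaded build will take.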
On-device computer vision
Once the demo is built and flashed onto the FPGA, you'll need to update the Yocto-built Linux distribution on the Video Kit's eMMC storage. This is straightforward enough: download and extract the image, interrupt the Video Kit's boot process via the UART bus, issue a command to mount the eMMC as USB Mass Storage, then copy the image across using the tool of your choice.
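If your tool of choice is a script, the copy step might look something like the sketch below. It is not Microchip's documented procedure: the function names are our own, and the target path is whatever your operating system assigns to the eMMC once it appears as USB Mass Storage, which we deliberately leave for you to identify. Writing to the wrong device will destroy its contents, so check twice.

```python
import hashlib
import os

CHUNK_SIZE = 4 * 1024 * 1024  # copy in 4 MiB blocks


def copy_image(image_path, target_path):
    """Copy image_path onto target_path; return (bytes_written, sha256_hex)."""
    digest = hashlib.sha256()
    written = 0
    with open(image_path, "rb") as src, open(target_path, "wb") as dst:
        while True:
            block = src.read(CHUNK_SIZE)
            if not block:
                break
            dst.write(block)
            digest.update(block)
            written += len(block)
        dst.flush()
        os.fsync(dst.fileno())  # ask the OS to push buffered data to the device
    return written, digest.hexdigest()


def verify_copy(target_path, length, expected_sha256):
    """Re-read the first `length` bytes of the target and compare digests."""
    digest = hashlib.sha256()
    remaining = length
    with open(target_path, "rb") as dst:
        while remaining > 0:
            block = dst.read(min(CHUNK_SIZE, remaining))
            if not block:
                break
            digest.update(block)
            remaining -= len(block)
    return remaining == 0 and digest.hexdigest() == expected_sha256
```

The verification pass only reads back as many bytes as were written, since the eMMC will be larger than the image. Raw device access typically needs elevated privileges.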
This gets you what you need to build the Linux side of the edge AI demo, but not the demo itself. For that, you need to log in over UART or SSH, download and extract the demo, download and extract the example networks, and finally compile the demo — a one-time process, bar the need to "make overlay" on every boot before running the demo so as to configure the VectorBlox FPGA acceleration IP.
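Because the overlay step has to be repeated on every boot, it's tempting to wrap it together with the launch command, ./run-video-model. Here is a minimal sketch, assuming Python is available on the board's Linux image and that both commands are run from the same demo directory; the directory itself and the launch_demo helper are hypothetical, and only the two commands come from the demo's instructions.

```python
import subprocess
from pathlib import Path

# The two per-boot commands named in the demo instructions, in order.
PER_BOOT_COMMANDS = (
    ("make", "overlay"),        # configure the VectorBlox acceleration IP
    ("./run-video-model",),     # start the multi-model demo
)


def launch_demo(demo_dir, runner=subprocess.run):
    """Run each per-boot command in demo_dir, stopping at the first failure."""
    demo_dir = Path(demo_dir)
    for command in PER_BOOT_COMMANDS:
        # check=True raises CalledProcessError if a command exits non-zero,
        # so the demo is never started on top of a failed overlay step.
        runner(list(command), cwd=demo_dir, check=True)
```

The runner parameter exists purely so the sequence can be exercised without real hardware; in normal use you would call launch_demo with the path to your extracted demo and nothing else.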
If you fire up the demo (as easy as ./run-video-model at the terminal) and expect to see a live feed from the Video Kit's dual 4k-resolution camera module, though, you're going to be disappointed. At the time of writing, the demo did not support capture via the cameras, an odd state of affairs given their prominence in the Video Kit bundle, with an update due for release in December expected to resolve the issue.
For now, though, you'll need a separate video source — which you feed to the Video Kit through its HDMI input, and for which you'll have to supply your own HDMI cable as there's only one in the bundle. This can be anything which outputs a valid HDMI signal: a laptop or desktop playing pre-recorded videos or a video camera showing a live feed.
For our experimentation, we hooked the Video Kit up to a Raspberry Pi 4 Model B single-board computer and used a mixture of pre-recorded and live content, the latter coming courtesy of the Raspberry Pi Camera Module 3 streaming at 1080p.
The demo starts with a facial recognition model, which ran on the 1,920×1,080 video feed at 33 frames per second (FPS). Instructions sent over the UART console let you target a particular face and assign an identity, as well as viewing and editing existing identities. Hit the space bar and the model switches to one which includes age and gender estimation alongside facial recognition — though your author would like to contest the accuracy and fairness of this, given it decided to add a decade and a bit onto his actual age.
Spacing into the next model will require a different feed: it's an automatic license plate recognition (ALPR) system, which ran at a considerably slower 17 frames per second. Space again gets you a version of MobileNet-V2, a convolutional neural network capable of classifying 1,000 object categories and, at 50 frames per second, the fastest model in the demo suite. The next model uses YOLOv5 Nano, an object-detection model, which runs at 37 frames per second. A final press of the space bar switches to Tiny YOLOv4 COCO which, although slow at 19 frames per second, proved the most accurate in its object detection and classification.
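Frame rates are easier to compare as time spent per frame. The short sketch below converts the figures we measured into milliseconds; the numbers are those quoted above, the helper names are our own, and the age-and-gender model is omitted because we didn't record a frame rate for it.

```python
# Frame rates observed on the 1,920x1,080 feed, in the order the demo cycles.
MODELS_FPS = [
    ("Facial recognition", 33),
    ("Automatic license plate recognition", 17),
    ("MobileNet-V2", 50),
    ("YOLOv5 Nano", 37),
    ("Tiny YOLOv4 COCO", 19),
]


def frame_time_ms(fps: float) -> float:
    """Average time per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def summarize(models):
    """Return (name, fps, milliseconds per frame) rounded to one decimal place."""
    return [(name, fps, round(frame_time_ms(fps), 1)) for name, fps in models]


for name, fps, ms in summarize(MODELS_FPS):
    print(f"{name:<38} {fps:>3} FPS  {ms:>5.1f} ms/frame")
```

On these figures MobileNet-V2 takes 20.0 ms per frame against 58.8 ms for the license plate model, roughly a threefold spread across the suite. These are end-to-end rates for the demo as a whole, not measurements of the accelerator in isolation.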
Rolling your own
The demo's selection of models serves its purpose well: demonstrating that, using the VectorBlox acceleration IP, it's possible to put some surprisingly demanding models on-device and run them at the edge. This is all, it has to be remembered, running on a combined FPGA-and-CPU system-on-chip which needs so little energy to run it operates without a heatsink, never mind a fan.
It's enough to whet the appetite, and if you went the long way and built the demo in Libero SoC using the TCL script you can familiarize yourself with the FPGA side by browsing through the project. Beyond this, though, there's little easily-accessible documentation — an issue we've run into before with Microchip's PolarFire SoC offerings.
For that, you'll need to move from Libero SoC into the VectorBlox SDK, published by Microchip to GitHub and coming complete with a selection of tutorials. This is supported by a programmer's guide and the "CoreVectorBlox Handbook" detailing the overlay-based flexible accelerator which drives the demo, but neither goes into much detail, at 27 and 24 pages respectively.
Like the Icicle Kit before it, then, there's a steep learning curve. The performance of the demo does demonstrate there's definite value to be found in perseverance, though, and there's good news for makers and tinkerers: unlike the video streaming demo we looked at last month, which required the licensing of an encoder IP block at around $30,000 to operate for more than an hour at a time, VectorBlox is licensed free of charge. Like Libero SoC itself, you just have to apply for the license — then merge the resulting license file with your Libero SoC license.
For those curious to learn more, the PolarFire SoC Video Kit VectorBlox demo is published on GitHub, as is the VectorBlox SDK. Microchip's Yann Le Faou and Diptesh Nandi have also published an on-demand webinar, available on free registration, which walks through the company's on-device machine learning and artificial intelligence platforms across microcontrollers, microprocessors, and FPGAs.
More information on the PolarFire SoC Video Kit is available on the Microchip website, and can be found on Avnet's product page as well.
Read the rest of FPGAdventures Series 2: The Microchip PolarFire SoC Video Kit below:
- FPGAdventures Series 2 Episode 1: A New Journey with the Microchip PolarFire SoC Video Kit
- FPGAdventures Series 2 Episode 2: FPGA-Driven Video Streaming with the Microchip PolarFire SoC Video Kit
Read the whole of FPGAdventures Series 1: The Microchip PolarFire SoC Icicle Kit:
- FPGAdventures Series 1 Episode 1: Unboxing the PolarFire SoC Icicle Kit
- FPGAdventures Series 1 Episode 2: Installing Microchip's Libero SoC
- FPGAdventures Series 1 Episode 3: First Steps with Libero SoC
- FPGAdventures Series 1 Episode 4: Linux on RISC-V
- FPGAdventures Series 1 Episode 5: Linux Code Samples
- FPGAdventures Series 1 Episode 6: Asymmetric Multiprocessing (AMP)
- FPGAdventures Series 1 Episode 7: Building Logic using Libero SoC
- FPGAdventures Series 1 Episode 8: The Mi-V Ecosystem and the Future