Tomeu Vizoso's Open Source NPU Driver Project Does Away with the Rockchip RK3588's Binary Blob

Anyone with a Rockchip RK3588 and a machine learning workload now has an alternative to the binary blob driver, thanks to Vizoso's efforts.

Developer Tomeu Vizoso is working on an open source driver for the neural processing unit (NPU) in the Rockchip RK3588 system-on-chip, and has hit the milestone of using the coprocessor to run an object detection model at 30 frames per second on only one of the NPU's three available cores.

"Rockchip, as most other vendors of NPU IP, provides a GPL [licens]ed kernel driver and pushes out their userspace driver in binary form," Vizoso explains. "The kernel driver is pleasantly simple and relatively up-to-date in regards of its use of internal kernel APIs. The userspace stack though is notoriously buggy and difficult to use, with basic features still unimplemented and performance being quite below what the hardware should be able to achieve.

"The version of the NPU in the RK3588 claims a performance of 6 TOPS [Tera-Operations Per Second] across its three cores, though from what I have read, people are having trouble making use of more than one core in parallel, with the closed source driver."

A work-in-progress open source driver for the Rockchip RK3588's NPU delivers great model performance without the binary blob. (📹: Tomeu Vizoso)

Seeking a better and more open experience, Vizoso has been working on an open source driver for the RK3588 NPU, building on reverse-engineering work from Pierre-Hugues Husson and Jasbir Matharu. The project was brought to our attention by CNX Software.

"I am very happy to report that the work has gone really smooth and I reached my first milestone: running the MobileNetV1 model with all convolutions accelerated by the NPU," Vizoso announced of the project late last month. "And it not only runs flawlessly, but at the same performance level as the blob."

The project's latest milestone, announced over the weekend, is the ability to run the SSDLite MobileDet object detection model — specifically tailored for use with low-power mobile-centric accelerator hardware like the NPU in the RK3588 — at 30 frames per second, despite being limited to using just one of the NPU's three cores.
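To put those figures in context, the short sketch below works through the arithmetic behind them. It uses only the numbers quoted above (6 TOPS, three cores, 30 frames per second). The even split of the claimed throughput across the cores is an assumption made for illustration rather than something Vizoso or Rockchip states, and the function names are hypothetical.

```python
# Back-of-the-envelope figures for the RK3588 NPU numbers quoted in the article.
# ASSUMPTION: the claimed 6 TOPS is divided evenly across the three cores.

TOTAL_TOPS = 6.0          # vendor-claimed throughput across all cores
NPU_CORES = 3             # number of NPU cores in the RK3588
FRAMES_PER_SECOND = 30    # reported SSDLite MobileDet rate on one core


def per_core_tops(total_tops: float, cores: int) -> float:
    """Return the share of the claimed throughput per core (even split assumed)."""
    return total_tops / cores


def frame_budget_ms(fps: float) -> float:
    """Return the time available to process one frame, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(f"Per-core share of claimed throughput: {per_core_tops(TOTAL_TOPS, NPU_CORES):.1f} TOPS")
    print(f"Time budget per frame at {FRAMES_PER_SECOND} fps: {frame_budget_ms(FRAMES_PER_SECOND):.1f} ms")
```

Under that assumption, the single core in use would account for roughly a third of the claimed peak, and each frame has to be processed in about 33 milliseconds to sustain the reported rate.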

"Now that we got to this level of usefulness, I'm going to switch to writing a kernel driver suited for inclusion into the Linux kernel, to the drivers/accel subsystem," Vizoso writes. "There is still lots of work to do, but progress is going pretty fast, though as I write more drivers for different NPUs I will have to split my time among them. At least, until we get more contributors!"

More information on the effort is available on Vizoso's blog, while the latest source code has been published to his personal branch of the Mesa project.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin.