Jasbir Matharu's Work Paves the Way to an Open Source Toolchain for Allwinner's V831 NPU Accelerator
Coder responds to Sipeed's call to arms and sets about reverse engineering the neural processing unit, unlocking its potential.
Tiny Devices' Jasbir Matharu has published the results of an effort, proposed by Sipeed, to reverse engineer the Neural Processing Unit (NPU) in Allwinner's V831, with the aim of making it usable from open source software.
"We are reversing [the Allwinner] V831's NPU register, and mak[ing an] open source AI toolchain based on NCNN," Sipeed wrote on Twitter in October of last year, having designed its MAIX-II Dock computer vision product around the part. "The NPU IP isn't Allwinner's, and they haven't made a usable AI toolchain yet, so we have to reverse it."
The company sought community help in reverse engineering the accelerator, which delivers 0.2 TOPS of compute performance for neural network workloads, offering boards to anyone willing to pitch in. Tiny Devices' Matharu was one such volunteer, and he has since made considerable progress toward getting the hardware usable.
"After many months of trial and error, endless deciphering of data dumps, a few dead ends and numerous reverse engineering attempts," Matharu writes in his follow-up, "parts of the NPU operations have been decoded. Fundamentally a large portion of the NPU is a customized implementation of [the] NVIDIA Deep Learning Accelerator (NVDLA) architecture."
"So far I have decoded the CONV, SDP and PDP units which allow for the following operations (tested with int8 data type): Direct convolutions; Bias addition; Relu/Prelu; Element wise operations; Max/Average pooling. Furthermore I have managed to removed all dependencies on closed Allwinner libraries, this is partially achieved by implementing a simple ION memory allocation utility."
Matharu's work reveals that the NPU supports at least the int8 and int16 data types, though fp16 support is as yet unconfirmed, and that it can be clocked in software from 100MHz to 1,200MHz with a 400MHz default. The NPU also follows NVDLA's cut-down "NV Small" configuration, sharing the memory bus with the CPU and using system memory for all operations, which puts a ceiling on its performance.
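There's no finished driver yet, but clocking the NPU from software on Linux would normally go through the kernel's common clock framework. The fragment below is a hypothetical platform driver sketch of that bring-up: the compatible string and the "npu" clock name are invented for illustration, not taken from Matharu's code.

```c
/* Hypothetical sketch of NPU clock bring-up via the common clock
 * framework; device tree binding and clock name are assumptions. */
#include <linux/clk.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>

static int v831_npu_probe(struct platform_device *pdev)
{
	/* "npu" as the module clock's consumer name is an assumption. */
	struct clk *mclk = devm_clk_get(&pdev->dev, "npu");
	int ret;

	if (IS_ERR(mclk))
		return PTR_ERR(mclk);

	/* Anywhere from 100MHz to 1.2GHz reportedly works; the
	 * hardware defaults to 400MHz. */
	ret = clk_set_rate(mclk, 1200000000UL);
	if (ret)
		return ret;

	return clk_prepare_enable(mclk);
}

static const struct of_device_id v831_npu_of_match[] = {
	{ .compatible = "allwinner,sun8i-v831-npu" }, /* hypothetical */
	{ }
};
MODULE_DEVICE_TABLE(of, v831_npu_of_match);

static struct platform_driver v831_npu_driver = {
	.probe = v831_npu_probe,
	.driver = {
		.name = "v831-npu",
		.of_match_table = v831_npu_of_match,
	},
};
module_platform_driver(v831_npu_driver);

MODULE_DESCRIPTION("Illustrative V831 NPU clock bring-up sketch");
MODULE_LICENSE("GPL");
```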
Matharu's full write-up is available on the Tiny Devices blog, while work-in-progress source code can be found on GitHub under the reciprocal GNU General Public License 3.