For those working in machine learning / artificial intelligence, Xilinx’s acquisition of Deephi was rather exciting. Deephi’s technology significantly simplifies the acceleration of deep neural networks in heterogeneous SoCs like the Zynq and Zynq MPSoC.
This is an area of great interest to me, and over the next few weeks and months I am going to create several Hackster projects that use Deephi's technology. First, though, I thought a blog post introducing Deephi's approach would be a good idea, just to lay the groundwork.
To accelerate deep learning for computer vision applications such as image classification, object detection, and tracking, Deephi provides the Deep Neural Network Development Kit (DNNDK). DNNDK is based on C/C++ APIs and allows us to work with common industry-standard frameworks, and with popular networks including VGG, ResNet, GoogLeNet, YOLO, SSD, and MobileNet.
At the heart of the DNNDK, enabling the acceleration of the deep learning algorithms, is the deep learning processor unit (DPU). On our Zynq or Zynq MPSoC system, the DPU resides in the programmable logic. To support different deep learning accelerations, several different DPU variants can be implemented.
The basic stages of deploying an AI/ML application on a Zynq / Zynq MPSoC using DNNDK are:
- Compress the neural network model — Takes the network model (prototxt) and trained weights (Caffe model) and produces a quantized model that uses INT8 representation. To achieve this, a small input training set is also typically required — this contains 100 to 1000 images.
- Compile the neural network model — This generates the ELF files necessary for the DPU instantiations. It also identifies any elements of the network that the DPU does not support, so they can be implemented on the CPU.
- Create the program using DNNDK APIs — With the DPU Kernels created, we can now build the application which manages the inputs and outputs, performs DPU Kernel life cycle management and DPU Task management. During this stage, we also need to implement network elements not supported by the DPU on the CPU.
- Compile the hybrid DPU application — Once the application is ready, we can run the hybrid compiler, which compiles the CPU code and links it against the ELFs for the DPUs within the programmable logic.
- Run the hybrid DPU executable on our target.
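On the host, the compression and compilation steps above map to two tool invocations. The sketch below shows the general shape of the flow for a Caffe model; the file names, the kernel name, and the exact option spellings are assumptions drawn from typical DNNDK examples, so check them against the DNNDK user guide for your release.

```shell
# Step 1: compress (quantize) the trained Caffe model to INT8.
# float.prototxt / float.caffemodel are placeholder names --
# substitute your own network description and trained weights.
decent quantize -model float.prototxt \
                -weights float.caffemodel \
                -output_dir decent_output

# Step 2: compile the quantized model into the DPU ELF file(s).
# The net name "resnet50" and the DPU target "4096FA" are
# assumptions for illustration.
dnnc --parser=caffe \
     --prototxt decent_output/deploy.prototxt \
     --caffemodel decent_output/deploy.caffemodel \
     --output_dir dnnc_output \
     --net_name resnet50 \
     --dpu=4096FA \
     --cpu_arch=arm64
```

DNNC also reports at this point which layers it could not map to the DPU, which is the list of elements we then have to implement on the CPU side.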
To perform these five steps, DNNDK provides several different tools, which are split between the host and the target.
On the host side, we are offered the following tools:
- DECENT — Deep compression tool, performs the compression of the network model.
- DNNC — Deep neural network compiler, performs the network compilation. DNNC includes a subcomponent, DNNAS (the deep neural network assembler), which generates the ELF files for the DPU.
While on the target side:
- N2Cube — This is the DPU runtime engine; it provides the loading of DNNDK applications, scheduling, and resource allocation. Core components of N2Cube include the DPU driver, DPU loader, and DPU tracer.
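To give a feel for what the application code from step three looks like against the N2Cube runtime, here is a minimal sketch in C of the DPU kernel and task life cycle. This is not runnable without a DPU-equipped target; the kernel name "resnet50" is a placeholder, and the exact API names and header path should be verified against the DNNDK user guide for your release.

```c
#include <dnndk/dnndk.h>  /* DNNDK / N2Cube API header (path may vary by release) */

int main(void)
{
    /* Attach to the DPU device and runtime. */
    dpuOpen();

    /* Load the DPU kernel produced by DNNC; "resnet50" is a
     * placeholder kernel name from the compile step. */
    DPUKernel *kernel = dpuLoadKernel("resnet50");

    /* Create a task -- an execution instance of the kernel. */
    DPUTask *task = dpuCreateTask(kernel, DPU_MODE_NORMAL);

    /* ... fill the input tensor(s) here, e.g. with a pre-processed
     * image, then run the network on the DPU ... */
    dpuRunTask(task);

    /* ... read back and post-process the output tensor(s) here,
     * including any layers the DPU does not support ... */

    /* Tear down in reverse order. */
    dpuDestroyTask(task);
    dpuDestroyKernel(kernel);
    dpuClose();
    return 0;
}
```

The open/load/create/run/destroy pattern is the "DPU Kernel life cycle management and DPU Task management" described above; everything between create and destroy can be repeated per frame in a streaming application.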
- DExplorer — Provides DPU information at runtime.
- DSight — Profiling tool that provides visualization of the data gathered by the DPU tracer.
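On the target, these runtime tools are simple command-line utilities. The sketch below shows typical usage; the option flags and the trace file name are assumptions based on the DNNDK examples, so verify them against the user guide before relying on them.

```shell
# Query the DPU at runtime (cores, architecture, working status).
dexplorer -w

# After running a DNNDK application with profiling enabled, the DPU
# tracer writes a trace file; DSight turns it into an HTML visualization.
# The trace file name here is a placeholder.
dsight -p dpu_trace_1234.prof
```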
To get going with the DNNDK, Deephi provides several example designs, and running through one of these is really the best place to start. There are currently reference designs available for the ZCU102, ZCU104, and the Ultra96.
The ZCU102 and ZCU104 examples are designed for high-performance, high-throughput applications. These examples provide performance of up to 7.7 GOPS and 175 frames per second when implementing ResNet.
Meanwhile, the Ultra96 example is designed to show a lower power (edge) application, which has a lower throughput of 25 frames per second for the same ResNet implementation.
Over the next few weeks and months, I will be working with the DNNDK for my Hackster projects. I am sure we will return to it in the Chronicles as well.
See My FPGA / SoC Projects: Adam Taylor on Hackster.io
Get the Code: ATaylorCEngFIET (Adam Taylor)
Access the MicroZed Chronicles archives, with over 260 articles on the Zynq / Zynq MPSoC, updated weekly at MicroZed Chronicles.