AMD adaptive FPGAs and SoCs (Systems on Chip) can efficiently perform AI inference, using models built with the many existing AI frameworks and tools such as TensorFlow, PyTorch, etc.
The heart of AI inference is the DPU (Deep Learning Processor Unit), a flexible and versatile IP core that accelerates inference in the PL (programmable logic) rather than on the CPU.
The DPU has been renamed NPU (Neural Processor Unit) but keeps the same functionality.
This tutorial is the first of two and will show you how to deploy AI inference on a Zynq UltraScale+ MPSoC. This part covers the hardware development, with Vivado, and the software development, with PetaLinux. The second part is about training, optimizing, pruning, quantizing and compiling the AI model with Vitis AI, and finally running it on the actual hardware.
The MYD-CZU3EG/4EV/5EV development boards
Many AI tutorials use boards supported by AMD, like the KR260, ZCU104, etc., for which base designs exist and can be modified for this purpose.
In this case I have been using the MYIR development board MYD-CZU3EG, which carries an XCZU3EG-1SFVC784E device. These dev boards are based on a carrier board and a pluggable SOM with different device options; the alternative devices are the XCZU4EV-1SFVC784I and the XCZU5EV-2SFVC784I.
Although the ZU3EG is the smallest of the three devices, it is big enough to hold a powerful yet compact AI inference system.
Vivado and Vitis flows
There are two paths (or "flows") for developing an AI project with AMD tools. One is the Vivado flow, hardware-centered around Vivado; this is the flow used in this tutorial.
The alternative is the "Vitis flow", which is more software-centered. With this flow, the core hardware is packaged as an xclbin file and executed on the target board.
Because several components are used in this project, it's important to check the compatibility between them before starting.
- Vivado and Petalinux: release 2023.1
- DPU IP core: v4.1
- Vitis AI: 3.5
- Host machine Linux OS: Ubuntu 22.04
Create a new Vivado project in the usual way. Here I call it MYD_DPU.
On the next windows, select "RTL Project", check "Do not specify sources", and select part xczu3eg-sfvc784-1-e.
This will start an empty project.
Get the DPU IP core
The DPU IP core is not in the Vivado IP catalog, so it has to be downloaded separately. There are different DPU IPs depending on the target device; the one for Zynq UltraScale+ MPSoC is the DPUCZDX8G, which can be downloaded from here.
Extract the downloaded tar.gz file into the same folder:
In the created Vivado project, click on IP Catalog, right-click on the panel and select "Add Repository".
Browse to the folder with the downloaded and extracted IP. The IP should be recognized.
The IP catalog is now updated with the DPU IP core, ready to be used.
This project is contained entirely in a block diagram. A Tcl script is included to recreate it, rather than building it manually, which would take much longer. Once created, you are free to examine it in detail, modify it to your needs, or even adapt it to another device, project or board.
Copy the provided bd_gen.tcl file into the Vivado project folder:
In the Vivado Tcl console, navigate to the project folder and source the Tcl file. In my case:
cd D:/XilinxWorkspace/Vivado/MYD_DPU
source ./bd_gen.tcl
That will create the whole block diagram with the PS and DPU:
It should already be validated, but you can run "Validate Design" (F6).
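For reference, the essence of what bd_gen.tcl does can be sketched with a few Vivado Tcl commands. This is only an illustration: the IP version strings and instance names below are assumptions, and the real script additionally configures clocking, AXI interconnects, interrupts and address maps.

```tcl
# Sketch of a block-diagram generation script (illustrative; versions and names assumed)
create_bd_design "bd"

# Zynq UltraScale+ processing system
create_bd_cell -type ip -vlnv xilinx.com:ip:zynq_ultra_ps_e:3.5 zynq_ultra_ps_e_0

# DPU IP, available once its repository has been added to the IP catalog
create_bd_cell -type ip -vlnv xilinx.com:ip:DPUCZDX8G:4.1 DPUCZDX8G_0

# ... clock/reset wiring, AXI connections, interrupt hookup, address assignment ...

validate_bd_design
save_bd_design
```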
Have a look at the DPU settings
Find the DPU IP in the block diagram and double-click on it.
On the first tab (Arch - Architecture), note the selection of parameters. For more details, e.g. to use the IP in a different project, refer to the DPU user manual. The main point here is the selection of B1600, which is the largest DPU size that fits the ZU3EG device.
The second tab (Advanced) lets you define additional settings. Here the DSP48 usage is set to High.
In the Sources panel, right-click on the bd name and select "Create HDL Wrapper", then choose "Let Vivado manage wrapper".
Once the wrapper is created, click on Run Implementation. This will also run Synthesis. It will take around half an hour, so grab a drink.
Once finished you should get a utilization result like this; note that the DSPs are almost fully utilized:
Next click on Generate Bitstream, then on File > Export > Export Hardware and select "Include Bitstream". Leave the default name "bd_wrapper". This will create the XSA file that will be used next in the Petalinux project.
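These GUI steps can also be scripted from the Tcl console. The commands below are a sketch: impl_1 is the default run name in a new project, and the XSA file name matches the default used above.

```tcl
# Run synthesis and implementation up to bitstream generation
launch_runs impl_1 -to_step write_bitstream -jobs 4
wait_on_run impl_1

# Export the hardware platform, including the bitstream, for PetaLinux
write_hw_platform -fixed -include_bit -force ./bd_wrapper.xsa
```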
To work with PetaLinux, I use a VM with Ubuntu 22.04 and PetaLinux 2023.1 installed. It has to be the same release as the Vivado version used previously.
PetaLinux will include VART (Vitis AI Runtime), for which a number of recipes are needed. These are in this repository. Here I will clone it, check out the appropriate release, and add it to the PetaLinux configuration as a user Yocto layer.
In ~/Downloads do:
git clone https://github.com/Xilinx/meta-vitis
cd meta-vitis/
git checkout rel-v2023.1
The process can be checked with git status:
I copy the bd_wrapper.xsa into ~/Petalinux.
Source the tools and create a project, which I call myd-dpu:
source /tools/Xilinx/PetaLinux/2023.1/tool/settings.sh
petalinux-create -t project -n myd-dpu --template zynqMP
cd myd-dpu/
Configure it from the XSA with:
petalinux-config --get-hw-description=../
That will bring up the main configuration GUI, where a few settings need to be changed.
In Image Package Configuration:
- Change the root filesystem type to EXT4
- Change the Device Node of SD device to /dev/mmcblk1p2
- (Optional) Leave only tar.gz as Root filesystem formats (doesn't generate extra files)
- Disable Copy final images to tftpboot directory
In Yocto settings:
- (Optional) Set the Add pre-mirror url (for offline build)
- (Optional) Set the local sstate feeds settings (for offline build)
- User layers: set the VAI meta-vitis path: /home/joan/Downloads/meta-vitis
Exit accepting changes.
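For reference, the choices made in the menuconfig GUI are stored in <project>/project-spec/configs/config. After exiting, the relevant lines should look roughly like the fragment below (symbol names as found in PetaLinux 2023.1; verify against your own generated file):

```text
CONFIG_SUBSYSTEM_ROOTFS_EXT4=y
CONFIG_SUBSYSTEM_SDROOT_DEV="/dev/mmcblk1p2"
CONFIG_SUBSYSTEM_RFS_FORMATS="tar.gz"
# CONFIG_SUBSYSTEM_COPY_TO_TFTPBOOT is not set
CONFIG_USER_LAYER_0="/home/joan/Downloads/meta-vitis"
```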
Customizing device tree and rootfs packages
A couple of tweaks are needed. One is to add the entry below to the file <project>/project-spec/meta-user/recipes-bsp/device-tree/files/system-user.dtsi
&sdhci1 {
status = "okay";
xlnx,has-cd = <0x1>;
xlnx,has-power = <0x0>;
xlnx,has-wp = <0x1>;
disable-wp;
no-1-8-v;
};
This is to allow the kernel to access the 2nd partition of the SD card, where the rootfs will be.
The other is to add the following to <project>/project-spec/meta-user/conf/user-rootfsconfig file
CONFIG_vitis-ai-library
CONFIG_vitis-ai-library-dev
CONFIG_vitis-ai-library-dbg
CONFIG_dnf
CONFIG_nfs-utils
Kernel configuration
Run:
petalinux-config -c kernel
The only modification required is to enable the DPU driver in
Device drivers > Misc Devices > Xilinx Deep Learning Processing Unit (DPU) Driver
Run:
petalinux-config -c rootfs
In User packages, enable dnf, nfs-utils, and all the vitis-ai packages.
Build petalinux with:
petalinux-build
And create the BOOT.BIN with:
petalinux-package --boot --force --u-boot images/linux/u-boot.elf --pmufw images/linux/pmufw.elf --fsbl images/linux/zynqmp_fsbl.elf --fpga images/linux/system.bit
The resulting files appear in the images/linux folder:
As usual, an SD card with two partitions is needed:
- A 1 GB FAT partition for the BOOT.BIN, boot.scr and Image
- The rest of the card (~7 GB), formatted as ext4, where the root filesystem will be expanded.
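The partitions can be created with gparted or fdisk; alternatively, the layout can be scripted with sfdisk. The fragment below is a sketch (the device name /dev/sdX is a placeholder; replace it with your actual card device and double-check it before writing, as sfdisk is destructive):

```text
# Apply with: sudo sfdisk /dev/sdX < sd-layout.txt
label: dos
,1GiB,c
,,83
```

Afterwards the partitions still need formatting, e.g. with sudo mkfs.vfat -F 32 /dev/sdX1 and sudo mkfs.ext4 /dev/sdX2.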
The root file system is expanded into the SD card with this command:
sudo tar -xvzf images/linux/rootfs.tar.gz -C /media/joan/rootfs
After the above has finished, the SD card can be inserted into the MYD-CZU3EG board. The boot switches need to be set accordingly and a USB cable connected to see the boot log over the UART. After power-up, the boot log should appear:
After logging in with the provided username (petalinux) and entering a password, the show_dpu command can be issued (with sudo) to check the presence of the DPU:
xdputil query can also be used to get more details of the implemented DPU:
In this tutorial we have created the HDL project and the PetaLinux software needed to use a DPU on a Zynq UltraScale+ MPSoC for AI inference. In the next part, we will see how to prepare a standard AI model with Vitis AI and perform the necessary steps to run it on this system.