-----------------------------------------------------------------------------------------------------
Schematic: https://www.avnet.com/opasdata/d120001/medias/docus/193/Ultra96-V2%20Rev1%20Schematic.pdf
Board files: https://github.com/Avnet/bdf
-----------------------------------------------------------------------------------------------------
Objective
The objective of this module is to create an HLS IP kernel and integrate it into the PYNQ framework. PYNQ is a framework that lets you drive hardware functions on the FPGA from a high-level language such as Python. We build a baseline project that lets users interface with both a DMA engine and a custom IP. Understanding how the IP moves through the flow will help you follow the Jupyter notebook project. In other words, this project is a placeholder for more customized IP that you create yourself. The base module is the stream multiply function below:
A * B = C
The design uses two AXI interconnects. The first manages the AXI master connections for both the DMA and the custom IP. The second sits on the data path, where memory is converted to a stream (MM2S) and the stream is converted back to memory (S2MM). This is needed because the kernel works on streaming data: the raw data in memory is mapped onto the stream on the way in, and the reverse happens once the data has been output. We cover this in more detail below.
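The data path described above can be modelled in a few lines of plain Python. This is only a behavioural sketch under simplifying assumptions, not the hardware itself: the function names `mm2s`, `multiply_kernel`, and `s2mm` are hypothetical, each stream beat is reduced to a data word plus a TLAST flag, and back-pressure is ignored.

```python
from typing import Iterable, Iterator, List, Tuple

Beat = Tuple[int, bool]  # (data word, TLAST flag)


def mm2s(buffer: Iterable[int]) -> Iterator[Beat]:
    """Memory-to-stream: emit one beat per word and flag the final one."""
    words = list(buffer)
    for index, word in enumerate(words):
        yield word, index == len(words) - 1


def multiply_kernel(stream: Iterable[Beat], multiplier: int) -> Iterator[Beat]:
    """Scale the data of every beat and pass the TLAST flag through untouched."""
    for data, last in stream:
        yield data * multiplier, last


def s2mm(stream: Iterable[Beat]) -> List[int]:
    """Stream-to-memory: collect words until the beat flagged as last."""
    received = []
    for data, last in stream:
        received.append(data)
        if last:
            break
    return received


result = s2mm(multiply_kernel(mm2s([0, 1, 2, 3, 4, 5]), 5))
print(result)  # [0, 5, 10, 15, 20, 25]
```

The point of the sketch is the TLAST flag: the receiving side only knows the transfer is complete because the kernel forwards the flag it got from the sending side.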
The PYNQ software API abstracts the driver setup, which keeps the process simple. First, you create the send and receive buffers for the data, then you start the transfers and wait for the data to enter and leave the kernel. For reference, see the resizer_pl example:
https://github.com/Xilinx/PYNQ-HelloWorld/blob/master/pynq_helloworld/notebooks/edge/resizer_pl.ipynb
HW - Creating Block Diagram
From a high level, there are 2 inputs with 1 output. The goal is for the hardware to let the data stream through the design. Getting the configuration right is a good way to learn the best practices of the UltraFast Design Methodology: board/device planning, design creation, implementation, and design closure. The diagram consolidates the items mentioned earlier.
The process here illustrates the functions that the application needs in order to execute. The most challenging part is inserting the custom IP and verifying that it is compatible with the interface protocols used in the design.
The flow follows an existing model (the resize_ip), so its architecture and sequence of events map closely onto this design. Refer to the PYNQ documentation for more detail.
-----------------------------------------------------------------------------------------------------
HLS Flow Guide
- This project assumes you have downloaded the board files and pointed the tools at them
- Create a project (Vitis HLS / Vivado HLS)
- Name the project and choose a workspace directory
- Do not add Design or Testbench files. Choose a solution name and select the part as shown below. Select "Finish".
- Under Part selection, browse for the corresponding board [Ultra96_v2]
- Right-click the project and add a new source file
- Write your code here; the example below is a C++ stream multiply function
- Synthesize the design and wait for reports to generate
- Navigate to the solution's menu and select "Export RTL"
-----------------------------------------------------------------------------------------------------
RTL Flow Guide
This is a powerful step: letting software create the hardware accelerates the design process tremendously, whether you are a software engineer or new to RTL design. Touching hardware can be difficult even for veteran engineers; there is always a route, pin, or placement that has to be touched up. Using C++ can alleviate that uncertainty. Now let's stitch the IP into the Vivado project you are about to create.
- Open Vivado 2020.2 (or higher) and select "Create Project"
- Name the project and choose a workspace directory
- Under Part selection, browse for the corresponding board [Ultra96_v2]
- Create a Block Design
- Add the new IP, and press OK
- Your multiply IP can now be found when you search for it
- Run Block Automation & apply Board Presets
- Double-click the Zynq UltraScale+ block and modify it to enable the HP ports
- Insert 2 AXI interconnects, one on either side of our black box in the middle of the design
- Confirm the Processor has been reset & hooked up to the design
- Add a DMA block and verify the settings below. Note: it has to be 32 bits wide for the multiply block to work successfully
- Add your new IP mult_constant_hls block!
- Refer to the block diagram and ensure the connections in your block design match it exactly
- Create a hierarchy by holding Ctrl and selecting each block, then right-click and create hier_0
- AXI interconnect 1 (left side) controls the block configuration | AXI interconnect 2 (right side) follows the data path (Click the right arrow to see the difference)
- Open the Address Editor and assign all addresses; this maps memory addresses to the IP in the design
- Validate the design and confirm that it passes
- Final Block Design
- Congratulations, you have successfully created the HW portion of the project🥳
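To make the Address Editor step more concrete, the sketch below shows what "mapping memory addresses to IP" amounts to. The addresses and window sizes are hypothetical (Vivado assigns the real ones for your design), and the `phys_addr` / `addr_range` keys are chosen to loosely mirror the per-IP entries PYNQ exposes; treat the whole block as an illustration rather than the actual map of this project.

```python
# Hypothetical address map: IP names follow the block design above,
# but the base addresses and window sizes are made up for illustration.
ADDRESS_MAP = {
    "axi_dma_0": {"phys_addr": 0xA0000000, "addr_range": 0x10000},
    "mult_constant_hls_0": {"phys_addr": 0xA0010000, "addr_range": 0x10000},
}


def find_ip(address, address_map=ADDRESS_MAP):
    """Return (ip_name, offset) for the IP whose window contains the address."""
    for name, window in address_map.items():
        offset = address - window["phys_addr"]
        if 0 <= offset < window["addr_range"]:
            return name, offset
    return None


def windows_overlap(address_map=ADDRESS_MAP):
    """Return True if any two address windows overlap."""
    spans = sorted(
        (w["phys_addr"], w["phys_addr"] + w["addr_range"])
        for w in address_map.values()
    )
    return any(end > nxt_start for (_, end), (nxt_start, _) in zip(spans, spans[1:]))


print(find_ip(0xA0010010))  # ('mult_constant_hls_0', 16)
print(windows_overlap())    # False
```

A register write from software is then just a write to the IP's base address plus the register offset, which is why every IP needs its own non-overlapping window.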
-----------------------------------------------------------------------------------------------------
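SW Flow Guide
Code HLS [C++]
#include "ap_axi_sdata.h"
#include "ap_int.h"
#include "hls_stream.h"

// 32-bit AXI4-Stream packet with no user, id, or dest side channels
typedef ap_axis<32,0,0,0> pkt_t;

void mult_constant_hls(
    hls::stream< pkt_t > &din,
    hls::stream< pkt_t > &dout,
    ap_int<32> multiplier) {
// The multiplier is written from software over AXI4-Lite
#pragma HLS INTERFACE s_axilite port=multiplier
// No block-level control signals: the kernel is free-running
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE axis port=din
#pragma HLS INTERFACE axis port=dout
    pkt_t pkt;
    din.read(pkt);           // take one packet from the input stream
    pkt.data *= multiplier;  // scale the data; the other packet fields are left as read
    dout.write(pkt);         // forward the packet to the output stream
}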
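Because `pkt.data` and `multiplier` are both 32-bit signed values, the product is truncated back to 32 bits when it is stored in the packet. The sketch below mimics that arithmetic in Python so you can predict what the kernel returns for large inputs. The helper names `wrap_int32` and `kernel_multiply` are hypothetical and only model the arithmetic, not the streaming behaviour.

```python
def wrap_int32(value: int) -> int:
    """Reduce an integer to a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def kernel_multiply(data: int, multiplier: int) -> int:
    """Model `pkt.data *= multiplier` with the result truncated to 32 bits."""
    return wrap_int32(wrap_int32(data) * wrap_int32(multiplier))


print(kernel_multiply(3, 5))          # 15
print(kernel_multiply(-4, 5))         # -20
print(kernel_multiply(2**31 - 1, 2))  # -2 (the product no longer fits in 32 bits)
```

For the small values used in this project the wrap never triggers, but it is worth keeping in mind once you feed the kernel real data.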
Code Jupyter Notebook [Python]
Imports
import time
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from pynq import Overlay, allocate
from pynq.lib.dma import DMA
%matplotlib inline
Load Bitstream and Overlays
# Load the bitstream (HW/HLS/Vitis)
overlay = Overlay('design_5_mult.bit')
overlay?
# DMA handle
dma = overlay.hier_0.axi_dma_0
# Box filter IP (unused here)
# boxF = overlay.hier_0.boxfilter_accel_0
# Multiplier IP handle
mult = overlay.hier_0.mult_constant_hls_0
Stream Multiplier IP
The dma and mult blocks are now initialized and operational. Below we look at the register map of one block (mult) and then create an input buffer and an output buffer. Writing the multiplier register tells the IP which factor to apply (here 5, so every element of the array is multiplied by 5). You can then fill an array and send it through the stream link you created. Think of this as an operation that you "hardcode" in hardware.
#Find the register mapped input
mult.register_map
RegisterMap {
  multiplier = Register(multiplier=0)
}
# Create the input and output buffers in memory that the DMA can reach
# Note: the stream is 32 bits wide, so with uint8 four elements share one stream word
in_buffer = allocate(shape=(6,), dtype=np.uint8)
out_buffer = allocate(shape=(6,), dtype=np.uint8)
#Assign a multiplier value of 5 as an example
mult.register_map.multiplier = 5
# Fill the input buffer with a basic array
for i in range(6):
    in_buffer[i] = i
#Print the Buffer
in_buffer
PynqBuffer([0, 1, 2, 3, 4, 5], dtype=uint8)
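The buffer above holds uint8 elements, while the DMA stream and the kernel operate on 32-bit words. The sketch below, which runs on any machine with NumPy, models what a word-wise multiply does to packed bytes. It assumes little-endian packing and a non-negative multiplier, and the helper name `multiply_as_words` is hypothetical. It suggests why small values still come back looking element-wise, and where that stops being true.

```python
import numpy as np


def multiply_as_words(byte_values, multiplier):
    """Multiply packed bytes the way a 32-bit word-wise kernel would."""
    data = np.asarray(byte_values, dtype=np.uint8)
    pad = (-data.size) % 4  # pad to a whole number of 32-bit words
    padded = np.concatenate([data, np.zeros(pad, dtype=np.uint8)])
    words = padded.view("<u4")  # four bytes per little-endian word
    product = (words.astype(np.uint64) * multiplier) & 0xFFFFFFFF
    return product.astype("<u4").view(np.uint8)[: data.size]


small = multiply_as_words([0, 1, 2, 3, 4, 5], 5)
print(small.tolist())  # [0, 5, 10, 15, 20, 25]

large = multiply_as_words([100, 1, 0, 0], 5)
print(large.tolist())  # [244, 6, 0, 0]: the carry from 100 * 5 leaks into the next byte
```

As long as every product stays below 256 the bytes do not interfere with each other. For larger values, allocating the buffers as `np.uint32` gives one element per stream word and avoids the carry.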
# Let the magic happen: send the data to the multiplier kernel, capture what it returns, and wait for both transfers to complete
dma.sendchannel.transfer(in_buffer)
dma.recvchannel.transfer(out_buffer)
dma.sendchannel.wait()
dma.recvchannel.wait()
#Print new Values
out_buffer
#Prints New Values from Kernel HLS Block
PynqBuffer([ 0, 5, 10, 15, 20, 25], dtype=uint8)
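Rather than comparing the two printed buffers by eye, the result can be checked against a NumPy reference. The helper below is a minimal sketch with a hypothetical name, `check_stream_multiply`. It is shown with plain NumPy arrays standing in for the PYNQ buffers so that it runs anywhere.

```python
import numpy as np


def check_stream_multiply(sent, received, multiplier):
    """Return True when every received element equals sent * multiplier."""
    expected = np.asarray(sent, dtype=np.int64) * multiplier
    return np.array_equal(np.asarray(received, dtype=np.int64), expected)


# Stand-ins for in_buffer and out_buffer from the cells above
sent = np.arange(6, dtype=np.uint8)
received = np.array([0, 5, 10, 15, 20, 25], dtype=np.uint8)
print(check_stream_multiply(sent, received, 5))  # True
```

In the notebook the same call would be `check_stream_multiply(in_buffer, out_buffer, 5)`.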
Output HLS overlay
Buffer = [0, 1, 2, 3, 4, 5]
Sent to DMA, IP/Kernel Operational
newBuffer = [0, 5, 10, 15, 20, 25]