For this final tutorial, we are going to integrate the SHA-256 compressor into the MicroBlaze-V block design. This will use pre-existing SystemVerilog files that were created alongside these tutorials. The files used in this tutorial are available in the public repo for this project.
Note: This tutorial builds on elements from the other tutorials on my account page.
To implement this hardware accelerator, we will take advantage of Vivado’s IP packager, which allows us to wrap any hardware design in an IP that can be integrated into a block design.
To get started, we will open the project from the previous parts and extend it with the custom IP. Go to the top bar and click Tools → Create and Package IP.
From here, we will choose ‘Create a new AXI4 peripheral’.
Next, we will choose a name for our custom IP and click Next. The next step is to add our AXI4 interfaces. For this SHA-256 design, we will use two slave interfaces: one AXI4-Lite interface for the status/control registers, which will contain 10 registers, and one AXI Stream interface that will carry the data to the core.
Next, we finish creating the IP and select the Edit IP radio button to open a new project where we can edit the IP directly in its own window without affecting the rest of the design. We can also add testbenches to test the hardware design. Those are included in the git repository for this project.
When the IP is created, three files are generated: a Verilog file for each interface added, and a wrapper file to link the two interfaces together. For this project, custom SystemVerilog files were used instead to match the SHA core, so the generated Verilog files were removed from the project.
The files should also be removed from the source tree and the Verilog Simulation file group. Once they are removed, the new files can be added from the git repository.
Next, the core hardware files can be added. These are instantiated by the AXI Stream module, so they will sit beneath it in the file hierarchy.
Note: If issues arise when adding sources to the design, try going into the project folder manually and deleting the HDL files that are already in the ‘src’ folder.
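The ten AXI4-Lite registers can be pictured as a small memory map. The sketch below is inferred from the offsets used by the driver snippets later in this tutorial (a control word at offset 0, a status word at offset 4, and eight digest words starting at offset 8). The macro and function names are illustrative only and are not taken from the repository.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register layout, inferred from the driver snippets in this tutorial.
 * All names below are hypothetical. */
#define SHA_REG_CTRL_OFFSET    0x00u  /* driver writes 0x2 to reset, 0x1 on the first chunk */
#define SHA_REG_STATUS_OFFSET  0x04u  /* bit 0: core done, bit 1: last block done */
#define SHA_REG_DIGEST_OFFSET  0x08u  /* first of eight 32-bit digest words */
#define SHA_DIGEST_WORDS       8u
#define SHA_REG_BYTES          4u

/* Byte offset of digest word i (0..7) from the IP base address. */
static inline uint32_t sha_digest_offset(uint32_t i)
{
    assert(i < SHA_DIGEST_WORDS);
    return SHA_REG_DIGEST_OFFSET + SHA_REG_BYTES * i;
}

/* Total number of 32-bit registers implied by this layout. */
static inline uint32_t sha_register_count(void)
{
    return (sha_digest_offset(SHA_DIGEST_WORDS - 1u) + SHA_REG_BYTES) / SHA_REG_BYTES;
}
```

One control register, one status register and eight digest words add up to the 10 registers requested in the wizard.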
Once all these files are added, we can package this IP. If you want to test the IP before running it on hardware, you can use the testbenches provided in the repository as a baseline.
We now want to package our IP. To do this, we go to the Package IP tab in the project manager and ensure all our files are merged into the design by clicking the ‘merge files’ button.
Next, we can simply re-package the IP, which returns us to the main project.
From here, when we try to add a new IP, our SHA256 Compressor should be listed and available to use in the design. For this design, we will also use the AXI DMA in conjunction with the SHA compressor to allow for high data throughput to the compressor.
So, the first thing is to add the two IPs to the design. Next, we will configure the DMA: since we are only reading data from DDR, we can disable the write channel. We are also going to set the buffer length register width to 26 bits to maximise the amount of data a single transfer can carry.
With our two IPs configured, we can now connect them in the block design. First, we will connect our DMA to both the MicroBlaze and the DDR using connection automation. Here we will configure M_AXI_MM2S to go to the HPIO interface of the Zynq SoC, which is where our DDR memory lies. Everything else can be left at its default.
Next, we can connect the AXI4-Lite interface to one of the master interfaces on the SmartConnect IP and connect the AXI Stream interface to the master AXI Stream interface on the DMA. Then we can run connection automation to connect the clocks together.
With that, the final block design should be complete. One thing to check is that all the addresses are set up correctly in the address editor.
With the design done, the bitstream can now be generated and the XSA file recreated.
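As a quick sanity check on that setting: in the AXI DMA, a buffer length register that is n bits wide limits a single transfer to 2^n - 1 bytes. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of the Xilinx driver.

```c
#include <assert.h>
#include <stdint.h>

/* Largest single transfer, in bytes, for a buffer length register of the given width. */
static inline uint32_t dma_max_transfer_bytes(unsigned width_bits)
{
    assert(width_bits >= 1u && width_bits <= 31u);
    return (UINT32_C(1) << width_bits) - 1u;
}

/* Returns 1 if a chunk of chunk_bytes fits in one simple-mode transfer, 0 otherwise. */
static inline int dma_chunk_fits(uint32_t chunk_bytes, unsigned width_bits)
{
    return chunk_bytes <= dma_max_transfer_bytes(width_bits);
}
```

With a width of 26 bits this works out to 67,108,863 bytes, just under 64 MiB, so the 32 MB chunk size used by the driver later in this tutorial fits in a single transfer.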
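In practice, a correctly set up address map means every peripheral has an assigned range and no two ranges overlap. The sketch below shows that overlap check on a host machine. The struct, the function names and any addresses you feed it are hypothetical and are not the values from this design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the address editor: a name, a base address and a size in bytes. */
typedef struct {
    const char *name;
    uint64_t base;
    uint64_t size;
} addr_range_t;

/* Two ranges overlap if each one starts before the other one ends. */
static int ranges_overlap(const addr_range_t *a, const addr_range_t *b)
{
    return a->base < b->base + b->size && b->base < a->base + a->size;
}

/* Returns 1 if no pair of ranges in the table overlaps, 0 otherwise. */
static int address_map_ok(const addr_range_t *map, size_t n)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (ranges_overlap(&map[i], &map[j]))
                return 0;
    return 1;
}
```

Vivado performs its own validation when the design is checked, so this is only meant to make the rule explicit.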
Software
We now must rebuild the kernel with the new hardware design. This is quite straightforward since we already compiled the kernel last time. To do this, we transfer the XSA to the folder where we built our Linux build and run the following commands:
petalinux-config --get-hw-description .
petalinux-build
petalinux-package --boot --fpga --u-boot --force
We then transfer the boot files to the Micro SD card and insert it back into the Zybo. The PS should then be ready.
There shouldn’t be too much to change since the last tutorial; only the main driver code must be changed to add the hardware support. The full driver code can be found in the git repo. Here are some important snippets:
// Hardware addresses from xparameters.h
#define TIMER_FREQ XPAR_AXI_TIMER_0_CLOCK_FREQUENCY
#define DIGEST_LENGTH 0x10000000
#define DDR_BASE 0x10000004
#define DMA_BASE XPAR_AXI_DMA_0_BASEADDR
#define SHA_IP_BASE XPAR_SHA256_COMPRESSION_0_BASEADDR
#define SHA_IP_DONE (SHA_IP_BASE + 4) // Status register
#define SHA_IP_DIGEST (SHA_IP_BASE + 8) // First of the eight digest words
#define CHUNK_SIZE (32 * 1024 * 1024) // 32 MB per DMA transfer
// DMA Initialization
XAxiDma AxiDma;
XAxiDma_Config *CfgPtr = XAxiDma_LookupConfig(DMA_BASE);
XAxiDma_CfgInitialize(&AxiDma, CfgPtr);
XAxiDma_Reset(&AxiDma);
while(!XAxiDma_ResetIsDone(&AxiDma));
XAxiDma_IntrDisable(&AxiDma, XAXIDMA_IRQ_ALL_MASK, XAXIDMA_DMA_TO_DEVICE); // Disable interrupts (the loop below polls instead)
// --- HARDWARE HASHING ---
uint32_t remaining_bytes = total_len;
uint32_t current_offset = 0;
int first_block = 1;
Xil_Out32(SHA_IP_BASE, 0x2); // Reset Hardware Digest Registers
XTmrCtr_Reset(&TimerInstance, 0);
XTmrCtr_Start(&TimerInstance, 0);
uint64_t hw_start = ((uint64_t)XTmrCtr_GetValue(&TimerInstance, 1) << 32) | XTmrCtr_GetValue(&TimerInstance, 0);
while (remaining_bytes > 0) {
    uint32_t transfer_size = (remaining_bytes > CHUNK_SIZE) ? CHUNK_SIZE : remaining_bytes;
    Xil_Out32(SHA_IP_BASE, (first_block ? 0x1 : 0x0)); // 0x1 on the first chunk only
    XAxiDma_SimpleTransfer(&AxiDma, (uintptr_t)(DDR_BASE + current_offset), transfer_size, XAXIDMA_DMA_TO_DEVICE);
    // Wait for the DMA to go idle
    uint32_t dma_status;
    do {
        dma_status = XAxiDma_ReadReg(DMA_BASE, XAXIDMA_SR_OFFSET);
    } while (!(dma_status & XAXIDMA_IDLE_MASK));
    if (current_offset > 0) { for (volatile int i = 0; i < 50; i++); } // NOP delay used to work around a hardware bug
    remaining_bytes -= transfer_size;
    current_offset += transfer_size;
    first_block = 0;
}
// Wait for Status Register bits to be 11
// Bit 0 (1): Core Done
// Bit 1 (2): Last Block Done
while ((Xil_In32(SHA_IP_DONE) & 0x3) != 0x3);
uint64_t hw_end = ((uint64_t)XTmrCtr_GetValue(&TimerInstance, 1) << 32) | XTmrCtr_GetValue(&TimerInstance, 0);
XTmrCtr_Stop(&TimerInstance, 0);
uint64_t cycles_hw = hw_end - hw_start;
// Read Hardware Result
uint32_t hash_hw[8];
for (int i = 0; i < 8; i++) hash_hw[i] = Xil_In32(SHA_IP_DIGEST + (4*i));
xil_printf("Hash (Hardware): ");
for (int i = 0; i < 8; i++) xil_printf("%08x", hash_hw[i]);
xil_printf("\r\n");
With this code loaded onto the system, we can run our system_run.py file to send the message to the DDR memory and run the two hashes.
Note: Be sure to update the platform in Vitis with the new XSA file to get the hardware parameters for the DMA and the SHA core.
As you can see from this result, the SHA core has successfully hashed the message and its output matches the software implementation, which means our SHA core is working correctly. We can also see a 63 times increase in performance.
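The chunking arithmetic in the transfer loop of the driver snippet can be exercised on a host machine without the DMA. The sketch below mirrors only that arithmetic; `plan_chunks` and `chunk_plan_t` are hypothetical names that do not appear in the repository.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE (32u * 1024u * 1024u) /* 32 MB, as in the driver snippet */

typedef struct {
    uint32_t chunks;      /* number of DMA transfers that would be issued */
    uint32_t last_size;   /* size of the final transfer in bytes */
    uint32_t bytes_sent;  /* total bytes covered by all transfers */
} chunk_plan_t;

/* Walks the same loop as the driver, counting transfers instead of starting them. */
static chunk_plan_t plan_chunks(uint32_t total_len)
{
    chunk_plan_t plan = {0u, 0u, 0u};
    uint32_t remaining = total_len;

    while (remaining > 0u) {
        uint32_t transfer_size = (remaining > CHUNK_SIZE) ? CHUNK_SIZE : remaining;
        /* On hardware, the control write and XAxiDma_SimpleTransfer() go here. */
        plan.chunks++;
        plan.last_size = transfer_size;
        plan.bytes_sent += transfer_size;
        remaining -= transfer_size;
    }
    assert(plan.bytes_sent == total_len); /* every byte is covered exactly once */
    return plan;
}
```

For example, a 100 MB message is split into three full 32 MB transfers followed by one 4 MB transfer.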
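The speed-up figure is the ratio of the software and hardware cycle counts measured with the AXI timer, and the same counts give a throughput once the timer frequency is taken into account. A minimal sketch of that arithmetic is shown below; the function names are illustrative, and it assumes both runs were timed with the same timer clock.

```c
#include <assert.h>
#include <stdint.h>

/* How many times faster the hardware run is, from the two cycle counts. */
static double speedup(uint64_t cycles_sw, uint64_t cycles_hw)
{
    assert(cycles_hw > 0u);
    return (double)cycles_sw / (double)cycles_hw;
}

/* Throughput in bytes per second for `bytes` hashed in `cycles` timer ticks. */
static double throughput_bytes_per_s(uint64_t bytes, uint64_t cycles, uint32_t timer_hz)
{
    assert(cycles > 0u);
    return (double)bytes * (double)timer_hz / (double)cycles;
}
```

In the driver, `cycles_hw` and `TIMER_FREQ` would be the natural inputs for the hardware side.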











