In this tutorial, we continue where we left off and extend our hardware design with DDR memory and a hardware timer based on the AXI Timer IP. On the software side, we implement the SHA-256 compression function in C, use it to hash data written to DDR memory, and measure the execution time with the hardware timer.
Note: This tutorial uses elements from the other tutorials on my account page.
We start by opening the Vivado project we worked on in the previous part and extending it with the new features.
Step 2: Adding DDR memory
With the Vivado design open, we first enable a high-performance AXI port on the Zynq block. Double-click the Zynq block, go to the PS-PL Configuration page, and enable the High-Performance AXI slave port.
With the interface enabled, we must add a master interface to the Smart Connect IP block so that it can connect directly to the HP slave interface. Double-click the Smart Connect IP block and set the Number of Master Interfaces to 2. Then click the Connection Automation button to connect the two interfaces automatically.
Next, we add the hardware timer using the AXI Timer IP: click the ‘+’ icon in the block design ribbon and search for ‘AXI Timer’. The AXI Timer contains two 32-bit counters that can act as separate timers. Since we only need one timer, we enable 64-bit mode, which cascades the two counters so that the timer can run for longer than ~42 seconds before it overflows.
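The ~42 second figure follows from the counter width and the clock rate. Here is a quick check, assuming the timer is clocked at 100 MHz (an assumption on my part; substitute the clock your block design actually uses). The function name is only illustrative.

```python
# Rollover time of a free-running counter: 2**width ticks at clock_hz.
# CLOCK_HZ = 100 MHz is an assumed value, not read from the design.
CLOCK_HZ = 100_000_000


def rollover_seconds(width_bits, clock_hz=CLOCK_HZ):
    """Seconds until a counter of the given width wraps around."""
    return (2 ** width_bits) / clock_hz


if __name__ == "__main__":
    print(f"32-bit counter wraps after {rollover_seconds(32):.2f} s")
    years = rollover_seconds(64) / (365 * 24 * 3600)
    print(f"64-bit counter wraps after roughly {years:.0f} years")
```

At this clock rate a single 32-bit counter is too short for long benchmarks, which is why the cascaded 64-bit mode is the safer choice.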
We once again add a master interface to the Smart Connect IP, bringing it up to 3 master interfaces. Once again, we can run Connection Automation to connect the IP to the design. The finished hardware design should look something like this:
We now assign the DDR address in the Address Editor and ensure there are no conflicts.
We want to use 256 MB of the 512 MB total memory for the MicroBlaze-V, so we change the base address to 0x1000_0000 and set the range to 256M. Make sure the slave segment is set to HP0_DDR_LOWOCM.
The AXI Timer should be assigned an address automatically.
Once the addresses are assigned correctly, validate the design again (F5) and regenerate the bitstream. Then export the hardware from Vivado to regenerate the .xsa file for Vitis.
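To double-check the Address Editor settings by hand, you can compute the window that a base address and range imply. This is only a sketch: the helper names are made up for illustration, and the peripheral address in the example is a hypothetical placeholder, not a value from this design.

```python
MB = 1024 * 1024
DDR_BASE = 0x1000_0000   # base address chosen in the Address Editor
DDR_SIZE = 256 * MB      # range chosen in the Address Editor


def window_end(base, size_bytes):
    """Last byte address covered by a segment that starts at base."""
    return base + size_bytes - 1


def overlaps(base_a, size_a, base_b, size_b):
    """True if the two address segments share at least one byte."""
    return (base_a <= window_end(base_b, size_b)
            and base_b <= window_end(base_a, size_a))


if __name__ == "__main__":
    print(hex(window_end(DDR_BASE, DDR_SIZE)))  # 0x1fffffff
    # Hypothetical peripheral at 0x4000_0000 with a 64 KB range
    print(overlaps(DDR_BASE, DDR_SIZE, 0x4000_0000, 64 * 1024))  # False
```

The DDR window therefore covers 0x1000_0000 to 0x1FFF_FFFF, and any other segment the address editor assigns must stay outside it.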
Software Design
Step 1: Rebuild Platform and Create New Application
We will use the workspace from the last tutorial and simply update the platform to include our new XSA file. To do this, open the ‘vitis-comp.json’ file in the Settings folder and click the re-read XSA link. Then rebuild the platform.
Now we create a new application for the SHA-256 program, starting from the same Hello World example we used in the last part.
For the MicroBlaze-V to read the message from DDR, we use Python to generate an input.bin file. The first 4 bytes of input.bin hold the length of the padded message in bytes, and the rest of the file is the message padded according to the SHA-256 standard. Here is the Python code to get this done.
import struct
from pathlib import Path
import hashlib


def create_blocks_bin(n_total_blocks):
    filename = Path(__file__).parent / "input.bin"
    # The message is one block short so that the padding fills exactly one more block
    message = b'A' * ((64 * n_total_blocks) - 64)
    orig_len_bits = len(message) * 8
    # SHA-256 padding: 0x80, zeros up to 56 mod 64, then the 64-bit big-endian bit length
    padded = message + b'\x80'
    while (len(padded) % 64) != 56:
        padded += b'\x00'
    padded += struct.pack(">Q", orig_len_bits)
    # Swap the byte order of each 32-bit word from big endian to little endian for the final output
    swapped_data = bytearray()
    for i in range(0, len(padded), 4):
        word_chunk = padded[i:i+4]
        word = struct.unpack(">I", word_chunk)[0]
        swapped_data.extend(struct.pack("<I", word))
    with open(filename, "wb") as f:
        # 4-byte little-endian header: length of the padded message in bytes
        f.write(struct.pack("<I", len(padded)))
        f.write(swapped_data)
    print("Calculating Golden Hash")
    golden_hash = hashlib.sha256(message).hexdigest()
    print(f"File created: {filename}")
    print(f"Original message size: {len(message)} bytes")
    print(f"Padded size: {len(padded)} bytes (Exactly {n_total_blocks} blocks)")
    print(f"Golden Hash: {golden_hash}")


if __name__ == "__main__":
    create_blocks_bin(300)
In this code we simply repeat the ASCII character ‘A’ to create however many blocks we need. The hashlib library is also used to compute a golden reference hash that we can compare our result against.
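Before loading the file onto the board, it can be worth checking that the layout round-trips. The sketch below rebuilds the same layout in memory and then undoes the header, the byte swap, and the padding. The names `build_image` and `recover_message` are hypothetical helpers for this check and are not part of the generator script.

```python
import hashlib
import struct


def build_image(message):
    """Same layout as the generator: 4-byte length header + word-swapped padded message."""
    padded = message + b"\x80"
    while len(padded) % 64 != 56:
        padded += b"\x00"
    padded += struct.pack(">Q", len(message) * 8)
    # Read the words as big endian, write them back as little endian
    words = struct.unpack(f">{len(padded) // 4}I", padded)
    return struct.pack("<I", len(padded)) + struct.pack(f"<{len(words)}I", *words)


def recover_message(image):
    """Undo the header, the byte swap, and the SHA-256 padding."""
    (padded_len,) = struct.unpack_from("<I", image, 0)
    body = image[4:4 + padded_len]
    words = struct.unpack(f"<{padded_len // 4}I", body)
    padded = struct.pack(f">{len(words)}I", *words)
    # The last 8 bytes of the padding hold the original length in bits
    (bit_len,) = struct.unpack(">Q", padded[-8:])
    return padded[: bit_len // 8]


if __name__ == "__main__":
    msg = b"A" * 64
    image = build_image(msg)
    assert recover_message(image) == msg
    print(hashlib.sha256(recover_message(image)).hexdigest())
```

To check a real file, pass its contents instead, for example `recover_message(Path("input.bin").read_bytes())`, and compare the printed hash with the golden hash from the generator.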
Step 3: Write C Program
With the input file ready to go, we will use this C code to implement the SHA-256 compression function and to measure the execution time with our hardware timer.
#include <stdint.h>
#include "platform.h"
#include "xparameters.h"
#include "xil_io.h"
#include "xil_printf.h"
#include "xtmrctr.h"
#include <xil_cache.h>
// Helper macro for the SHA function: rotate a 32-bit word right by n bits (0 < n < 32)
#define ROTR(x,n) (((x) >> (n)) | ((x) << (32 - (n))))
// SHA-256 Constants
static const uint32_t K[64] =
{
0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};
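// Hardware parameters from xparameters.h and the DDR address assigned in Vivado
#define TIMER_FREQ XPAR_AXI_TIMER_0_CLOCK_FREQUENCY
#define MESSAGE_HEADER_ADDRESS 0x10000000
// The message data starts right after the 4-byte length header
#define DDR_BASE (MESSAGE_HEADER_ADDRESS + 4)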
// ddr_base is the address of the first message word (just past the length header),
// total_len is the padded message length in bytes
void sha256Soft(uint32_t ddr_base, uint32_t total_len, uint32_t H[8]) {
    // Initial Hash Values
    H[0] = 0x6a09e667; H[1] = 0xbb67ae85;
    H[2] = 0x3c6ef372; H[3] = 0xa54ff53a;
    H[4] = 0x510e527f; H[5] = 0x9b05688c;
    H[6] = 0x1f83d9ab; H[7] = 0x5be0cd19;
    // Process each Chunk of data - 64 Bytes
    for (uint32_t block = 0; block < total_len; block += 64) {
        uint32_t W[64];
        // ddr_base already points past the header, so no extra offset is added here
        uint32_t *word_data = (uint32_t *)(uintptr_t)(ddr_base + block);
        // Message schedule: the first 16 words come straight from memory
        for (int i = 0; i < 16; i++) {
            W[i] = word_data[i];
        }
        for (int i = 16; i < 64; i++) {
            uint32_t s0 = ROTR(W[i-15], 7) ^ ROTR(W[i-15], 18) ^ (W[i-15] >> 3);
            uint32_t s1 = ROTR(W[i-2], 17) ^ ROTR(W[i-2], 19) ^ (W[i-2] >> 10);
            W[i] = W[i-16] + s0 + W[i-7] + s1;
        }
        uint32_t a = H[0], b = H[1], c = H[2], d = H[3];
        uint32_t e = H[4], f = H[5], g = H[6], h = H[7];
        // 64 compression rounds
        for (int i = 0; i < 64; i++) {
            uint32_t S1 = ROTR(e, 6) ^ ROTR(e, 11) ^ ROTR(e, 25);
            uint32_t CH = (e & f) ^ ((~e) & g);
            uint32_t temp1 = h + S1 + CH + K[i] + W[i];
            uint32_t S0 = ROTR(a, 2) ^ ROTR(a, 13) ^ ROTR(a, 22);
            uint32_t MAJ = (a & b) ^ (a & c) ^ (b & c);
            uint32_t temp2 = S0 + MAJ;
            h = g;
            g = f;
            f = e;
            e = d + temp1;
            d = c;
            c = b;
            b = a;
            a = temp1 + temp2;
        }
        // Add the working variables back into the running hash
        H[0] += a;
        H[1] += b;
        H[2] += c;
        H[3] += d;
        H[4] += e;
        H[5] += f;
        H[6] += g;
        H[7] += h;
    }
}
int main() {
    init_platform();
    // Timer Initialization
    XTmrCtr TimerInstance;
    XTmrCtr_Initialize(&TimerInstance, 0);
    // Cascade mode chains the two 32-bit counters into one 64-bit counter
    XTmrCtr_SetOptions(&TimerInstance, 0, XTC_CASCADE_MODE_OPTION | XTC_AUTO_RELOAD_OPTION);
    // Clear Screen
    xil_printf("\033[2J\033[H");
    xil_printf("\r\n--- SHA-256 Benchmark ---\r\n");
    // Padded message length in bytes, stored in the first 4 bytes of input.bin
    uint32_t total_len = Xil_In32(MESSAGE_HEADER_ADDRESS);
    uint32_t hash_sw[8];
    Xil_DCacheFlushRange(DDR_BASE, total_len); // Flush cache to ensure DDR values are used
    XTmrCtr_Reset(&TimerInstance, 0);
    XTmrCtr_Start(&TimerInstance, 0);
    // Concatenate the upper and lower counter registers
    uint64_t sw_start = ((uint64_t)XTmrCtr_GetValue(&TimerInstance, 1) << 32) | XTmrCtr_GetValue(&TimerInstance, 0);
    sha256Soft(DDR_BASE, total_len, hash_sw);
    // Concatenate the upper and lower counter registers
    uint64_t sw_end = ((uint64_t)XTmrCtr_GetValue(&TimerInstance, 1) << 32) | XTmrCtr_GetValue(&TimerInstance, 0);
    XTmrCtr_Stop(&TimerInstance, 0);
    uint64_t cycles_sw = sw_end - sw_start;
    xil_printf("Hash (Software): ");
    for (int i = 0; i < 8; i++) xil_printf("%08x", hash_sw[i]);
    xil_printf("\r\n");
    // Integer-only conversion: whole milliseconds plus a three-digit fraction
    uint32_t ms_sw_whole = (uint32_t)(cycles_sw / (TIMER_FREQ / 1000));
    uint32_t ms_sw_frac = (uint32_t)((cycles_sw % (TIMER_FREQ / 1000)) * 1000 / (TIMER_FREQ / 1000));
    xil_printf("\r\nExecution Results:\r\n");
    xil_printf("Time: %u.%03u ms\r\n", ms_sw_whole, ms_sw_frac);
    cleanup_platform();
    return 0;
}
The driver code finds the parameters we set up in Vivado through the generated ‘xparameters.h’ file. Since the Floating Point Unit is disabled on the MicroBlaze-V to save space, we use integer division to calculate the elapsed time in ms.
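The integer-only time conversion is easy to try out on a PC first. The sketch below mirrors the two expressions from `main()` in Python. The function name `cycles_to_ms` is hypothetical, and the 100 MHz timer frequency in the example is an assumed value; on the board it comes from `TIMER_FREQ`.

```python
def cycles_to_ms(cycles, timer_hz):
    """Return (whole milliseconds, three-digit fraction) using integer math only."""
    ticks_per_ms = timer_hz // 1000
    whole = cycles // ticks_per_ms
    # Scale the remainder to thousandths of a millisecond (0..999)
    frac = (cycles % ticks_per_ms) * 1000 // ticks_per_ms
    return whole, frac


if __name__ == "__main__":
    # 1,234,567 cycles at an assumed 100 MHz timer clock
    whole, frac = cycles_to_ms(1_234_567, 100_000_000)
    print(f"Time: {whole}.{frac:03d} ms")  # Time: 12.345 ms
```

The fraction is truncated rather than rounded, which matches what the C code prints with `%u.%03u`.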
Step 4: Getting Input.bin to DDR Memory + Running Program on Hardware
The easiest way to transfer the input.bin file to memory is a Tcl script that uses the XSDB API in Vitis to send the bin file to DDR memory via JTAG.
Note: You can open xsdb in any Windows terminal by adding Vitis to the PATH environment variable.
Here is the Tcl script for running the program:
#Note--SET THIS UP FOR YOUR PROJECT REPO
set path_to_app "C:/PATH/TO/VITIS/PROJECTS/ZyboMicroBlazeV/Sha256App"
set ps7Init_path "$path_to_app/_ide/psinit/ps7_init.tcl"
set elf_path "$path_to_app/build/Sha256App.elf"
set bin_path "input.bin"
catch {connect}
#Set up FPGA
targets -set -filter {name =~ "*xc7z*"}
fpga -file "$path_to_app/_ide/bitstream/design_1_wrapper.bit"
puts "FPGA configuration completed successfully!"
#Run ARM Core Setup
targets -set -filter {name =~ "ARM* #0"}
catch {source $ps7Init_path}
ps7_init
ps7_post_config
#Set MicroBlaze as Target
targets -set -filter {name =~ "Hart*"}
catch {stop}
#Removes BreakPoints to Stop Debug mode from coming on
catch {bpremove -all}
puts "Downloading Data to DDR..."
dow -data $bin_path 0x10000000
puts "Downloading ELF to MicroBlazeV..."
dow $elf_path
puts "--- Starting Execution ---"
con
With this script, you won’t need to press the Run button in Vitis. To run it in a terminal, follow these steps (note: they may differ slightly depending on whether you are on Windows or Linux):
As the result shows, we have successfully computed the hash of a message and measured its execution time on the MicroBlaze-V. One limitation right now is that transferring the input.bin file via JTAG is very slow for large input files. So, in the next tutorial, we will set up an embedded Linux environment on the ARM cores of the Zybo board so that we can transfer large amounts of data to DDR memory via Ethernet.








