High-Level Synthesis (HLS) is great for implementing algorithms. However, as we develop our HLS IP there are times when we need to think about how it interacts with the rest of the system beyond the AXI interfaces, which are our main connections.
This can be challenging in HLS, as it often means we need to wait on external signals, wait for fixed delays, and generate external signals that change as the algorithm runs.
In this blog, we are going to look at how we can implement structures in our HLS algorithms that:
1. Wait for an input signal as a trigger.
2. Wait for a defined number of clock cycles.
3. Generate an output trigger signal.
Let’s start with waiting for an input trigger from an external IP block. This could be an external frame sync in an image processing application.
The first thing we need to do is define an input using the arbitrary precision unsigned integer type ap_uint (from ap_int.h), as we want a single bit. We can use:
ap_uint<1> trig_in
We declare this in the parameter list of our HLS function. We can then add an HLS pragma, which causes the input to be implemented as an ap_none port.
#pragma HLS INTERFACE ap_none port=trig_in
This now provides us a single bit input into our C function. The next thing we need to do is pause the HLS IP block until we see the trigger.
The simplest way to do this in Vivado HLS is to use the ap_wait_until(X) function, which is defined in ap_utils.h. This causes the HLS IP to pause until X is true, where true means any non-zero value.
When we run C simulation, we need to ensure the trigger variable is set to a non-zero value; otherwise the simulation will stall at the wait.
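Putting the steps above together, a minimal sketch of a function that blocks on the trigger might look like the following. The function name wait_for_trigger, the data_in argument, and the doubling operation are hypothetical placeholders, as is the USE_XILINX_HEADERS macro. So that the sketch also builds with a plain C++11 compiler, the #else branch provides simplified stand-ins for ap_uint and ap_wait_until(); these are assumptions for illustration only, not the Xilinx implementations.

```cpp
#ifdef USE_XILINX_HEADERS   // hypothetical macro: define it when building in Vivado HLS
#include "ap_int.h"
#include "ap_utils.h"
#else
// Stand-ins (assumptions) so the sketch compiles without the Xilinx headers.
template <int W> using ap_uint = unsigned;                   // no W-bit truncation
#define ap_wait_until(X) do { while (!(X)) { } } while (0)   // spins until X is non-zero
#endif

// Hypothetical top-level function: waits for the trigger, then doubles data_in.
int wait_for_trigger(ap_uint<1> trig_in, int data_in)
{
#pragma HLS INTERFACE ap_none port=trig_in
    ap_wait_until(trig_in);   // block here until trig_in is non-zero
    return data_in * 2;       // placeholder for the real algorithm
}
```

In a C test bench, call it with the trigger already set, for example wait_for_trigger(1, 21), so that the simulation does not stall.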
To ensure the implementation of the ap_wait_until() function is correct within our synthesized HLS code, we can look at the analysis view.
In the analysis view, we should see an operation called _lnXX(wait). Right-clicking it and selecting Goto Source should cross-probe to the ap_wait_until() function call.
Now that we know how to wait for an external trigger in our HLS code, how can we implement a delay of a specific number of clock cycles?
Typically, a hand-written delay function will be neither efficient nor implemented as we intend. The best way to implement a delay of several clock cycles is therefore to use the ap_wait_n(X) function.
This function is also defined in ap_utils.h. It delays for at least X clock cycles; however, processing might resume a few clock cycles later, depending upon the implementation.
The delay can be changed on the fly and, if desired, can be controlled over an AXI Lite interface. For example:
#include "ap_utils.h"
void test(int delay)
{
#pragma HLS INTERFACE s_axilite port=delay
    ap_wait_n(delay); // delay for the number of clock cycles provided over the AXI Lite bus
}
One useful application of ap_wait_n(X) is tuning the frame rate of a custom test pattern generator for video applications to exactly the rate you need. In C simulation this delay is ignored, though you will see it in co-simulation when inspecting the waveform.
As with ap_wait_until(), we can observe the delay in the analysis view. In this case, we will see a new loop that implements the counter used to time the delay.
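To make the frame-rate use case concrete, the delay value can be worked out from the clock frequency, the target frame rate, and the number of cycles the generator spends producing one frame. The helper below is only a sketch: the name padding_cycles and its parameters are hypothetical, and it would typically run on the processor that writes the delay over AXI Lite rather than inside the HLS core.

```cpp
#include <cstdint>

// Hypothetical helper: number of cycles to idle per frame so that one frame
// occupies clock_hz / target_fps clock cycles in total.
// Returns 0 when the active time alone already fills the frame period.
std::uint32_t padding_cycles(std::uint32_t clock_hz,
                             std::uint32_t target_fps,
                             std::uint32_t active_cycles)
{
    if (target_fps == 0) {
        return 0;                                        // avoid dividing by zero
    }
    std::uint32_t frame_cycles = clock_hz / target_fps;  // cycles per frame, rounded down
    return (frame_cycles > active_cycles) ? frame_cycles - active_cycles : 0;
}
```

For example, with a 100 MHz clock and a 60 fps target, a frame lasts 1,666,666 cycles; if generating the frame takes 1,000,000 cycles, the helper returns 666,666. Because ap_wait_n(X) waits at least X cycles, treat the result as a starting point and trim it after checking the waveform in co-simulation.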
The final aspect I want to examine is the creation of an output signal that changes state as the HLS IP core runs. This needs care because the Vivado HLS compiler, like many C compilers, assumes the function is single threaded. This means that, much like in a VHDL process, only the final value assigned to the variable is output. To generate intermediate outputs, we need a little thought.
As with the trigger input, the first thing we need to do is create a single-bit variable using the arbitrary precision types.
We declare this in our function parameter list; but since we want the signal to be a scalar output, we must declare it as a pointer, in accordance with Figure 41 in User Guide 902 (UG902).
To ensure the intermediate values are output from the HLS IP function, we define the signal as volatile. When the HLS compiler runs, the intermediate writes will then be performed and not optimized out.
volatile ap_uint<1> *trig_out
Using the volatile keyword lets us implement outputs like this, and it also ensures multi-access pointers are implemented correctly.
Hopefully you now understand a little more about some of the more specialist features and functions that you can deploy in your HLS solutions to make them easier to integrate into your overall system.
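As a sketch of how the volatile output might be used, the function below raises trig_out before it starts work and lowers it again when it finishes. The function name sum_with_busy_flag, the eight-element data array, and the USE_XILINX_HEADERS macro are hypothetical, and the #else branch is a simplified stand-in for ap_uint so that the sketch builds without the Xilinx headers. No INTERFACE pragma is shown for trig_out; choose the port protocol your design needs.

```cpp
#ifdef USE_XILINX_HEADERS   // hypothetical macro: define it when building in Vivado HLS
#include "ap_int.h"
#else
// Stand-in (assumption) so the sketch compiles without the Xilinx headers.
template <int W> using ap_uint = unsigned;   // no W-bit truncation
#endif

// Hypothetical top-level function: holds trig_out high while summing 8 values.
int sum_with_busy_flag(const int data[8], volatile ap_uint<1> *trig_out)
{
    *trig_out = 1;                  // intermediate write, kept because of volatile
    int sum = 0;
    for (int i = 0; i < 8; ++i) {
        sum += data[i];             // placeholder for the real algorithm
    }
    *trig_out = 0;                  // final write
    return sum;
}
```

In C simulation only the final value of trig_out is visible after the call returns; the intermediate high period is something to confirm in the co-simulation waveform.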
See My FPGA / SoC Projects: Adam Taylor on Hackster.io
Get the Code: ATaylorCEngFIET (Adam Taylor)
Access the MicroZed Chronicles Archives, with over 300 articles on the FPGA / Zynq / Zynq MPSoC, updated weekly at MicroZed Chronicles.