The Fast Fourier Transform (FFT) is a fundamental building block of DSP systems. Although the algorithm itself is quite easily understood, the many architectural variants can be a large time sink for hardware engineers today.
To help DSP engineers working on the AI Engine, AMD provides the DSP Library as part of the Vitis Libraries, an open-source repository hosted on GitHub:
https://github.com/Xilinx/Vitis_Libraries/tree/main/dsp
You can also find the comprehensive documentation here:
https://docs.amd.com/r/en-US/Vitis_Libraries/dsp/index.html
The DSP Library is organized in multiple layers:
- L1 contains the basic kernels
- L2 contains sub-graphs (which can be called from a graph) that run on one or more AI Engine tiles
The recommendation is to work with the L2 elements which give a higher level of abstraction.
As part of the DSP Library, we can find an FFT optimized for the various AI Engine architectures (AIE, AIE-ML and AIE-MLv2).
In this tutorial we will show how we can implement a 1024-point FFT using the DSP Library.
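As a point of reference for what the library block computes, here is a minimal floating-point sketch of a radix-2 decimation-in-time FFT in standard C++. It is not the DSP Library implementation (which works on fixed-point data and is vectorized for the AI Engine). The function name fft_radix2_dit and the cplx alias are my own choices for illustration.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// In-place iterative radix-2 decimation-in-time FFT (reference model only).
// The size of x must be a power of two.
void fft_radix2_dit(std::vector<cplx>& x) {
    const std::size_t n = x.size();

    // Reorder the input in bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(x[i], x[j]);
    }

    // Combine butterflies of growing length: 2, 4, ..., n.
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const cplx w_len = std::polar(1.0, -2.0 * pi / static_cast<double>(len));
        for (std::size_t start = 0; start < n; start += len) {
            cplx w(1.0, 0.0);  // running twiddle factor
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cplx even = x[start + k];
                const cplx odd = x[start + k + len / 2] * w;
                x[start + k] = even + odd;
                x[start + k + len / 2] = even - odd;
                w *= w_len;
            }
        }
    }
}
```

Calling fft_radix2_dit on a vector of 1024 samples gives an unscaled 1024-point transform, which can serve as a rough golden model when checking the output of the AI Engine graph later on.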

AMD Vitis™ AI Engine Component
The elements from the Vitis Library can be called from AI Engine graph code.
The first step is then to create a new AI Engine component in the Vitis Unified IDE (File > New Component > AI Engine). We can call this component fft_1024, leave it empty for the moment (without adding any source file) and target the xcve2302-sfva784-1LP-e-S part, which is the one on the Trenz TE0950.
Then we have to create our source files. We could create them using a simple text editor and write all the content manually. But here I will use a feature from the Vitis Unified IDE that can create a template based on some parameters.
When you have a new AI Engine component with no source file, the component settings offer a Generate AIE Prototype Code option.
In the Generate AIE Prototype Code window, I am configuring the graph name, changing the data types to 16-bit complex integer (cint16) and enabling Generate Top Level graph and Simulation code.
Note: We will not need the kernel code as we will call the FFT from the DSP Library, but this will give us a good reference for the graph and the top-level file.
Once we click on Generate we have the new source files added to the component and the Top-level file set for our component.
As a sanity check we can run the x86 or AI Engine compiler to verify that the generated code is valid. I am getting a successful build on both, so we can move forward.
Note: At this point, I am removing the 2 kernel source and header files my_kernel.cpp/.h as I will not be using them.

AMD DSP Library
As mentioned in the introduction, the DSP Library is available as part of the AMD Vitis Libraries repository on GitHub. To use it you will need to clone the repository on your machine.
You can directly clone it from a terminal using the following link:
https://github.com/Xilinx/Vitis_Libraries.git
Or you can clone the library directly from the Vitis Unified IDE: in the Libraries section, click on the download icon on the Vitis Accelerated Libraries repository line.
If you click on the pen icon on the same line, you can see where the library was downloaded. You can also use this window to change where the libraries are downloaded or to get a different branch.
Then from our AI Engine component we need to point to 3 folders of the DSP Library:
- <download path>/vitis_libraries/dsp/L2/include/aie
- <download path>/vitis_libraries/dsp/L1/include/aie
- <download path>/vitis_libraries/dsp/L1/src/aie
This can be done from the aiecompiler.cfg file for our AI Engine component
Now that the tool is set up to use the AMD DSP library we can call the DSP Library from our graph.
For that I have used the documentation which includes an example that I have adapted for our use case.
Here are the pages of the documentation you might want to look at when implementing the FFT from the DSP Library:
- The overview of the class: https://docs.amd.com/r/en-US/Vitis_Libraries/dsp/rst/class_xf_dsp_aie_fft_dit_1ch_fft_ifft_dit_1ch_graph.html_0
- The code example for the FFT: https://docs.amd.com/r/en-US/Vitis_Libraries/dsp/user_guide/L2/func-fft-ifft-aie-only.html_7
First, in graph_FFT_1024.h, we can remove most of the lines related to the kernel to keep only the following code:
#include <adf.h>
using namespace adf;
class my_graph : public graph {
public:
    input_plio in;
    output_plio out;
    my_graph() {
        in = input_plio::create(plio_64_bits, "data/input.txt");
        out = output_plio::create(plio_64_bits, "data/output.txt");
        // TODO change connectivity to FFT
        connect<>(in.out[0], k.in[0]);
        connect<>(k.out[0], out.in[0]);
    }
};
Note: I am also changing the PLIOs to 64-bit interfaces, as we saw in a previous tutorial that this configuration can give better performance.
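Since the PLIOs read and write text files, it helps to know what data/input.txt should look like. The sketch below assumes the usual AI Engine simulator text layout, where each line carries as many values as fit in the PLIO width. For cint16 on a 64-bit PLIO that would be four 16-bit integers per line, so two complex samples as "real imag real imag". Please check this assumption against the AI Engine documentation for your tool version. The CInt16 struct and the format_plio64_cint16 function are hypothetical helpers written for this example, not part of any AMD API.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the AI Engine cint16 type (16-bit real, 16-bit imaginary).
struct CInt16 {
    std::int16_t re;
    std::int16_t im;
};

// Format samples as text, two complex samples (four integers) per line.
// Assumption: this matches what a 64-bit PLIO expects for cint16 data.
std::string format_plio64_cint16(const std::vector<CInt16>& samples) {
    std::ostringstream os;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        os << samples[i].re << ' ' << samples[i].im;
        const bool end_of_line = (i % 2 == 1) || (i + 1 == samples.size());
        os << (end_of_line ? '\n' : ' ');
    }
    return os.str();
}
```

With this layout, one 1024-point frame takes 512 lines in the input file. The returned string can simply be written to data/input.txt with a std::ofstream.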
Then we can include the header file for the FFT:
#include "fft_ifft_dit_1ch_graph.hpp"
Then we just need to set a minimal set of parameters: the sample data type (DATA_TYPE_FFT), the twiddle data type (TWIDDLE_TYPE), the point size of the FFT (POINT_SIZE), whether the transform is an FFT or an inverse FFT (TP_FFT_NIFFT) and the output shift (TP_SHIFT).
#define DATA_TYPE_FFT cint16
#define TWIDDLE_TYPE cint16
#define POINT_SIZE 1024
#define TP_FFT_NIFFT 1
#define TP_SHIFT 10
Then we can instantiate the FFT graph:
xf::dsp::aie::fft::dit_1ch::fft_ifft_dit_1ch_graph<DATA_TYPE_FFT, TWIDDLE_TYPE, POINT_SIZE, TP_FFT_NIFFT, TP_SHIFT> fft_1024;
And finally connect the FFT directly to the PLIOs of our graph:
connect<>(in.out[0], fft_1024.in[0]);
connect<>(fft_1024.out[0], out.in[0]);
This is all we have to do to implement our 1024-point FFT.
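A word on the TP_SHIFT value of 10 used above. An unscaled N-point FFT can grow a constant (DC) input by a factor of N, and 1024 is 2^10, so a right shift by 10 bits scales the output by 1/1024 and brings a full-scale DC input back into the 16-bit range. This is my reading of why 10 is a sensible choice here, not a rule stated by the library. The small sketch below only illustrates the arithmetic, and the helper names log2_pow2 and scaled_dc_bin are made up for this example.

```cpp
#include <cstdint>

// Number of bits needed to divide by n with a right shift (n must be a power of two).
constexpr unsigned log2_pow2(std::uint32_t n) {
    return n <= 1 ? 0u : 1u + log2_pow2(n >> 1);
}

// Value of bin 0 for a constant input of the given amplitude,
// after the N-fold growth of the FFT and a right shift on the output.
constexpr std::int64_t scaled_dc_bin(std::int64_t amplitude,
                                     std::uint32_t point_size,
                                     unsigned shift) {
    return (amplitude * static_cast<std::int64_t>(point_size)) >> shift;
}

static_assert(log2_pow2(1024) == 10, "1024 = 2^10");
```

For example, scaled_dc_bin(32767, 1024, 10) gives 32767, which still fits in a 16-bit integer, while the same call with a shift of 0 gives 33553408, which does not. Depending on your input signal levels, a smaller shift may preserve more precision.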

Running AI Engine Compiler and Analyzing the output
Now that we have implemented the 1024-point FFT inside our graph, we can run the AI Engine compiler to verify that our code is correct and check the hardware implementation.
Note: I actually ran the compiler targeting the x86 simulation first to verify that my code is correct, as this is what I recommended in a previous article ;). Since this built with no issue, I am moving on to the AI Engine compiler.
From the graph report view we can see that the FFT was implemented using 1 tile. We can see that multiple buffers are also connected to the kernel to hold the values for the twiddle data. There are also ping/pong buffers connected on each side of the FFT.
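To get a feel for how much memory these ping/pong buffers represent, here is a back-of-the-envelope sketch. It assumes that each input or output buffer holds exactly one frame of POINT_SIZE cint16 samples, which is my assumption and not something read from the compiler report. The twiddle buffers are not counted, and the helper names are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>

// A cint16 sample is a 16-bit real part plus a 16-bit imaginary part.
constexpr std::size_t kBytesPerCint16 = 2 * sizeof(std::int16_t);

// Size of one buffer holding a full frame of point_size samples.
constexpr std::size_t frame_buffer_bytes(std::size_t point_size) {
    return point_size * kBytesPerCint16;
}

// A ping/pong pair doubles the storage so that one buffer can be filled
// while the kernel works on the other one.
constexpr std::size_t ping_pong_bytes(std::size_t point_size) {
    return 2 * frame_buffer_bytes(point_size);
}
```

Under this assumption, a 1024-point frame is 4096 bytes, each ping/pong pair is 8192 bytes, and the input and output sides together need 16384 bytes.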
Then looking at the array view, we can see that our graph is taking 1 tile for the compute but also 2 other tiles for the buffers.
The compiler usually spreads the buffers as it tries to achieve the best performance without "understanding" the design. This is probably something that we can improve in a future article.
Note 2: You can use the following project to rebuild an AMD Vitis workspace to get the final version of the project after the steps mentioned in this tutorial: https://github.com/xflorentw/AI_Engine_Basic/tree/main/02_FFT_AIE-ML
Run make all. Before running the command, you will need to clone the Vitis_Libraries repository from GitHub and set an environment variable DSPLIB_ROOT to Vitis_Libraries/dsp.

Summary
In this tutorial we have seen how to instantiate a 1024-point FFT for the AIE-ML using the DSP Library. The next step for us will be to simulate it. For this we will use a Python test bench to generate stimuli and golden data. This is what I will show in the next tutorial.
If you are looking for a more advanced FFT example, you might want to look at this example from Tom Simpson:
https://www.hackster.io/dsp2/amd-versal-ai-engine-2-gsps-4k-point-fft-11ab7d

Disclaimers
- AMD, Versal, and Vitis are trademarks or registered trademarks of Advanced Micro Devices, Inc.
- Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.





