A few years ago I started working on a custom 16-point FFT for the AMD® Versal™ AI Engine (AIE). The goal of the project was to build skills in creating custom AIE kernels with intrinsic functions and the AIE API. After creating the initial design, I used the Vitis™ tools to analyze its performance and optimize it for the AIE vector processor architecture, ultimately reaching 1.9 GSPS of throughput (complex 16-bit data) on a single AIE tile. I then applied the concepts learned during the 16-point FFT development to create 8-point, 32-point, and 64-point FFTs.
I thought it might be useful to share these designs with the community, so I created a library called FFTs for Fun (FFT4F).
Since the initial release of the code, I've added:
- Inverse FFT support
- Stream I/O support to reduce latency
- 32-bit data type support
- Support for the AMD Vitis Unified IDE (2023.2 or later)
- A Linux®-based development environment
- Python® scripts for test automation and result validation
Using the FFT4F library should be straightforward. Once the FFT4F repository is cloned, it can be included in a project, and an FFT4F subgraph can then be instantiated in a system-level AIE graph.
Here's an AIE graph code example that instantiates a 16-point FFT using two input/output PLIO pairs.
#ifndef BASIC_GRAPH_H
#define BASIC_GRAPH_H

#include <adf.h>
#include <aie_api/aie.hpp>
#include <aie_api/aie_adf.hpp>
#include "fft4f/include/fft_graph.h"

using namespace adf;

class basic_graph : public graph
{
public:
    input_plio  in0, in1;
    output_plio out0, out1;

    basic_graph()
    {
        // Args: port name, PLIO width, simulation data file, PL clock frequency (MHz)
        in0  = input_plio::create("In0", plio_64_bits, "input_even.txt", 625);
        in1  = input_plio::create("In1", plio_64_bits, "input_odd.txt", 625);
        out0 = output_plio::create("Out0", plio_64_bits, "output_even.txt", 625);
        out1 = output_plio::create("Out1", plio_64_bits, "output_odd.txt", 625);

        connect<> (in0.out[0], g_fft.in[0]);
        connect<> (in1.out[0], g_fft.in[1]);
        connect<> (g_fft.out[0], out0.in[0]);
        connect<> (g_fft.out[1], out1.in[0]);

        g_fft.place_graph(6, 0); /* Place the graph starting at column 6, row 0 */
    }

private:
    fft4f::fft_graph< 16,     // FFT Size
                      0x22,   // Number of IO
                      64,     // FFT frames per buffer (i.e. batch size)
                      1,      // Number of AIE Tiles to use
                      0x022,  // Scale factor
                      false,  // IFFT flag (true = IFFT, false = FFT)
                      false,  // Use Stream IO flag
                      cint16, // Input data type
                      cint16  // Internal & Output data types
                    > g_fft;
};

#endif
Copy the code above and save it to a file named basic_graph.h.
Next, create a file named testbench.cpp with the following contents:
#include <adf.h>
#include "basic_graph.h"

using namespace adf;

basic_graph dut_graph;

int main(void)
{
    dut_graph.init();
    dut_graph.run(10); /* Run for 10 iterations (10 x 64 frames x 16 points = 10240 samples) */
    dut_graph.end();
    return 0;
}
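The 10240-sample figure in the testbench comment follows from the fft_graph template arguments. The sketch below spells out that arithmetic, assuming (as the testbench comment implies) that one run() iteration consumes exactly one buffer of 64 frames of 16 points, and that the samples are divided evenly between the two input ports. The constant and function names are illustrative only.

```cpp
#include <cstddef>

// Values mirror the fft4f::fft_graph template arguments in basic_graph.h.
constexpr std::size_t kFftSize        = 16;  // FFT size
constexpr std::size_t kFramesPerBatch = 64;  // FFT frames per buffer
constexpr std::size_t kNumInputPorts  = 2;   // In0 and In1
constexpr std::size_t kIterations     = 10;  // argument to dut_graph.run()

// Samples consumed by one graph iteration (one batch of FFT frames).
constexpr std::size_t samples_per_iteration(std::size_t fft_size,
                                            std::size_t frames) {
    return fft_size * frames;
}

constexpr std::size_t kSamplesPerIteration =
    samples_per_iteration(kFftSize, kFramesPerBatch);                      // 1024
constexpr std::size_t kTotalSamples = kSamplesPerIteration * kIterations;  // 10240
constexpr std::size_t kSamplesPerPort = kTotalSamples / kNumInputPorts;    // 5120

static_assert(kTotalSamples == 10240, "matches the testbench comment");
static_assert(kSamplesPerPort == 5120, "matches the size of each input file");
```

Changing the batch size or the iteration count changes how much stimulus the two input files need to hold.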
Download the generate_data.py Python script attached to this project and run it to generate random test data. On a Linux machine, the script can be executed with the command:
python3 generate_data.py 2 10240
The command above generates two files, input_even.txt and input_odd.txt, each containing 5120 complex int16 samples (10240 samples in total).
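For readers who prefer to build stimulus in C++, here is one possible sketch. It is not a port of generate_data.py. It assumes that input_even.txt holds the even-indexed samples and input_odd.txt the odd-indexed ones, and that each line of a 64-bit PLIO text file carries one 64-bit word, written as two cint16 samples (real, imaginary, real, imaginary). CInt16, split_even_odd, and to_plio64_text are hypothetical names. Compare the output with the script's files before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Minimal stand-in for a complex 16-bit sample (cint16).
struct CInt16 {
    std::int16_t re;
    std::int16_t im;
};

// Hypothetical helper: deal samples alternately into even and odd streams
// (assumed mapping of samples to input_even.txt / input_odd.txt).
void split_even_odd(const std::vector<CInt16>& in,
                    std::vector<CInt16>& even, std::vector<CInt16>& odd) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        (i % 2 == 0 ? even : odd).push_back(in[i]);
    }
}

// Hypothetical helper: format cint16 samples as 64-bit PLIO text,
// assuming two samples (four integers) per line.
std::string to_plio64_text(const std::vector<CInt16>& samples) {
    assert(samples.size() % 2 == 0 && "a 64-bit word holds two cint16 samples");
    std::ostringstream out;
    for (std::size_t i = 0; i < samples.size(); i += 2) {
        out << samples[i].re << ' ' << samples[i].im << ' '
            << samples[i + 1].re << ' ' << samples[i + 1].im << '\n';
    }
    return out.str();
}
```

Writing each returned string to its file with std::ofstream would complete the job.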
Next, make sure to clone the FFT4F repository. The following Linux command can be used:
git clone https://gitlab.avnet.com/xilinx/versal/ai-engine/fft4f
We are now ready to compile basic_graph for AIE simulation. Here are a few Linux commands to set up the Vitis environment, compile the AIE graph, and run the AIE simulation:
source <Vitis tool install directory>/settings.sh
v++ -c --mode aie --target hw --platform xilinx_vck190_base_<version>_1 -I ./ -I fft4f/src --work_dir Work testbench.cpp
aiesimulator
NOTES:
- Replace <Vitis tool install directory> with the actual location of the Vitis tool installation.
- Replace <version> with the version of the tools being used; for example, 2024.2 would use a <version> value of 202420.
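The <version> rule in the second note can be captured in a small helper: drop the dot from the release number, append a trailing 0, and splice the result into the platform name used in the v++ command. This sketch assumes every release follows the same YYYY.N pattern as the 2024.2 example; the function names are hypothetical, and the platform still has to be installed for v++ to find it.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: "2024.2" -> "202420" (remove the dot, append "0").
// Assumes the release string follows the YYYY.N pattern.
std::string platform_version(const std::string& release) {
    assert(release.find('.') != std::string::npos && "expected a YYYY.N release");
    std::string digits;
    for (char c : release) {
        if (c != '.') {
            digits += c;
        }
    }
    return digits + "0";
}

// Hypothetical helper: the --platform value used in the v++ command.
std::string vck190_platform(const std::string& release) {
    return "xilinx_vck190_base_" + platform_version(release) + "_1";
}
```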
The AIE simulator reports the throughput of each PLIO port in MB/s. Here's an example of the simulator output:
--------------------------------------------------------------------------------
| Intf Type | Port Name | Type | Throughput(MBps) |
--------------------------------------------------------------------------------
| plio | In0 | IN | 4218.16 |
| | In1 | IN | 4215.38 |
| | Out0 | OUT | 3871.75 |
| | Out1 | OUT | 3872.92 |
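These port rates can be compared with the ceiling implied by the PLIO settings in basic_graph.h. The fourth argument to input_plio::create and output_plio::create is the PL clock frequency in MHz, so a plio_64_bits port at 625 MHz can move at most 8 bytes x 625 MHz = 5000 MB/s. The sketch below computes that bound; the helper names are hypothetical, and the bound considers only interface width and clock rate.

```cpp
// Hypothetical helper: peak bandwidth of one PLIO port in MB/s.
// plio_bits is the PLIO width (64 for plio_64_bits); freq_mhz is the
// PL clock frequency passed to input_plio::create()/output_plio::create().
constexpr double plio_peak_mbps(unsigned plio_bits, double freq_mhz) {
    return (plio_bits / 8.0) * freq_mhz;  // bytes per cycle x Mcycles/s = MB/s
}

// Hypothetical helper: the same bound in mega-samples per second.
constexpr double plio_peak_msps(unsigned plio_bits, double freq_mhz,
                                unsigned bytes_per_sample) {
    return plio_peak_mbps(plio_bits, freq_mhz) / bytes_per_sample;
}

// 64-bit PLIO at 625 MHz carrying cint16 data (4 bytes per sample).
static_assert(plio_peak_mbps(64, 625.0) == 5000.0, "64-bit PLIO @ 625 MHz");
static_assert(plio_peak_msps(64, 625.0, 4) == 1250.0, "cint16 samples per port");
```

The measured input rates of about 4218 MB/s per port sit below that 5000 MB/s bound, and two ports together could carry up to 2500 MSPS of cint16 data.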
Each output sample is 4 bytes (complex int16 data), and the FFT output is split across two ports, so the total throughput can be converted from bytes to samples with the calculation:
2 x 3871 MBps / (4 Bytes/sample) ≈ 1935 MSPS
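The same conversion extends to any number of output ports and to the 32-bit data types: sum the per-port MB/s figures from the simulator table, then divide by the bytes per output sample (4 for cint16, 8 for cint32). MB/s divided by bytes per sample gives mega-samples per second directly. This is a minimal sketch; the function name is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: aggregate throughput in MSPS from the per-port MB/s
// values reported by aiesimulator. bytes_per_sample is 4 for cint16, 8 for cint32.
double total_msps(const std::vector<double>& port_mbps, unsigned bytes_per_sample) {
    assert(bytes_per_sample > 0);
    double total_mbps = 0.0;
    for (double mbps : port_mbps) {
        total_mbps += mbps;  // ports run in parallel, so their rates add
    }
    return total_mbps / bytes_per_sample;  // MB/s / (B/sample) = MSample/s
}
```

With the exact values from the table, total_msps({3871.75, 3872.92}, 4) returns about 1936.2, consistent with the rounded estimate above.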
Summary
I hope you find this project and the FFT4F library useful. More details on using the FFT4F library, along with throughput performance figures, can be found on the project repository landing page at https://gitlab.avnet.com/xilinx/versal/ai-engine/fft4f.
Feel free to drop a comment with questions or suggestions on the FFT4F repo. I hope to expand the repository with new features in the future.
Disclaimers
- AMD, Versal, and Vitis are trademarks or registered trademarks of Advanced Micro Devices, Inc.
- Python is a registered trademark of the Python Software Foundation.
- Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.
- Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.