This project demonstrates a custom 2 GSPS, 4096-point FFT (Fast Fourier Transform) implementation optimized for the AMD® Versal™ AI Engine (AIE) and AI Engine-ML (AIE-ML) architectures. Designed for high-throughput signal processing applications, this FFT leverages the parallelism and vector processing capabilities of the AIE/AIE-ML to reach 2 GSPS throughput with a small footprint.
Whether you're working on radar, wireless communications, or real-time spectral analysis, this 4K FFT example is a powerful building block for your next Versal AI Engine design.
Key Features
- 4096-point FFT with support for complex 16-bit integer (cint16) input and complex 32-bit integer (cint32) output
- Optimized for Versal AI Engine and AI Engine-ML vector cores
- High throughput using multiple AIE tiles and PLIO interfaces
- Modular graph-based design using Vitis™ and AI Engine APIs
- Testbench and simulation support for functional verification
Requirements
- AMD Vitis Unified IDE (2023.2 or later)
- Linux® based development environment
- Python® (for test automation and result validation)
The 4K FFT is implemented as a multi-tile AIE graph, where each tile performs a portion of the FFT computation. The design consists of a 4x1024 transpose operation, a 4-point DFT, twiddle-factor multiplies, and four 1K FFTs from the Vitis DSP Library. A block diagram of the FFT architecture is shown in the figure below.
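The transpose, 4-point DFT, twiddle multiply, and 1K FFT stages correspond to the standard Cooley-Tukey factorization 4096 = 4 x 1024. The NumPy sketch below reproduces that structure in double precision so it can be checked against `np.fft.fft`. The index mapping (input viewed as a 4 x 1024 array, output index k = k1 + 4*k2) and the function name `fft4k_decomposed` are assumptions for illustration; the AIE graph and the Vitis DSP Library may order data differently.

```python
import numpy as np

N1, N2 = 4, 1024   # 4-point DFT stage and 1024-point FFT stage
N = N1 * N2        # 4096

def fft4k_decomposed(x):
    """Hypothetical 4096-point DFT built from 4-point DFTs, twiddles, and 1K FFTs."""
    x = np.asarray(x, dtype=np.complex128)
    # Transpose step: view x as a 4 x 1024 array, a[n1, n2] = x[1024*n1 + n2]
    a = x.reshape(N1, N2)
    # 4-point DFT over n1 (one per column)
    a = np.fft.fft(a, axis=0)
    # Twiddle multiply by W_N^(k1*n2), with W_N = exp(-2j*pi/N)
    k1 = np.arange(N1)[:, None]
    n2 = np.arange(N2)[None, :]
    a = a * np.exp(-2j * np.pi * k1 * n2 / N)
    # Four independent 1024-point FFTs, one per row k1
    b = np.fft.fft(a, axis=1)
    # Row k1 holds outputs X[k1 + 4*k2]; interleave rows into natural order
    return b.T.reshape(-1)

# Usage: compare against NumPy's direct 4096-point FFT
rng = np.random.default_rng(0)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
print(np.allclose(fft4k_decomposed(x), np.fft.fft(x)))
```

With this mapping, each 1K FFT produces every fourth output bin, which is why four smaller FFTs can run in parallel on separate tiles.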
- Clone the repository
git clone https://gitlab.avnet.com/xilinx/versal/ai-engine/custom_4k_fft
cd custom_4k_fft
- Create a work directory
mkdir work
cd work
- Clone the Vitis Libraries repository
git clone https://github.com/Xilinx/Vitis_Libraries
cd Vitis_Libraries
git checkout <version>
cd ..
Note: replace <version> in the git checkout command above with the version that matches the tool release being used; for example, <version> = 2024.2 for the 2024.2 tool release.
- Source the Vitis environment setup script
source <Vitis install path>/settings64.sh
- Create a symbolic link to the test data directory
ln -s ../data
- Compile the AIE Graph for x86 functional simulation
v++ --compile --mode aie --target x86sim --platform xilinx_<board>_base_<version>_1 \
-I ./Vitis_Libraries/dsp/L1/include/aie \
-I ./Vitis_Libraries/dsp/L1/src/aie \
-I ./Vitis_Libraries/dsp/L2/include/aie \
-I ../test \
-I ../source \
--work_dir Work \
../test/testbench.cpp \
--aie.Xpreproc -DFFT_INSTANCES=2
Notes:
1) Replace <board> in the command above with vck190 for AIE or vek280 for AIE-ML.
2) Replace <version> in the command above with the platform version that matches the tool release being used:
For 2024.1, <version> = 202410
For 2024.2, <version> = 202420
For 2025.1, <version> = 202510
3) The compile command above (-DFFT_INSTANCES=2) generates 1 FFT instance and 1 IFFT instance. In general:
The number of FFT instances is equal to ceiling(FFT_INSTANCES/2)
The number of IFFT instances is equal to floor(FFT_INSTANCES/2)
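The substitution rules in these notes can be captured in a small helper. This is a hypothetical sketch, not a script shipped with the repository: `platform_name` only assembles the string described in notes 1 and 2 (it does not check that the platform is installed), and `instance_counts` applies the ceiling/floor rule from note 3.

```python
# <board> per architecture (note 1)
BOARDS = {"AIE": "vck190", "AIE-ML": "vek280"}
# <version> per tool release (note 2)
PLATFORM_VERSIONS = {"2024.1": "202410", "2024.2": "202420", "2025.1": "202510"}

def platform_name(arch: str, release: str) -> str:
    """Assemble the value passed to v++ --platform."""
    return f"xilinx_{BOARDS[arch]}_base_{PLATFORM_VERSIONS[release]}_1"

def instance_counts(fft_instances: int) -> tuple[int, int]:
    """Return (FFT instances, IFFT instances) for a -DFFT_INSTANCES value."""
    n_fft = (fft_instances + 1) // 2   # ceiling(FFT_INSTANCES/2)
    n_ifft = fft_instances // 2        # floor(FFT_INSTANCES/2)
    return n_fft, n_ifft

print(platform_name("AIE-ML", "2025.1"))   # xilinx_vek280_base_202510_1
print(instance_counts(2))                  # (1, 1)
```

For example, -DFFT_INSTANCES=3 would yield two FFT instances and one IFFT instance under this rule.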
- Run x86 functional simulation
x86simulator
- Validate FFT & IFFT results
python3 ../scripts/python/verify_results.py --input data --x86sim_dir x86simulator_output/data/fft0 --direction 0 && \
python3 ../scripts/python/verify_results.py --input data --x86sim_dir x86simulator_output/data/ifft0 --direction 1
The Python script compares the AIE x86 simulation results against a Python double-precision model and should generate the following console output:
AIE x86 functional FFT simulation results
Correlation | Max Magnitude Diff. | Max Relative Error (%)
------------+---------------------+-----------------------
1.0000000 | 80.156 | 0.234
AIE x86 functional IFFT simulation results
Correlation | Max Magnitude Diff. | Max Relative Error (%)
------------+---------------------+-----------------------
1.0000000 | 71.063 | 0.289
A correlation of 1 indicates that the x86 simulation results are closely aligned with the Python model. The maximum magnitude difference and maximum relative error quantify how far the AIE implementation deviates from the Python model. The maximum relative error is below 0.3% for both the FFT and the IFFT, which indicates a high level of accuracy.
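The exact formulas inside verify_results.py are not shown here, so the following is only one plausible way to compute the three columns, written as a hypothetical `accuracy_metrics` function: correlation as a normalized inner product, magnitude difference as the largest element-wise error, and relative error as that error divided by the peak reference magnitude. The repository's script may define any of these differently.

```python
import numpy as np

def accuracy_metrics(aie, ref):
    """Hypothetical metrics; verify_results.py may define them differently."""
    aie = np.asarray(aie, dtype=np.complex128)
    ref = np.asarray(ref, dtype=np.complex128)
    # Normalized correlation: equals 1.0 when aie is a scalar multiple of ref
    corr = np.abs(np.vdot(ref, aie)) / (np.linalg.norm(ref) * np.linalg.norm(aie))
    # Largest element-wise error magnitude
    max_diff = np.max(np.abs(aie - ref))
    # That error relative to the peak reference magnitude, in percent
    max_rel_pct = 100.0 * max_diff / np.max(np.abs(ref))
    return corr, max_diff, max_rel_pct

# Usage: compare a double-precision FFT with an integer-rounded copy of itself
rng = np.random.default_rng(0)
x = rng.standard_normal(4096) + 1j * rng.standard_normal(4096)
ref = np.fft.fft(x) * 1000.0
aie = np.round(ref.real) + 1j * np.round(ref.imag)   # stand-in for integer output
print("%.7f | %.3f | %.3f" % accuracy_metrics(aie, ref))
```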
Note: the model uses double-precision numerics, whereas the AIE implementation uses cint16 twiddle factors and cint32 data precision, so the AIE results are not bit-accurate with respect to the model.
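To get a feel for one source of this mismatch, the sketch below quantizes the 4096 twiddle factors to 16-bit integers and measures the error. It assumes a Q15 format (scale 2^15, round to nearest, saturate to the int16 range); the Vitis DSP Library's actual twiddle scaling and rounding may differ, and `quantize_cint16` is a hypothetical name.

```python
import numpy as np

N = 4096
SCALE = 2**15   # assumed Q15 scaling for cint16 twiddles

def quantize_cint16(z):
    """Round real/imag parts to Q15 integers, saturating to the int16 range."""
    re = np.clip(np.round(z.real * SCALE), -32768, 32767)
    im = np.clip(np.round(z.imag * SCALE), -32768, 32767)
    return (re + 1j * im) / SCALE

k = np.arange(N)
tw = np.exp(-2j * np.pi * k / N)   # exact (double-precision) twiddles
tw16 = quantize_cint16(tw)         # cint16 approximation
err = np.abs(tw16 - tw)
print(f"max twiddle error: {err.max():.3e}")
```

Under these assumptions the largest error comes from saturating a value of +1.0 to 32767/32768; rounding elsewhere contributes at most about half an LSB per component. These small per-twiddle errors accumulate through the FFT stages, which is consistent with a nonzero magnitude difference in the validation results.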
- Compile for AIE HW simulation
v++ --compile --mode aie --target hw --platform xilinx_<board>_base_<version>_1 \
-I ./Vitis_Libraries/dsp/L1/include/aie \
-I ./Vitis_Libraries/dsp/L1/src/aie \
-I ./Vitis_Libraries/dsp/L2/include/aie \
-I ../test \
-I ../source \
--work_dir Work \
../test/testbench.cpp \
--aie.Xpreproc -DFFT_INSTANCES=2
Notes: apply the same <board>, <version>, and FFT_INSTANCES substitutions described in the notes for the x86 functional simulation compile command.
- Run AIE simulation
aiesimulator
The AIE simulator will generate throughput estimates in MBytes/second as shown in the log excerpt below for a VCK190 (AIE) simulation:
--------------------------------------------------------------------------------
| Intf Type | Port Name | Type | Throughput(MBps) |
--------------------------------------------------------------------------------
| plio | In0_even_fft | IN | 4623.42 |
| | In0_odd_fft | IN | 4621.72 |
| | In0_even_ifft | IN | 4623.42 |
| | In0_odd_ifft | IN | 4621.72 |
| | Out0_0_fft | OUT | 4599.92 |
| | Out0_1_fft | OUT | 4586.40 |
| | Out0_2_fft | OUT | 4592.70 |
| | Out0_3_fft | OUT | 4588.58 |
| | Out0_0_ifft | OUT | 4599.92 |
| | Out0_1_ifft | OUT | 4586.40 |
| | Out0_2_ifft | OUT | 4592.70 |
| | Out0_3_ifft | OUT | 4588.58 |
Using the slowest output port (4586 MBps) as a conservative figure, the output sample throughput per transform instance is (4 x 4586 MBps) / (8 bytes per cint32 sample) ≈ 2293 MSPS on the VCK190.
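The same arithmetic can be scripted from the FFT rows of the log excerpt. This sketch assumes the input ports carry cint16 samples (4 bytes) and the output ports carry cint32 samples (8 bytes), as listed in the key features; the helper names are hypothetical. It reports both the sum-of-ports rate and the conservative slowest-port rate.

```python
# Per-port PLIO throughput (MBps) for the FFT instance, from the log excerpt
fft_in_mbps = {"In0_even_fft": 4623.42, "In0_odd_fft": 4621.72}
fft_out_mbps = {"Out0_0_fft": 4599.92, "Out0_1_fft": 4586.40,
                "Out0_2_fft": 4592.70, "Out0_3_fft": 4588.58}

CINT16_BYTES = 4   # 16-bit real + 16-bit imaginary
CINT32_BYTES = 8   # 32-bit real + 32-bit imaginary

def msps(port_mbps, bytes_per_sample):
    """Aggregate sample rate (MSPS) summed across parallel ports."""
    return sum(port_mbps.values()) / bytes_per_sample

def worst_case_msps(port_mbps, bytes_per_sample):
    """Sample rate if every port ran at the slowest port's throughput."""
    return len(port_mbps) * min(port_mbps.values()) / bytes_per_sample

print(f"input : {msps(fft_in_mbps, CINT16_BYTES):.1f} MSPS")
print(f"output: {msps(fft_out_mbps, CINT32_BYTES):.1f} MSPS (sum of ports)")
print(f"output: {worst_case_msps(fft_out_mbps, CINT32_BYTES):.1f} MSPS (slowest port x 4)")
```

Both the input and output rates land slightly above 2.29 GSPS, consistent with the 2 GSPS design target.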
Performance
The table below summarizes performance and AIE resource usage for the 4K FFT. No PL resources are required to compute the FFT.
- AMD, Versal, and Vitis are trademarks or registered trademarks of Advanced Micro Devices, Inc.
- Python is a registered trademark of the Python Software Foundation
- Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.
- Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.