Published July 16, 2025 © Apache-2.0

AMD Versal AI Engine 2 GSPS 4K-point FFT

Custom 2 GSPS 4K FFT optimized for AMD Versal AI Engine and AI Engine ML

Beginner · Full instructions provided · 1 hour

Things used in this project

Hardware components

AMD VCK190

AMD VEK280

Software apps and online services

AMD Vitis Unified Software Platform

AMD Vitis Libraries

Story

Overview

This project demonstrates a custom 2 GSPS 4096-point FFT (Fast Fourier Transform) implementation optimized for the AMD® Versal™ AI Engine (AIE) and AI Engine-ML (AIE-ML) architectures. Designed for high-throughput signal processing applications, this FFT leverages the parallelism and vector processing capabilities of the AIE/AIE-ML to sustain a sample rate above 2 GSPS per transform with a small AI Engine footprint and no programmable logic (PL) resources.

Whether you're working on radar, wireless communications, or real-time spectral analysis, this 4K FFT example is a powerful building block for your next Versal AI Engine design.

Key Features

4096-point FFT with support for complex 16-bit integer (cint16) input and complex 32-bit integer (cint32) output
Optimized for Versal AI Engine and AI Engine-ML vector cores
High throughput using multiple AIE tiles and PLIO interfaces
Modular graph-based design using Vitis™ and AI Engine APIs
Testbench and simulation support for functional verification

Requirements

AMD Vitis Unified IDE (2023.2 or later)
Linux® based development environment
Python® (for test automation and result validation)

Architecture

The 4K FFT is implemented as a multi-tile AIE graph, where each tile performs a portion of the FFT computation. The design consists of a 4x1024 transpose operation, a 4-point DFT, twiddle-factor multiplies, and four 1K FFTs from the Vitis DSP Library. A block diagram of the FFT architecture is shown in the figure below.
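The decomposition described above can be sketched in plain Python. This is a minimal illustration, not the project's AIE kernel code. It assumes a decimation-in-frequency ordering (4-point DFT, then twiddle multiply, then four 1K FFTs, then a transpose to natural order), runs in double precision rather than cint16/cint32, and uses the hypothetical helper names `fft_radix2` and `fft_4xn`. The ordering and data layout in the actual AIE graph may differ.

```python
import cmath


def fft_radix2(x):
    """Reference recursive radix-2 FFT for power-of-two lengths."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_4xn(x, n1=4):
    """N-point FFT built from an n1-point DFT, twiddles, and n1 smaller FFTs.

    For N = 4096 and n1 = 4 this is the 4 x 1024 split (assumed ordering).
    """
    n = len(x)
    n2 = n // n1
    # View the input as n1 rows of n2 samples: rows[r][c] = x[n2*r + c].
    rows = [x[r * n2:(r + 1) * n2] for r in range(n1)]
    # n1-point DFT down each column.
    dft = [[cmath.exp(-2j * cmath.pi * k1 * r / n1) for r in range(n1)]
           for k1 in range(n1)]
    b = [[sum(dft[k1][r] * rows[r][c] for r in range(n1)) for c in range(n2)]
         for k1 in range(n1)]
    # Twiddle-factor multiply: W_N^(k1*c).
    b = [[b[k1][c] * cmath.exp(-2j * cmath.pi * k1 * c / n) for c in range(n2)]
         for k1 in range(n1)]
    # n1 independent n2-point FFTs (four 1K FFTs in the 4K case).
    d = [fft_radix2(row) for row in b]
    # Transpose back to natural order: X[n1*k2 + k1] = d[k1][k2].
    return [d[k % n1][k // n1] for k in range(n)]
```

Calling `fft_4xn(x)` on a 4096-sample list should agree with `fft_radix2(x)` to within floating-point rounding, which is a convenient sanity check on the index mapping.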

How to Build & Run

Clone the repository

git clone https://gitlab.avnet.com/xilinx/versal/ai-engine/custom_4k_fft
cd custom_4k_fft

Create a work directory

mkdir work
cd work

Clone the Vitis Libraries repository

git clone https://github.com/Xilinx/Vitis_Libraries
cd Vitis_Libraries
git checkout <version>
cd ..

Note: replace <version> in the git checkout command above with the version that matches the tool release being used; for example, use <version> = 2024.2 for the 2024.2 tool release.

Source the Vitis environment setup script

source <Vitis install path>/settings64.sh

Create a symbolic link to the test data directory

ln -s ../data

Compile the AIE Graph for x86 functional simulation

v++ --compile --mode aie --target x86sim --platform xilinx_<board>_base_<version>_1 \
-I ./Vitis_Libraries/dsp/L1/include/aie \
-I ./Vitis_Libraries/dsp/L1/src/aie \
-I ./Vitis_Libraries/dsp/L2/include/aie \
-I ../test \
-I ../source \
--work_dir Work \
../test/testbench.cpp \
--aie.Xpreproc -DFFT_INSTANCES=2

Notes:
1) Replace <board> in the command above with vck190 for AIE or vek280 for AIE-ML.

2) Replace <version> in the command above with the platform version that matches the tool release being used.
For 2024.1, <version> = 202410
For 2024.2, <version> = 202420
For 2025.1, <version> = 202510

3) With FFT_INSTANCES=2, the compile command above generates 1 FFT instance and 1 IFFT instance. In general:
The number of FFT instances is equal to ceiling(FFT_INSTANCES/2)
The number of IFFT instances is equal to floor(FFT_INSTANCES/2)
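The substitutions in the notes above can be captured in a small helper, for example when scripting builds for several boards or tool releases. This is only a sketch: the function names `platform_name` and `instance_counts` are hypothetical, and the version table contains only the three releases listed in the notes.

```python
import math

# Tool release -> platform version string, copied from the notes above.
PLATFORM_VERSIONS = {
    "2024.1": "202410",
    "2024.2": "202420",
    "2025.1": "202510",
}

# vck190 targets AIE, vek280 targets AIE-ML.
BOARDS = ("vck190", "vek280")


def platform_name(board, tool_release):
    """Build the --platform value used in the v++ compile command."""
    if board not in BOARDS:
        raise ValueError(f"unknown board: {board}")
    if tool_release not in PLATFORM_VERSIONS:
        raise ValueError(f"no platform version listed for {tool_release}")
    return f"xilinx_{board}_base_{PLATFORM_VERSIONS[tool_release]}_1"


def instance_counts(fft_instances):
    """Return (number of FFTs, number of IFFTs) for a given FFT_INSTANCES."""
    return math.ceil(fft_instances / 2), fft_instances // 2


print(platform_name("vck190", "2024.2"))  # xilinx_vck190_base_202420_1
print(instance_counts(2))                 # (1, 1)
print(instance_counts(3))                 # (2, 1)
```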

Run x86 functional simulation

x86simulator

Validate FFT & IFFT results

python3 ../scripts/python/verify_results.py --input data --x86sim_dir x86simulator_output/data/fft0 --direction 0 && \
python3 ../scripts/python/verify_results.py --input data --x86sim_dir x86simulator_output/data/ifft0 --direction 1

These commands compare the results of the AIE x86 simulation against a double-precision Python model. The Python validation script should generate the following console output:

AIE x86 functional FFT simulation results
Correlation | Max Magnitude Diff. | Max Relative Error (%)
------------+---------------------+-----------------------
  1.0000000 | 80.156              | 0.234


AIE x86 functional IFFT simulation results
Correlation | Max Magnitude Diff. | Max Relative Error (%)
------------+---------------------+-----------------------
  1.0000000 | 71.063              | 0.289

A correlation of 1 indicates that the x86 simulation results are geometrically aligned with the Python model. The maximum magnitude difference and maximum relative error quantify the amplitude difference between the AIE implementation and the Python model. The relative error is below 0.3% for both the FFT and the IFFT, which indicates a high level of accuracy.
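To make the three reported metrics concrete, here is one way they could be computed. The definitions below are assumptions, since the formulas used by verify_results.py are not given here: correlation is taken as the normalized inner product of the two complex vectors, the maximum magnitude difference as the largest |aie - ref|, and the relative error as that difference divided by the largest reference magnitude. The function name `compare_to_model` is hypothetical.

```python
import math


def compare_to_model(aie, ref):
    """Return (correlation, max magnitude diff, max relative error in %).

    All three definitions are assumptions made for illustration; the
    project's validation script may define them differently.
    """
    # Normalized inner product: 1.0 when the vectors differ only by a scale.
    dot = sum(a * b.conjugate() for a, b in zip(aie, ref))
    norm_aie = math.sqrt(sum(abs(a) ** 2 for a in aie))
    norm_ref = math.sqrt(sum(abs(b) ** 2 for b in ref))
    correlation = abs(dot) / (norm_aie * norm_ref)
    # Largest pointwise error between simulation and model.
    max_diff = max(abs(a - b) for a, b in zip(aie, ref))
    # Error relative to the largest reference bin, as a percentage.
    max_rel_err_pct = 100.0 * max_diff / max(abs(b) for b in ref)
    return correlation, max_diff, max_rel_err_pct
```

Under these definitions the correlation is insensitive to a uniform scale factor, so the magnitude-based metrics are what expose gain or rounding differences.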

Note: the model uses double-precision numerics, whereas the AIE implementation uses cint16 twiddle factors and cint32 data precision, so the AIE results will not be bit-accurate to the model.
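The effect of 16-bit twiddle factors can be illustrated with a short experiment. This sketch assumes a Q15 fixed-point format (15 fractional bits with saturation to the int16 range), which is a common convention but is not confirmed here as the format used by the design; `quantize_twiddle` is a hypothetical helper.

```python
import cmath


def quantize_twiddle(w, frac_bits=15):
    """Round a unit-magnitude twiddle to signed 16-bit parts (assumed Q15)."""
    scale = 1 << frac_bits
    lo, hi = -scale, scale - 1  # int16 range when frac_bits = 15
    re = max(lo, min(hi, round(w.real * scale)))
    im = max(lo, min(hi, round(w.imag * scale)))
    return complex(re, im) / scale


N = 4096
twiddles = [cmath.exp(-2j * cmath.pi * k / N) for k in range(N)]
max_err = max(abs(quantize_twiddle(w) - w) for w in twiddles)
print(f"max twiddle quantization error: {max_err:.3e}")
```

Each quantized twiddle deviates slightly from its double-precision value, and these small deviations accumulate through the transform, which is why the comparison is made with correlation and relative error rather than bit-exact matching.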

Compile for AIE HW simulation

v++ --compile --mode aie --target hw --platform xilinx_<board>_base_<version>_1 \
-I ./Vitis_Libraries/dsp/L1/include/aie \
-I ./Vitis_Libraries/dsp/L1/src/aie \
-I ./Vitis_Libraries/dsp/L2/include/aie \
-I ../test \
-I ../source \
--work_dir Work \
../test/testbench.cpp \
--aie.Xpreproc -DFFT_INSTANCES=2

Run AIE simulation

aiesimulator

The AIE simulator will generate throughput estimates in MBytes/second as shown in the log excerpt below for a VCK190 (AIE) simulation:

--------------------------------------------------------------------------------
| Intf Type   | Port Name                          | Type  | Throughput(MBps)  |
--------------------------------------------------------------------------------
| plio        | In0_even_fft                       | IN    | 4623.42           |
|             | In0_odd_fft                        | IN    | 4621.72           |
|             | In0_even_ifft                      | IN    | 4623.42           |
|             | In0_odd_ifft                       | IN    | 4621.72           |
|             | Out0_0_fft                         | OUT   | 4599.92           |
|             | Out0_1_fft                         | OUT   | 4586.40           |
|             | Out0_2_fft                         | OUT   | 4592.70           |
|             | Out0_3_fft                         | OUT   | 4588.58           |
|             | Out0_0_ifft                        | OUT   | 4599.92           |
|             | Out0_1_ifft                        | OUT   | 4586.40           |
|             | Out0_2_ifft                        | OUT   | 4592.70           |
|             | Out0_3_ifft                        | OUT   | 4588.58           |

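Each transform instance drives four output ports, and each cint32 output sample occupies 8 bytes. Using the slowest output port (about 4586 MBps) as a conservative figure, the output sample throughput per transform instance is (4 x 4586 MBps) / (8 Bytes/sample) = 2293 MSPS on the VCK190.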
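The throughput arithmetic above can be reproduced from the port figures in the log excerpt. The sketch below uses the FFT ports only and assumes 4 bytes per cint16 input sample and 8 bytes per cint32 output sample; the function names are hypothetical.

```python
# Port throughputs in MBps, copied from the aiesimulator log excerpt above.
FFT_IN_MBPS = {"In0_even_fft": 4623.42, "In0_odd_fft": 4621.72}
FFT_OUT_MBPS = {
    "Out0_0_fft": 4599.92,
    "Out0_1_fft": 4586.40,
    "Out0_2_fft": 4592.70,
    "Out0_3_fft": 4588.58,
}

BYTES_PER_CINT16 = 4  # 16-bit real + 16-bit imaginary
BYTES_PER_CINT32 = 8  # 32-bit real + 32-bit imaginary


def total_msps(ports_mbps, bytes_per_sample):
    """Aggregate sample rate across all ports of one transform instance."""
    return sum(ports_mbps.values()) / bytes_per_sample


def conservative_msps(ports_mbps, bytes_per_sample):
    """Sample rate assuming every port runs at the slowest port's rate."""
    return len(ports_mbps) * min(ports_mbps.values()) / bytes_per_sample


print(f"input:  {total_msps(FFT_IN_MBPS, BYTES_PER_CINT16):.1f} MSPS")
print(f"output: {total_msps(FFT_OUT_MBPS, BYTES_PER_CINT32):.1f} MSPS")
print(f"output (slowest port): "
      f"{conservative_msps(FFT_OUT_MBPS, BYTES_PER_CINT32):.1f} MSPS")
```

Both the aggregate and the slowest-port figures stay above 2000 MSPS, consistent with the 2 GSPS target.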

Performance

The table below summarizes performance and AIE resource usage for the 4K FFT. No PL resources are required in computing the FFT.

Learn More

GitLab Project Page: https://gitlab.avnet.com/xilinx/versal/ai-engine/custom_4k_fft

Disclaimers

AMD, Versal, and Vitis are trademarks or registered trademarks of Advanced Micro Devices, Inc.
Python is a registered trademark of the Python Software Foundation.
Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.
Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Tom Simpson

DSP & Versal AI Engine specialist at Avnet
