Published July 14, 2025 © Apache-2.0

High-performance small FFT library for Versal AI Engine

The FFTs 4 Fun (FFT4F) library delivers optimized small-size FFTs (8–64 points) for AMD Versal AI Engines

Things used in this project

Hardware components

AMD Versal AI Core

Software apps and online services

AMD Vitis Unified Software Platform

Story

Introduction

A few years ago I started working on a custom 16-point FFT for the AMD® Versal™ AI Engine (AIE). The goal was to build skills in writing custom AIE kernels with intrinsic functions and the AIE API. After creating the initial design, I used the Vitis™ tools to analyze performance and optimize for the AIE vector processor architecture, reaching 1.9 GSPS of throughput (complex 16-bit data) on a single AIE tile. I then applied the concepts learned during the 16-point FFT development to create 8-point, 32-point, and 64-point FFTs as well.

I thought it might be useful to share these designs with the community, so I created a library called FFTs for Fun (FFT4F).

Since the initial release of the code, I've added:

Inverse FFT support
Stream I/O support to reduce latency
32-bit data type support

Requirements

AMD Vitis Unified IDE (2023.2 or later)
Linux® based development environment
Python® (for test automation and result validation)

Usage

Using the FFT4F library is straightforward: clone the FFT4F repository, include it in a project, and instantiate an FFT4F subgraph in a system-level AIE graph.

Here's an AIE graph code example that instantiates a 16-point FFT with two input PLIOs and two output PLIOs.

#ifndef BASIC_GRAPH_H
#define BASIC_GRAPH_H

#include <adf.h>
#include <aie_api/aie.hpp>
#include <aie_api/aie_adf.hpp>
#include "fft4f/include/fft_graph.h"

using namespace adf;

class basic_graph : public graph
{
  public:
    input_plio  in0, in1;
    output_plio out0, out1;

    basic_graph()
    {
      in0  = input_plio::create("In0", plio_64_bits, "input_even.txt", 625);
      in1  = input_plio::create("In1", plio_64_bits, "input_odd.txt", 625);
      out0 = output_plio::create("Out0", plio_64_bits, "output_even.txt", 625);
      out1 = output_plio::create("Out1", plio_64_bits, "output_odd.txt", 625);

      connect<> (in0.out[0], g_fft.in[0]);
      connect<> (in1.out[0], g_fft.in[1]);
      connect<> (g_fft.out[0], out0.in[0]);
      connect<> (g_fft.out[1], out1.in[0]);

      g_fft.place_graph(6, 0); /* Place the graph starting at column 6, row 0 */
    }

  private:
    fft4f::fft_graph< 16,     // FFT Size
                      0x22,   // Number of IO 
                      64,     // FFT frames per buffer (i.e. batch size)
                      1,      // Number of AIE Tiles to use
                      0x022,  // Scale factor
                      false,  // IFFT flag (true = IFFT, false = FFT)
                      false,  // Use Stream IO flag 
                      cint16, // Input data type
                      cint16  // Internal & Output data types
                    > g_fft;
};

#endif

Copy the code above and save it to a file named basic_graph.h.

Next, create a testbench.cpp file with the following contents:

#include <adf.h>
#include "basic_graph.h"

using namespace adf;

basic_graph dut_graph;

int main(void)
{
  dut_graph.init();
  dut_graph.run(10);  /* Run for 10 iterations (10240 samples) */
  dut_graph.end();
  return 0;
}
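The sample count in the testbench comment can be cross-checked against the graph template parameters. The snippet below is a minimal sketch, assuming each graph iteration consumes exactly one buffer of 64 frames; the helper names are illustrative and are not part of FFT4F.

```python
def samples_per_run(fft_size, frames_per_buffer, iterations):
    """Total complex samples processed, assuming one buffer per graph iteration."""
    return fft_size * frames_per_buffer * iterations


def samples_per_plio(total_samples, num_plio):
    """Share of the samples carried by each PLIO when they are split evenly."""
    if total_samples % num_plio != 0:
        raise ValueError("total_samples must divide evenly across the PLIOs")
    return total_samples // num_plio


# Values taken from basic_graph.h and testbench.cpp
total = samples_per_run(fft_size=16, frames_per_buffer=64, iterations=10)
per_plio = samples_per_plio(total, num_plio=2)
print(total, per_plio)
```

With the parameters used here, the total matches the 10240 samples noted in the testbench comment, split evenly across the two input PLIOs.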

Download the generate_data.py Python script attached to this project, and run it to generate random test data. On a Linux machine, the script can be executed with the command:

python3 generate_data.py 2 10240

The command above generates two files: input_even.txt and input_odd.txt. Each file contains 5120 complex int16 samples.
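To inspect the generated data, the two files can be read back and interleaved into the original sample order. This is a minimal sketch based on how generate_data.py splits the data (even-indexed complex samples in one file, odd-indexed in the other, real and imaginary parts as consecutive integers). The function names are illustrative, and the reader targets the input files only, not the simulator output files.

```python
def read_plio_file(path):
    """Read a whitespace-separated PLIO text file into (real, imag) tuples."""
    values = []
    with open(path) as f:
        for line in f:
            values.extend(int(token) for token in line.split())
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of integers (real/imag pairs)")
    return list(zip(values[0::2], values[1::2]))


def merge_even_odd(even, odd):
    """Interleave two streams as even[0], odd[0], even[1], odd[1], ..."""
    if len(even) != len(odd):
        raise ValueError("even and odd streams must have the same length")
    merged = []
    for e, o in zip(even, odd):
        merged.append(e)
        merged.append(o)
    return merged


# Small in-memory demonstration with four complex samples
even = [(1, 2), (5, 6)]
odd = [(3, 4), (7, 8)]
merged = merge_even_odd(even, odd)
print(merged)
```

After running the generator, `merge_even_odd(read_plio_file("input_even.txt"), read_plio_file("input_odd.txt"))` should return the samples in their original order.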

Next, make sure to clone the FFT4F repository. The following Linux command can be used:

git clone https://gitlab.avnet.com/xilinx/versal/ai-engine/fft4f

We are now ready to compile the basic_graph for AIE simulation. Here are a few Linux commands to set up the Vitis environment, compile the AIE graph, and run the AIE simulation:

source <Vitis tool install directory>/settings.sh

v++ -c --mode aie --target hw --platform xilinx_vck190_base_<version>_1 -I ./ -I fft4f/src --work_dir Work testbench.cpp

aiesimulator

NOTES:

Replace <Vitis tool install directory> with the actual location of the Vitis tool installation.
Replace <version> with the version of the tools being used; for example, 2024.2 uses a <version> value of 202420.
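The version substitution can be scripted when automating builds. The helper below is a hypothetical sketch that generalizes the single example given above (2024.2 becomes 202420); the pattern for other releases is an assumption and should be checked against the platforms actually installed.

```python
def platform_name(tool_version):
    """Build the VCK190 base platform name from a tool version such as '2024.2'.

    Assumption: <version> is the year, the release digit, and a trailing 0.
    """
    year, _, release = tool_version.partition(".")
    valid = (len(year) == 4 and year.isdigit()
             and len(release) == 1 and release.isdigit())
    if not valid:
        raise ValueError(f"expected a version like '2024.2', got {tool_version!r}")
    return f"xilinx_vck190_base_{year}{release}0_1"


print(platform_name("2024.2"))
```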

The AIE simulation measures the throughput of each PLIO port and reports it in MBytes/s. Here's an example of the simulator output:

--------------------------------------------------------------------------------
| Intf Type   | Port Name                          | Type  | Throughput(MBps)  |
--------------------------------------------------------------------------------
| plio        | In0                                | IN    | 4218.16           |
|             | In1                                | IN    | 4215.38           |
|             | Out0                               | OUT   | 3871.75           |
|             | Out1                               | OUT   | 3872.92           |

Each output sample is 4 Bytes (complex int16 data), so the combined throughput of the two output ports can be converted from Bytes to samples with the calculation:

2 x 3871 MBps / (4 Bytes/sample) ≈ 1935 MSPS
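The same conversion can be applied directly to the per-port numbers in the simulator report. This is a small sketch, assuming MBps means 10^6 bytes per second; the function name is illustrative.

```python
def mbps_to_msps(port_throughputs_mbps, bytes_per_sample=4):
    """Sum the per-port throughputs (MBytes/s) and convert to Msamples/s."""
    return sum(port_throughputs_mbps) / bytes_per_sample


# Per-port values copied from the example simulator output
out_msps = mbps_to_msps([3871.75, 3872.92])
in_msps = mbps_to_msps([4218.16, 4215.38])
print(f"output: {out_msps:.1f} MSPS, input: {in_msps:.1f} MSPS")
```

Using the unrounded port values gives roughly 1936 MSPS on the output side, in line with the figure above.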

Summary

I hope you find this project and the FFT4F library useful. More details on using the library, along with throughput performance figures, can be found on the project repository landing page at https://gitlab.avnet.com/xilinx/versal/ai-engine/fft4f.

Feel free to drop a comment with questions or suggestions on the FFT4F repo. I hope to expand the repository with new features in the future.

Disclaimers

AMD, Versal, and Vitis are trademarks or registered trademarks of Advanced Micro Devices, Inc.
Python is a registered trademark of the Python Software Foundation.
Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.
Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

generate_data

import sys
import random

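def write_array_to_file(filename, array):
    # Write four integers per line: two complex int16 samples (real, imag),
    # i.e. 64 bits per line, matching the plio_64_bits ports in the graph
    with open(filename, 'w') as f:
        for i in range(0, len(array), 4):
            line = ' '.join(str(x) for x in array[i:i+4])
            f.write(line + '\n')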

def main():
    if len(sys.argv) != 3:
        print("Usage: python generate_data.py <number_of_input_plio> <number_of_samples>")
        return

    try:
        num_outputs = int(sys.argv[1])    # Number of input PLIO ports (1 or 2)
        num_samples = 2*int(sys.argv[2])  # Two integers (real, imag) per complex sample
    except ValueError:
        print("Both arguments must be integers.")
        return

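    random.seed(0)  # Fixed seed so the generated test data is reproducible
    # random.randint() includes both endpoints; every value fits in a signed 16-bit integer
    data = [random.randint(-16383, 16384) for _ in range(num_samples)]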

    if num_outputs == 2:
        even_index_data = []
        odd_index_data = []
        toggle = True
        for i in range(0, len(data), 2):
            chunk = data[i:i+2]
            if toggle:
                even_index_data.extend(chunk)
            else:
                odd_index_data.extend(chunk)
            toggle = not toggle
        write_array_to_file("input_even.txt", even_index_data)
        write_array_to_file("input_odd.txt", odd_index_data)
    elif num_outputs == 1:
        write_array_to_file("input.txt", data)
    else:
        print("Invalid number of input PLIO. Please specify 1 or 2.")

if __name__ == "__main__":
    main()

Credits

Tom Simpson

DSP & Versal AI Engine specialist at Avnet
