In this article, I wanted to spend some time looking at basic AI Engine programming and at the basic tools for AI Engine compilation and simulation.
To illustrate this, I am using the project we built in the previous article, looking only at the AI Engine component.
Note: This tutorial was created using Vitis 2025.1. The tool flow may vary in other versions of the tool.
Note 2: You can use the following project to rebuild an AMD Vitis workspace to follow the steps in this tutorial: Introduction AI Engine Programming
https://github.com/xflorentw/AI_Engine_Basic/tree/main/01_Simple_AIE-ML
Run make all to build the workspace.
As a reminder, the AI Engine array is a two-dimensional array of AI Engine tiles which are connected together using a network of streams. Each tile has a Very Long Instruction Word (VLIW), single instruction multiple data (SIMD) vector processor called an AI Engine.
AI Engine programming is done at 2 levels: graph and kernel programming.
Kernels are the fundamental building blocks: they represent the actual compute/processing of the code. They run on a single AI Engine.
An Adaptive Data Flow (ADF) graph represents the instantiation and connectivity of these kernels.
Let's take the AI Engine component used in the previous article and analyze it to better understand both levels. In this component we can find multiple files:
- project.cpp: This file is the top-level file for the AI Engine compiler. It contains the instantiation of the graph. It is also used by the AI Engine simulator to control the simulation
- project.h: This is the graph definition file
- kernel.cc: This is the kernel source file
- kernel.h and include.h are just basic header files
Looking at project.cpp first, starting at the top of the file:
#include <adf.h>
[...]
using namespace adf;

First we can see that the adf.h library is included and that we are using the namespace adf (which is of course defined in adf.h). This library contains the API for graph programming and kernel interfacing. Thus, you might want to include it in all your AI Engine source files.
Then, in the source file we can see the following line:
simpleGraph mygraph;

This is the instantiation of the graph. This is how the compiler knows what to compile. So here we are telling the compiler that our AI Engine code consists of one instance of a graph simpleGraph that we will call mygraph.
Note that we could have 2 (or more) completely independent graphs running on the same array defined here.
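For illustration, this is what the instantiation of 2 independent graphs could look like (anotherGraph and mygraph2 are hypothetical names, not part of the example project); the AI Engine compiler would then map both graphs onto the array:

simpleGraph mygraph;
anotherGraph mygraph2; // hypothetical second, independent graph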
The last part of the project.cpp source file is a C function called main():
int main(void) {
mygraph.init();
mygraph.run(4);
mygraph.end();
return 0;
}

This part is not used by the AI Engine compiler. This is only used in AI Engine simulation. What it does is simply initialize the graph, run it for 4 iterations and then end it.
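As a side note, the graph API also allows running a graph without a fixed number of iterations. A minimal sketch, assuming the standard adf::graph run()/end() overloads (the cycle count is arbitrary):

int main(void) {
mygraph.init();
mygraph.run(); // no argument: run the graph indefinitely
mygraph.end(10000); // in simulation, stop after (roughly) 10000 cycles
return 0;
}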
Analyzing the graph code

Now let's look at the content of project.h. At the top we can see the use of the adf.h library.
Then we have the definition of a simpleGraph class, which inherits from the adf::graph class:
class simpleGraph : public adf::graph {
[...]
};

We can see this as defining a black box that will run on the AI Engine array.
Then we are defining the interfaces of this black box:
class simpleGraph : public adf::graph {
[...]
public:
input_plio in;
output_plio out;
simpleGraph(){
in = input_plio::create(plio_32_bits, "data/input.txt");
out = output_plio::create(plio_32_bits, "data/output.txt");
[...]
}
};

There are 2 interfaces defined: in and out. Both are of type PLIO: these are the interfaces between the AI Engine array and the PL (the other available interface type is GMIO, which goes through the NoC). The width of the PLIO (the size of the interface on the PL side) is set to 32 bits.
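For comparison, here is a minimal sketch of what the same interfaces could look like using GMIO instead of PLIO, assuming the input_gmio/output_gmio API of recent Vitis versions (the burst length and bandwidth values are arbitrary):

input_gmio gm_in;
output_gmio gm_out;
simpleGraph(){
gm_in = input_gmio::create("gm_in", 64, 1000); // 64-byte bursts, 1000 MB/s required bandwidth
gm_out = output_gmio::create("gm_out", 64, 1000);
[...]
}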
Then, there are 2 kernels declared, called first and second, which will run the same function (simple) defined in the same source file, kernels/kernels.cc:
class simpleGraph : public adf::graph {
private:
kernel first;
kernel second;
public:
[...]
simpleGraph(){
[...]
first = kernel::create(simple);
second = kernel::create(simple);
[...]
source(first) = "kernels/kernels.cc";
source(second) = "kernels/kernels.cc";
[...]
}
};

Then everything is connected together:
class simpleGraph : public adf::graph {
[...]
public:
[...]
simpleGraph(){
[...]
adf::connect(in.out[0], first.in[0]);
connect(first.out[0], second.in[0]);
connect(second.out[0], out.in[0]);
[...]
}
};

To connect the kernels together and to the PLIOs, the adf::connect API is used. First the input PLIO in is connected to the first input (there is only one in this case) of the kernel first, then the first output of the kernel first is connected to the first input of the kernel second. Finally, the first output of the kernel second is connected to the output PLIO out.
Finally, we can see the following code:
class simpleGraph : public adf::graph {
[...]
public:
[...]
simpleGraph(){
[...]
dimensions(first.in[0]) = { NUM_SAMPLES };
dimensions(first.out[0]) = { NUM_SAMPLES };
dimensions(second.in[0]) = { NUM_SAMPLES };
dimensions(second.out[0]) = { NUM_SAMPLES };
[...]
runtime<ratio>(first) = 0.1;
runtime<ratio>(second) = 0.1;
}
};

First, dimensions is used to define the size of the buffers for the kernels (we will see in the kernel code that the kernel simple expects buffers for its input and output). The dimension is set as a number of samples (NUM_SAMPLES, declared as a preprocessor macro).
Then we have the lines defining a runtime ratio (runtime<ratio>) for each kernel. It is good to note that multiple kernels can run on the same tile (i.e. the same processor). With the runtime ratio we are informing the compiler how much time (as a percentage) we want to allocate to each kernel. In this case, 10% of the total compute time is allocated to each kernel.
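With 10% each, the mapper is free to place both kernels on the same tile, but nothing forces it to. If you wanted to guarantee this placement, the ADF location constraints could be used; a minimal sketch (this constraint is not in the example project):

simpleGraph(){
[...]
runtime<ratio>(first) = 0.1;
runtime<ratio>(second) = 0.1;
// Optional: force both kernels onto the same AI Engine tile (0.1 + 0.1 <= 1.0)
location<kernel>(second) = location<kernel>(first);
}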
Analyzing the kernel code

Now let's look at the kernel code which is in the kernels/kernels.cc source file.
Once again, the adf.h library is included to get the basic AI Engine infrastructure APIs.
Then looking at the function prototype:
void simple(adf::input_buffer<cint16> & in, adf::output_buffer<cint16> & out)

What we can see is that the function is declared with one input buffer (adf::input_buffer) and one output buffer (adf::output_buffer), and in these buffers the data will be 16-bit complex elements (thus 32 bits of data per sample: 16 bits for the real part and 16 bits for the imaginary part).
Note that, in the AI Engine, there are 4 possible types of kernel interfaces (a sketch of alternative prototypes follows this list):
- buffers: this is basically using the memory to transmit the data. If the data is coming as a stream (from the PL or from another kernel with a streaming interface), a DMA is implemented to store the data to the memory. When using buffers, the kernel will start only when the correct number of samples has been received.
- streams: this is using the stream interface through the AXI-Stream interconnects which connect all the tiles. When using streams, the kernel can start directly and only waits for the first sample.
- cascade: Cascade interfaces are available to transmit partial results from one AI Engine to a neighboring AI Engine (situated directly to the right or directly below on AIE-ML).
- Run-Time Parameter (RTP): This is used to send a configuration value from/to the PS. This is a slow interface, not really intended for the main data.
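As an illustration, here is a hypothetical sketch of how the prototype of a kernel like simple could look with stream interfaces, or with an additional run-time parameter (the names simple_stream, simple_rtp and scale are illustrative, not part of the example project):

// Stream version: samples are read/written one by one over the AXI-Stream interconnect
void simple_stream(input_stream<cint16> * in, output_stream<cint16> * out);

// Buffer version with an extra run-time parameter (a scalar configured from the PS)
void simple_rtp(adf::input_buffer<cint16> & in, adf::output_buffer<cint16> & out, int32 scale);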
Coming back to our simple kernel, the rest of the code is quite standard C code:
void simple(adf::input_buffer<cint16> & in, adf::output_buffer<cint16> & out) {
cint16 c1, c2;
cint16* inItr = in.data();
cint16* outItr = out.data();
for (unsigned i=0; i<NUM_SAMPLES; i++) {
c1 = *inItr++;
c2.real = c1.real+c1.imag;
c2.imag = c1.real-c1.imag;
*outItr++ = c2;
}
}

The kernel is simply cycling through the input samples, reading them one by one through the input buffer, processing them one by one and then storing the results one by one.
Note that I have highlighted the expression one by one on purpose. This is not really what you would expect from a Very Long Instruction Word (VLIW), single instruction multiple data (SIMD) vector processor.
To take advantage of the VLIW SIMD nature of the AI Engine we have to use dedicated APIs: the AI Engine API or intrinsics. We will see how to convert this code into code that exploits the VLIW SIMD capabilities in a future article.
For more information about AI Engine graph and kernel programming, you might want to read:
- the UG1603 if targeting the AIE-ML or AIE-MLv2 architecture: https://docs.amd.com/r/en-US/ug1603-ai-engine-ml-kernel-graph
- the UG1079 if targeting the AIE architecture: https://docs.amd.com/r/en-US/ug1079-ai-engine-kernel-coding
I hope this article gave you some insight into AI Engine programming. In the next article we will go through the basic compilation and simulation options and analyze some of the reports generated by the tools to understand the outputs.
Disclaimers
- AMD, Versal, and Vitis are trademarks or registered trademarks of Advanced Micro Devices, Inc.
- Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.





