I originally published this project on my blog. Here I will explain each step in more detail.
Hardware acceleration at the edge is revolutionising how we approach computer vision, machine learning, and high-performance computing. However, bridging the gap between software development and FPGA hardware can often feel like a daunting task.
In this comprehensive guide, I will demystify the process by walking through how to build, deploy, and run a custom hardware accelerator application on the AMD Kria™ KV260 Vision AI Starter Kit. To achieve this, we will be leveraging the cutting-edge capabilities of the AMD Vitis™ Unified Software Platform 2025.2.
The Hello World of Hardware Acceleration
To keep things practical and focused on the toolchain mechanics, we will use the Simple Vector Addition (vadd) accelerator example provided natively within Vitis. It is the perfect 'Hello World' for hardware acceleration, allowing us to focus entirely on mastering the deployment pipeline without getting bogged down by overly complex algorithmic logic.
The complete end-to-end development workflow is broken down into four manageable, bite-sized stages:
- Preparing the KV260 SD Card: Setting up the foundational Linux environment and boot firmware required to host our accelerated applications.
- Developing the Vector Addition Application: Navigating the Vitis 2025.2 unified environment to compile our host code and synthesise the hardware kernel.
- Transferring Generated Files to the KV260 Board: Seamlessly moving our compiled binaries, bitstreams, and hardware xclbins over to the target edge device.
- Running and Verifying the Application: Executing the code on-target to witness the hardware acceleration in action and validating the results.
Before diving in, ensure you have your Kria KV260 kit ready alongside a development machine running the Vitis 2025.2 suite.
Note on Required Expertise: This tutorial is designed for developers who already possess a foundational familiarity with Linux command-line operations and basic FPGA development concepts. If you know your way around a terminal and understand the core principles of hardware/software co-design, you are ready to begin.
Setting Up Your Development Environment
To successfully compile hardware accelerators and build the software stack in AMD Vitis™ 2025.2, a robust Linux environment is required. Depending on your current hardware setup and personal workflow preferences, there are several viable paths you can take to follow along with this tutorial.
You can complete this guide using any of the following development environments:
- Native Ubuntu Linux Installation: Running Ubuntu directly on your primary workstation for maximum performance and direct access to hardware resources.
- Dual-Boot Configuration: A dedicated Ubuntu partition alongside your existing operating system, allowing you to switch environments upon booting.
- Virtual Machine (VM): Running Ubuntu within a hypervisor (such as VMware or VirtualBox) on top of a Windows or macOS host.
- Windows Subsystem for Linux (WSL2): A lightweight, highly integrated solution for running a native Ubuntu environment directly inside Windows 11 without the overhead of a traditional virtual machine.
For the purposes of this tutorial, I will be using a WSL2 environment.
My exact demonstration environment consists of:
- Host operating system: Microsoft Windows 11
- Virtualisation layer: Windows Subsystem for Linux (WSL2)
- Linux distribution: Ubuntu 24.04 LTS
Important Note for WSL2 Users: If you are following along using WSL2, ensure that you have allocated sufficient system memory (RAM) and virtual disk space in your .wslconfig file, as FPGA synthesis and implementation are resource-intensive tasks. You can also use the "WSL Settings" tool to do this.
Stage 1: Preparing the KV260 SD Card
Before you can develop, deploy, or run any hardware-accelerated applications on the Kria KV260 board, the hardware must be booted into a fully compatible Linux environment.
To ensure seamless integration with our Vitis 2025.2 development tools, the recommended approach is to use the official AMD Embedded Development Framework (EDF) image. This pre-built Linux distribution is specifically tailored and optimised for AMD adaptive SoCs (systems on chip).
1. Sourcing the SD Card Image
First, you need to grab the correct operating system image file directly from the official vendor repository.
- Navigate over to the official AMD Vitis Download page.
- Locate and click on the Embedded Software tab.
- Scroll down to find the Kria-specific options and download the following package: SD/Wic Image Kria Generic
Depending on the specific point release, the downloaded file will typically arrive on your machine as a compressed disk image with an extension such as .wic.xz or .img.gz.
Tip: Do not manually extract or decompress the .wic.xz file unless your specific flashing software requires it. Modern flashing tools can read these compressed formats directly, saving you valuable storage space and time.
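Leaving the archive compressed does not stop you from checking that the download is intact before flashing. The following is a minimal sketch, assuming the standard xz and gzip command-line tools are installed on your host; the function name check_image and the example file name are illustrative and not part of any AMD tooling.

```shell
# Check that a downloaded image archive is intact without unpacking it to disk.
# Usage: check_image <path-to-image>
check_image() {
    image="$1"
    if [ ! -f "$image" ]; then
        echo "not found: $image" >&2
        return 2
    fi
    case "$image" in
        *.xz) xz -t "$image" ;;     # -t tests integrity and writes nothing
        *.gz) gzip -t "$image" ;;   # same idea for .img.gz downloads
        *)    echo "unrecognised extension: $image" >&2
              return 2 ;;
    esac
}

# Example (the file name is a placeholder for whatever you downloaded):
# check_image ~/Downloads/kria-image.wic.xz && echo "archive OK"
```

A non-zero exit status suggests a truncated or corrupted download, in which case it is usually quicker to download the file again than to debug a board that will not boot.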
2. Flashing the microSD Card
With the image downloaded, the next step is to write it to your microSD card. Because this is a raw disk image (.wic.xz), you cannot simply copy and paste the file onto the card. You need to use a dedicated image-flashing utility to write the blocks directly to the storage sectors.
Step-by-Step Flashing Procedure
A reliable, cross-platform tool for this process is balenaEtcher, as it natively handles the compressed .wic.xz format without requiring you to unzip it first.
2.1. Connect the Media: Insert your microSD card into your host machine's card reader or an external USB adapter. Ensure any critical data on the card is backed up, as this process completely wipes the drive.
2.2. Select the OS Image: Launch balenaEtcher. Click on the Flash from file button, navigate to your downloads directory, and select the downloaded SD/Wic Image Kria Generic file.
2.3. Target the Drive: Click Select target. Carefully choose your microSD card from the list of available drives. Double-check the drive size to ensure you do not accidentally select an external backup hard drive.
2.4. Execute the Flash: Click the Flash! button. If prompted by Windows or macOS, grant administrative privileges to allow the software to write directly to the hardware sectors.
2.5. Verify and Eject: Allow the utility to finish both the Flashing phase and the automated Validating phase. Once the tool reports "Flash Complete!", it is safe to remove the microSD card from your computer.
A Note for Windows 11 Users: Immediately after the flashing process finishes, Windows may pop up several alerts saying "You need to format the disk in drive X: before you can use it". Ignore and close these warnings. Do not format the drive. Windows displays this message simply because it cannot natively read the Linux EXT4 partitions created on the card by the flashing tool.
3. Kria SOM Boot Firmware Update
Aside from having a properly flashed microSD card, your hardware deployment can hit an immediate roadblock if your Kria System-on-Module (SOM) is running outdated boot firmware.
The KV260 board features non-volatile QSPI flash memory integrated directly onto the SOM itself. This memory houses the factory-programmed boot firmware. Because AMD updates this low-level firmware to support new platform structures, compiler optimizations, and APIs, you must ensure your QSPI boot firmware version matches what the Vitis 2025.2 runtime expects. Running newer tool versions on legacy firmware often results in cryptic boot errors or XRT kernel launch failures.
Before initiating the update process, you must acquire the exact firmware file tailored to your ecosystem release:
- Head back to the official AMD Vitis Download page.
- Locate the Embedded Software section.
- Search for and download the specific boot container file:
- Filename: k26-smk-sdt_kria boot.bin
AMD provides a built-in, web-based recovery utility in the Kria boot firmware for flashing this file. You don't need a dedicated JTAG programmer; all you need is an Ethernet cable and a web browser. Follow the firmware recovery/update procedure described in the official AMD documentation.
Stage 2: Developing the Vector Addition Application
Now that our hardware target is prepared, we move on to the core development phase. In this stage, we will set up the Vitis development environment on our host machine, configure the essential cross-compilation assets, and lay the groundwork to build our Vector Addition (vadd) hardware accelerator.
This stage covers four key steps:
- Installing the required development software.
- Configuring the Linux target sysroot (the system library environment).
- Creating the Vitis unified application project.
- Compiling and building our hardware accelerator.
2.1 – Install Required Software
The backbone of our development setup is the AMD Vitis™ Unified Software Platform 2025.2. This environment merges hardware design tools with a standard software IDE, allowing us to manage both the FPGA fabric layout and our C/C++ host application under a single ecosystem.
To get started with the installation:
- Visit the official AMD Vitis Download Page.
- Download the Linux web installer or the full product installation package for version 2025.2.
- Run the installer inside your development environment (native Linux or your configured WSL2 environment).
Crucial Installation Settings: During the package selection step, you must ensure that you explicitly tick the boxes to enable Kria KV260 platform support and Embedded Development Tools. Leaving these unselected will omit the target device architectures and cross-compilers needed to target the Kria System-on-Module.
2.2 – Download and Configure the ZynqMP Common Image
Because the Kria board runs Linux on an ARM processor rather than on the x86 architecture of our development machine, we cannot compile our host application with a standard x86 compiler. We need a cross-compiler toolkit and an isolated Linux environment blueprint that matches our board. AMD provides this pre-configured via the ZynqMP Common Image package.
Follow these command-line steps in your terminal to unpack and configure the target filesystem environment:
Step 1: Obtain the Archive
Head back to the Vitis Embedded Platform tab on the download site and pull down the target bundle:
Package Name: xilinx-zynqmp-common-v2025.2_11160223.tar.gz
Step 2: Unpack and Install the Environment
Open a terminal in your workspace directory and execute the following sequence to extract the package and run the environment script:
Extract it:
tar -zxvf xilinx-zynqmp-common-v2025.2_11160223.tar.gz
Move into the extracted directory:
cd xilinx-zynqmp-common-v2025.2/
Run the SDK installer:
./sdk.sh -d .
Running the ./sdk.sh -d . command extracts a self-contained cross-compilation tree right into the specified directory. This produces a critical folder called sysroots.
This directory contains the identical Linux header files (.h) and pre-compiled libraries (.so) present on the physical Kria board. When Vitis compiles your host application code on your PC, it references this folder to ensure the binary is perfectly tuned and ready to run on the Kria's ARM Cortex-A53 processor without architecture conflicts.
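Before pointing Vitis at the sysroot, it can help to confirm that the extraction produced the folder you expect. The sketch below is an illustration only: find_target_sysroot is a hypothetical helper, and it assumes that the host-side tools folder under sysroots starts with x86_64, so that any other folder is the target sysroot.

```shell
# Print the target (non-host) sysroot directory created by sdk.sh.
# Assumption: the host tools directory name begins with "x86_64-".
# Usage: find_target_sysroot <sdk-install-dir>
find_target_sysroot() {
    sdk_dir="$1"
    for dir in "$sdk_dir"/sysroots/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        name=$(basename "$dir")
        case "$name" in
            x86_64-*) ;;                   # host-side tools, skip
            *) printf '%s\n' "${dir%/}"    # strip the trailing slash
               return 0 ;;
        esac
    done
    echo "no target sysroot under $sdk_dir/sysroots" >&2
    return 1
}

# Example, run from the directory that contains the extracted package:
# find_target_sysroot xilinx-zynqmp-common-v2025.2
```

The printed path is the one to select later in the Sysroot field of the project wizard.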
2.3 – Create the Vitis Application Workspace
With our foundational tools and sysroots ready, we can now open the Vitis platform and initialize our working layout. Vitis relies on a dedicated directory structure called a workspace to bundle your configuration maps, source repositories, and build trees.
Open your terminal window and execute the following commands to create your clean workspace directory and launch the application platform:
mkdir vadd_workspace
cd vadd_workspace
vitis -w .
After a few moments, the modern, web-style layout of the Vitis Unified IDE will open up directly on your desktop.
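If the vitis command is not found, the Vitis environment has most likely not been loaded into the current shell. A small guard like the one below gives a clearer message than the shell's default error. The helper name require_tool is hypothetical, and the settings script path is written with the same installation placeholder used elsewhere in this guide, so adjust it to your own install location.

```shell
# Fail early with a readable message if a required tool is not on PATH.
# Usage: require_tool <command-name>
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "'$1' not found on PATH; load the Vitis settings script first" >&2
        return 1
    fi
}

# Typical use before launching the IDE (path is a placeholder):
# . <Vitis_Installation_Directory>/2025.2/Vitis/settings64.sh
# require_tool vitis && vitis -w .
```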
2.4 – Import the Simple Vector Addition Example
Rather than forcing you to write hardware description frameworks or complex driver links from scratch, AMD packs fully optimized codebase models right into the Vitis installation.
To import our standard "Hello World" accelerator package:
1. Open the Examples Interface: Look at the central screen interface. Inside the active Welcome tab dashboard, click on the Examples tile.
2. Locate the Acceleration Template: In the comprehensive repository index that appears, browse down through the acceleration categories or use the search bar to locate the Simple Vector Addition (vadd) design entry.
3. Initialize the System Builder: Click on the template description card to display its details. Look over to the right side of the tab view and click the action link titled Create System Project from Template. This triggers the built-in system project setup wizard.
2.5 – Configure the Target Architecture Project
The system project setup wizard will guide you through mapping the template to your physical board and cross-compilation environment assets. Proceed through the wizard configurations using these exact parameters:
1. Name the System Project
On the initial configuration panel, leave the system layout identity set to its default option (vadd) or choose a personalized alternative, then click Next.
2. Select the Kria Target Hardware Platform
You will see a list of available target platforms. Look for and highlight the base platform matching your evaluation kit:
- Platform Selection: xilinx_kv260_base_202520_1
Once selected, proceed by clicking Next.
3. Mount your Cross-Compilation Sysroot Path
This is where we connect the Linux environment libraries we built earlier. On the configuration page:
- Click the Browse button situated next to the Sysroot parameter target field.
- Navigate to the folder generated by your earlier sdk.sh run: xilinx-zynqmp-common-v2025.2/sysroots/cortexa72-cortexa53-amd-linux
- Select that specific folder path, click Open, and click Next.
4. Finalize the System Target Generation
Review your asset mapping summary parameters, and click the Finish button icon.
Once you hit finish, Vitis initializes its background build engines. After a brief generation delay, your complete system design project tree will appear inside the primary IDE Explorer side-panel layout.
Take a few moments to click through the folders and open the source files to observe how the host software allocates and transfers the data buffers while the hardware kernel performs the vector addition itself.
2.6 – Building the Application Stack
With our project fully configured, we have arrived at the most computationally intensive phase of our workflow: compiling the system. Because this is a heterogeneous system (meaning it contains both a standard CPU processor and programmable FPGA logic), the build system has to run two entirely different compiler pipelines in parallel, then stitch them together.
To execute the system build within the Vitis Unified IDE:
- Locate the FLOW navigator panel (typically found on the left or right edge of your screen interface).
- Find your primary system component entry labeled vadd.
- Under its action listings, click on the Build All command option.
A configuration prompt will slide down asking you to confirm your target components. Ensure that the checkboxes for both vadd_host (the ARM software binary) and vadd_vadd (the hardware kernel logic block) are actively ticked, then click OK.
What Happens Behind the Scenes?
When you trigger the compilation sequence, Vitis launches a multi-stage background pipeline that converts your abstract C++ source code into raw physical electronic configurations and machine binaries. The pipeline moves sequentially through the following heavy-duty engineering phases:
- C/C++ Host Compilation: Compiles vadd.cpp (or host.cpp) into a native ARM Cortex-A53 Linux executable.
- HLS Kernel Synthesis: High-Level Synthesis converts the kernel C++ into RTL (VHDL/Verilog).
- Vivado Implementation: Performs physical Place & Route inside the FPGA silicon array.
- Bitstream Generation: Creates the raw configuration bit layout for the logic gates.
- XCLBIN Packaging: Merges the bitstream with metadata into an AMD extensible binary.
Do not be alarmed if your development environment appears to "freeze" or your computer fans immediately spin up to maximum speed. Synthesizing hardware logic requires a massive amount of algorithmic computation.
The total build duration depends heavily on your host machine's processing power.
System Resource Tip: During the Vivado Implementation phase, the compiler may consume anywhere from 8 GB to 16 GB of system RAM simultaneously. Close any non-essential background tasks, web browsers, or heavy development environments on your host operating system while this process runs to prevent out-of-memory crash errors.
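To see how much memory your Linux environment can actually use before starting a long build, you can read it from /proc/meminfo. This is a rough sketch for Linux and WSL2 only; the function names are illustrative, and the 16 GiB figure in the usage comment simply mirrors the upper bound mentioned in the tip above.

```shell
# Report total RAM visible to this Linux environment, rounded to the nearest GiB.
# /proc/meminfo reports MemTotal in kB, so divide by 1024 * 1024.
mem_total_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 + 0.5 }' /proc/meminfo
}

# Succeed if at least <min-gib> GiB of RAM is visible.
# Usage: has_enough_ram <min-gib>
has_enough_ram() {
    [ "$(mem_total_gib)" -ge "$1" ]
}

# Example:
# has_enough_ram 16 || echo "Less than 16 GiB visible; consider raising the WSL2 memory limit"
```

Under WSL2 the value reflects the limit given to the virtual machine, which can be lower than the RAM installed in the Windows host.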
Once the build finishes successfully, Vitis will generate a collection of deployment files in your output directory, ready to be transferred directly to our physical Kria board.
Stage 3: Transferring Generated Files to the KV260 Board
With the compilation successfully finished, your host workstation has generated the necessary deployment binaries. To execute this application on-target, we must transfer a specific set of four operational files onto our physical Kria KV260 board.
These components tell the target Linux kernel how to remap its physical hardware lines, load our custom logic gates into the programmable logic fabric, and execute our binary host control loop.
Sourcing the Four Required Files
Before starting the network transfer, create a temporary deployment directory on your development machine and gather the following files from your compilation tree:
1. The Programmable Logic Device Tree Overlay (pl.dtbo)
- Source Location: <Vitis_Installation_Directory>/2025.2/Vitis/base_platforms/xilinx_kv260_base_202520_1/sw/boot/
- Purpose: This is a compiled Linux Device Tree Overlay file. Because an FPGA can completely change its internal peripheral interfaces at runtime, the operating system needs this file to dynamically understand the newly instantiated hardware buses, interrupt configurations, and memory channels without needing a full system reboot.
2. The FPGA Container Binary (vadd.xclbin)
- Source Location: <vadd_workspace>/vadd/build/hw/hw_link/
- Purpose: This is our core AMD Extensible Platform binary file. It encapsulates the compiled hardware accelerator kernel logic, target routing constraints, and structural bitstream mappings generated by our Vivado and HLS compilation engines.
Note: You may need to rename the file to vadd.bin.
3. The Control Loop Software Executable (vadd_host)
- Source Location: <vadd_workspace>/vadd_host/build/hw/
- Purpose: This file is a compiled native Linux ELF binary engineered specifically for the Kria board's 64-bit ARM Cortex-A53 processor cores. It controls data marshalling, assigns memory buffers, loads our .xclbin layout, and measures performance.
4. The Shell Configuration Metadata (shell.json)
- Source Location: User Created (manually written configuration snippet)
- Purpose: The Kria runtime uses a daemon called dfx-mgr (driven by the xmutil utility used later in this guide) to load accelerators into the programmable logic. This short metadata snippet tells it to treat our design as a single, flat XRT acceleration block.
To create this file, open your favourite command-line text editor (such as nano or vi) and create a plain text file named exactly shell.json with the following contents:
{
  "shell_type": "XRT_FLAT",
  "num_slots": "1"
}
Booting and Preparing the KV260 Board for Execution
With your deployment files safely generated, it is time to shift over to the physical hardware. This step covers booting your Kria KV260 board, setting up the required target system directories, and preparing our host binary for execution.
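If you have not yet collected the four deployment files in one place, a short script can stage them before you move to the board. The sketch below writes the same shell.json contents shown earlier and copies build outputs into a staging folder. The function names and the deploy folder name are hypothetical, and the source paths in the usage comments keep the placeholders from the previous section.

```shell
# Write the shell.json used in this guide into a staging directory.
# Usage: write_shell_json <staging-dir>
write_shell_json() {
    dest="$1"
    mkdir -p "$dest"
    cat > "$dest/shell.json" <<'EOF'
{
  "shell_type": "XRT_FLAT",
  "num_slots": "1"
}
EOF
}

# Copy one build output into the staging directory, optionally renaming it.
# Usage: stage_file <source-file> <staging-dir> [new-name]
stage_file() {
    src="$1"
    dest_dir="$2"
    new_name="$3"
    [ -n "$new_name" ] || new_name=$(basename "$src")
    if [ ! -f "$src" ]; then
        echo "missing build output: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    cp "$src" "$dest_dir/$new_name"
}

# Example (placeholders as in the file list above):
# write_shell_json deploy
# stage_file "<vadd_workspace>/vadd/build/hw/hw_link/vadd.xclbin" deploy vadd.bin
# stage_file "<vadd_workspace>/vadd_host/build/hw/vadd_host" deploy
```

Failing on a missing source file is deliberate: it catches an incomplete build on the workstation instead of on the board.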
Step 1: Booting the Hardware & Establishing a Connection
To begin, ensure your board is completely powered off.
- Insert your freshly flashed microSD card into the slot located on the underside of the KV260 carrier card.
- Connect your network via an Ethernet cable, and plug a micro-USB cable from the board's micro-USB UART port to your host workstation.
- Plug in the 12V DC power adapter to boot the device.
You can interact with the live Linux operating system running on the board using two primary connection channels:
- Serial Console (UART): PuTTY, Tera Term, Minicom
- Network Shell (SSH): OpenSSH Terminal, PuTTY
Step 2: Creating the Firmware Directory Structure
The Kria platform's runtime firmware stack expects custom hardware overlays to reside in a specific, protected system directory. This allows the built-in system tools to find, parse, and load your custom bitstream components safely.
Open your serial terminal or SSH into your board (logging in with your standard target credentials), and execute the following commands to create the designated firmware target repository:
sudo mkdir -p /lib/firmware/xilinx/vadd
Step 3: Positioning the Deployment Files
Now, you need to relocate the four files you prepared earlier into their final operational locations on the Kria board's filesystem. You can execute this transfer using interactive GUI tools like FileZilla (via SFTP), or standard terminal tools like the scp utility.
The dynamic device configuration files, hardware bitstreams, and metadata layouts must reside inside the newly created /lib/firmware/xilinx/vadd directory. Move these three specific components there: pl.dtbo, vadd.xclbin (vadd.bin), and shell.json.
Unlike the firmware overlays, the vadd_host executable does not need to live inside the root system firmware tree. You can store and run this file from any location you prefer within your user space (such as /home/petalinux/vadd_project/).
By default, files transferred over network protocols or copied out of raw compilation workspaces often lose their executable privileges. Before Linux will allow you to run the control loop software, you must manually alter its security permissions flags.
Navigate to the directory containing your vadd_host file and run the following command to make it universally executable:
chmod uog+x vadd_host
At this stage, your environment is perfectly prepared, your hardware assets are in their correct system directories, and your host software is fully cleared for execution.
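For readers who prefer the terminal to FileZilla, the transfer and permission steps above can be sketched with scp and ssh. The helper below only prints the commands so that you can review them before running anything. The function name, the user name and the IP address are all placeholders; it also assumes the container has already been renamed to vadd.bin and that your board user can run sudo.

```shell
# Print (do not run) the commands that copy a staged directory to the board
# and move the firmware files into place.
# Usage: print_deploy_commands <user@host> <staging-dir> <app-name>
print_deploy_commands() {
    target="$1"
    stage="$2"
    app="$3"
    fw_dir="/lib/firmware/xilinx/$app"
    # Step 1: copy all four files into the remote user's home directory.
    echo "scp $stage/pl.dtbo $stage/$app.bin $stage/shell.json $stage/${app}_host $target:"
    # Step 2: move the three firmware files with sudo and mark the host binary executable.
    echo "ssh -t $target 'sudo mkdir -p $fw_dir && sudo mv pl.dtbo $app.bin shell.json $fw_dir/ && chmod a+x ${app}_host'"
}

# Example (user name and address are placeholders for your own board):
# print_deploy_commands user@192.0.2.10 deploy vadd
```

Copying to the home directory first avoids needing root access over scp, since /lib/firmware is normally writable only by root. The chmod a+x form is equivalent to the uog+x form used above.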
Stage 4: Running and Verifying the Application
We have arrived at the final and most rewarding phase of our workflow: executing our code and witnessing hardware acceleration live on the Kria KV260.
To achieve this, we will first use the Kria platform utilities to dynamically program the FPGA fabric with our vector addition logic, and then run our software host application to control and verify the computation.
Step 1: Loading the Hardware Accelerator via xmutil
The AMD Kria platform uses a powerful command-line utility called xmutil (Xilinx Machine Utility). This tool allows you to safely query, unload, and dynamically hot-swap the hardware accelerators running inside the programmable logic without restarting your Linux environment.
Open your active terminal session on the Kria board and execute the following sequence of commands:
1. Inspect the available hardware applications on your system
sudo xmutil listapps
2. Unload the default pre-loaded factory application to free up the FPGA slot
sudo xmutil unloadapp
3. Dynamically load your newly created 'vadd' accelerator overlay
sudo xmutil loadapp vadd
What's happening under the hood?
- listapps: Scans the /lib/firmware/xilinx/ directory we configured in the previous stage. You should see your vadd entry listed as an available slot option.
- unloadapp: Clears out the current bitstream, resets the FPGA clock configurations, and prepares the fabric to receive a new system map.
- loadapp vadd: Instructs the Linux kernel to read your shell.json configuration, register the hardware lines via your pl.dtbo device-tree overlay, and push your custom accelerator into the active FPGA fabric.
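If listapps does not show vadd, or loadapp fails, a missing or misnamed file in the firmware directory is a likely cause. The check below is a simple sanity test based only on the three files this guide places there; it is not an official validation tool, and check_firmware_dir is a hypothetical name.

```shell
# Check that an accelerator firmware directory holds the three files this
# guide places there: pl.dtbo, shell.json, and a .bin or .xclbin container.
# Usage: check_firmware_dir <directory>
check_firmware_dir() {
    dir="$1"
    status=0
    for f in pl.dtbo shell.json; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    found=0
    for f in "$dir"/*.bin "$dir"/*.xclbin; do
        # An unmatched glob stays literal, so test that the file exists.
        [ -f "$f" ] && found=$((found + 1))
    done
    if [ "$found" -eq 0 ]; then
        echo "no .bin or .xclbin container in $dir" >&2
        status=1
    fi
    return "$status"
}

# On the board:
# check_firmware_dir /lib/firmware/xilinx/vadd && sudo xmutil loadapp vadd
```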
Step 2: Running the Host Executable
With the FPGA fully configured with our vadd accelerator hardware, we can now launch our host application. The host binary accepts the compiled acceleration binary package as a target command-line argument.
Navigate to the user space folder where you stored your vadd_host application file, and run the execution command:
./vadd_host -x vadd.bin
Note on Naming Conventions: Depending on your specific compilation and script options, your output hardware container might be named vadd.xclbin or vadd.bin. Ensure that the parameter you pass after the -x flag matches your file's exact filename on the board.
Verifying the Acceleration Results
Once launched, the host software will initialize data arrays in system RAM, map those buffers directly to the FPGA logic over the internal high-speed memory buses, trigger the accelerator execution, and compare the hardware output mathematically against a baseline CPU test loop.
If everything is configured correctly, your terminal will print a validation readout concluding with this milestone message:
TEST PASSED
Seeing TEST PASSED confirms that your multi-stage development, cross-compilation, firmware updating, and network deployment pipeline is fully operational. You have successfully taken a piece of high-level C++ code, transformed it into low-level physical routing gates on an FPGA, and executed it on an edge-AI development platform!
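If you want to repeat the run from a script, for example after rebuilding the kernel, the pass/fail check can be automated by searching the program output for that message. This is a minimal sketch; check_passed is a hypothetical helper, and it assumes the host program prints the text TEST PASSED exactly as shown above.

```shell
# Read program output on stdin, echo it through unchanged, and succeed only
# if it contains the marker "TEST PASSED".
check_passed() {
    log=$(cat)
    printf '%s\n' "$log"
    printf '%s\n' "$log" | grep -q 'TEST PASSED'
}

# On the board:
# ./vadd_host -x vadd.bin | check_passed && echo "accelerator verified"
```

Because the result comes from the text match and not from the program's exit code, a run that crashes before printing the message is also reported as a failure.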










