This tutorial demonstrates how to use the trace functionality of AMD Ryzen AI Phoenix with the MLIR-AIE dialects.
Requirements
- AMD Ryzen AI Phoenix
- Linux®-based development environment
- Python® (for test automation and result validation)
- IRON API and MLIR-based AI Engine Toolchain
Project Brief
The SoC is designed to accelerate AI/ML algorithms on its AIE-ML based NPU and deliver high performance. The NPU complex has:
- 16 AI Cores for computation
- 4 Memory Tiles for fast memory access
- 4 SHIM DMAs to move data in and out of L3 memory

Note: This project is customized for Phoenix.
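The tile counts above can be pictured as a grid of (column, row) coordinates. This is a minimal sketch that assumes 4 columns, each with a Shim tile in row 0, a Mem tile in row 1, and Core tiles in rows 2 to 5. That layout is consistent with the (0, 0), (0, 1), and (0, 2) coordinates used in this project. The helpers `tile_kind` and `count_tiles` are hypothetical and are not part of the IRON API.

```python
# Minimal sketch of the Phoenix NPU tile grid as (column, row) coordinates.
# Assumption: 4 columns; row 0 = Shim tile, row 1 = Mem tile, rows 2-5 = Core tiles.
NUM_COLS = 4
NUM_ROWS = 6


def tile_kind(col: int, row: int) -> str:
    """Return the kind of tile at (col, row) under the assumed layout."""
    if not (0 <= col < NUM_COLS and 0 <= row < NUM_ROWS):
        raise ValueError(f"tile ({col}, {row}) is outside the array")
    if row == 0:
        return "shim"
    if row == 1:
        return "mem"
    return "core"


def count_tiles() -> dict:
    """Count each tile kind across the whole assumed array."""
    counts = {"shim": 0, "mem": 0, "core": 0}
    for col in range(NUM_COLS):
        for row in range(NUM_ROWS):
            counts[tile_kind(col, row)] += 1
    return counts


if __name__ == "__main__":
    print(count_tiles())
    # Tiles used by the pass-through design in column 0
    for coord in [(0, 0), (0, 1), (0, 2)]:
        print(coord, tile_kind(*coord))
```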
This project implements a basic SHIM DMA pass-through use case with trace enabled for the SHIM tile, the MEM tile, and the Core tile.
Architecture
One SHIM DMA Pass-Through
In this section, SHIM DMA (0, 0), MEM Tile (0, 1), and Core (0, 2) of column 0 are used. A predefined set of data stored in L3 memory is streamed into the NPU complex. The data is routed from the SHIM DMA to the Core through MEM Tile memory and is then routed back. The received output stream is captured and compared with the reference data.
Trace is enabled for the SHIM DMA (0, 0), MEM Tile (0, 1), and Core (0, 2) tiles. The trace data is stored in L3 memory and later dumped to a file.
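The host-side check described in this section can be sketched in plain Python. This sketch assumes the input is a buffer of integers. It models the SHIM -> MEM -> Core -> MEM -> SHIM round trip as an identity copy. `simulate_passthrough` and `check_output` are hypothetical stand-ins for the real NPU run and the project's own test script.

```python
from array import array


def simulate_passthrough(data: array) -> array:
    """Hypothetical stand-in for the NPU run: a pass-through returns its input unchanged."""
    return array(data.typecode, data)


def check_output(output: array, reference: array) -> int:
    """Compare the received output with the reference; return the mismatch count (0 = PASS)."""
    if len(output) != len(reference):
        raise ValueError("output and reference lengths differ")
    errors = 0
    for i, (got, want) in enumerate(zip(output, reference)):
        if got != want:
            errors += 1
            if errors <= 10:  # report only the first few mismatches
                print(f"mismatch at {i}: got {got}, expected {want}")
    return errors


if __name__ == "__main__":
    # Predefined input pattern held in host (L3) memory: 0, 1, 2, ...
    reference = array("i", range(4096))
    output = simulate_passthrough(reference)
    errors = check_output(output, reference)
    print("PASS" if errors == 0 else f"FAIL: {errors} mismatches")
```

In the real design, `simulate_passthrough` is replaced by the buffer read back from the NPU. The comparison logic stays the same.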
How to Install and Set Up the Environment
See: amd-ryzen-ai-npu-tool-chain-installation-and-execution
How to Build, Run & View Trace
We can now start building our close-to-metal IRON Python design. Run:

    make clean; make use_placed=1 trace

This command compiles the placed version of the design, generates the trace data file, and runs parse_trace.py to produce the trace_4b.json waveform file.
Passing use_placed=1 to make is the standard way to build the placed version of the design.
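The requirements list Python for test automation, so the two build commands can also be driven from a script. This is a minimal sketch that assumes it is run against the design directory containing the Makefile. It also assumes the Makefile leaves trace_4b.json in that directory. `build_and_trace` is a hypothetical helper and is not part of the toolchain.

```python
import subprocess
from pathlib import Path
from typing import List

# The two commands from this tutorial, run from the design directory.
BUILD_COMMANDS: List[List[str]] = [
    ["make", "clean"],
    ["make", "use_placed=1", "trace"],
]


def build_and_trace(design_dir: str, dry_run: bool = False) -> Path:
    """Run the trace build in design_dir and return the expected trace_4b.json path.

    Assumption: the Makefile writes trace_4b.json into design_dir.
    With dry_run=True the commands are only printed, not executed.
    """
    workdir = Path(design_dir)
    for cmd in BUILD_COMMANDS:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if make fails
            subprocess.run(cmd, cwd=workdir, check=True)
    trace_file = workdir / "trace_4b.json"
    if not dry_run and not trace_file.exists():
        raise FileNotFoundError(f"{trace_file} was not produced")
    return trace_file


if __name__ == "__main__":
    print(build_and_trace(".", dry_run=True))
```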
Trace
Trace is a method for debugging and monitoring execution and data flow within the complex, heterogeneous architecture of AMD AI NPUs.
The AMD AI NPU architecture provides:
- Hardware Trace Units: Dedicated hardware units within the AI Engine (Core) tiles, Mem tiles, and Shim tiles capture program execution flow and events.
- Trace Modes: Trace units can operate in different modes to record specific information, such as:
  - Event-Time Mode: Tracks independent events per cycle.
  - Event-PC Mode: Records the Program Counter (PC) value when a specific event occurs.
  - Execution-Trace Mode: Sends minimal information to reconstruct the program's execution flow.
- Trace Output: Trace data is sent to external memory L3 (DDR).
- Software Support: Vitis and Perfetto provide tools for viewing and analyzing the trace.
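To make Event-Time mode concrete, this sketch turns a per-cycle record of event flags into start/end intervals. That is the kind of waveform a viewer such as Perfetto displays. The input format (one bitmask per cycle, one bit per traced event) and the event labels are illustrative assumptions. The real trace packet format is different and is decoded by parse_trace.py.

```python
from typing import Dict, List, Tuple

# Illustrative event labels, one per bit of the per-cycle mask (assumption).
EVENT_NAMES = ["INSTR_EVENT_0", "INSTR_EVENT_1", "LOCK_STALL", "PORT_RUNNING_0"]


def bitmasks_to_intervals(
    masks: List[int], names: List[str]
) -> Dict[str, List[Tuple[int, int]]]:
    """Convert per-cycle event bitmasks into [start, end) cycle intervals.

    Bit i of masks[c] is 1 when event names[i] is active in cycle c.
    """
    intervals: Dict[str, List[Tuple[int, int]]] = {name: [] for name in names}
    start: Dict[int, int] = {}  # event bit -> cycle in which it became active
    # A trailing all-zero sample closes any interval still open at the end.
    for cycle, mask in enumerate(list(masks) + [0]):
        for bit, name in enumerate(names):
            active = (mask >> bit) & 1
            if active and bit not in start:
                start[bit] = cycle
            elif not active and bit in start:
                intervals[name].append((start.pop(bit), cycle))
    return intervals


if __name__ == "__main__":
    # One sample per cycle, cycles 0..5
    samples = [0b0001, 0b0011, 0b0010, 0b0000, 0b0100, 0b0100]
    for name, spans in bitmasks_to_intervals(samples, EVENT_NAMES).items():
        print(name, spans)
```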
01. Tile selection
Individual tiles can be selected for tracing:

    # Tiles whose trace units are enabled
    tiles_to_trace = [ShimTile, MemTile, ComputeTile]

02. Trace routing
Trace information, in the form of trace packets, is routed from the trace modules of the individual tiles to the SHIM tile and stored in external L3 memory (DDR):

    # Set up a packet-switched flow from core to shim for tracing information
    trace_utils.configure_packet_tracing_flow(tiles_to_trace, ShimTile)

03. Enabling the trace

04. Application execution and trace capture

05. Trace analysis
- Opening the Trace
The trace waveform JSON file (trace_4b.json) can be opened at http://ui.perfetto.dev.
The tiles configured for trace are visible.
Use ‘w’ to zoom into the trace and ‘s’ to zoom out.
Core events are visible in the execution timeline. These events can be analyzed to debug the application and improve its performance.
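Besides inspecting the timeline visually, you can summarize the JSON file with a script. This sketch assumes trace_4b.json uses the Chrome Trace Event format that Perfetto accepts, with paired "B"/"E" (begin/end) events carrying `name`, `ts`, `pid`, and `tid` fields. Check that assumption against your parse_trace.py output before relying on it. `load_events`, `summarize`, and `SAMPLE_EVENTS` are hypothetical, and the sample events are made up.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Made-up sample in Chrome Trace Event format; real files come from parse_trace.py.
SAMPLE_EVENTS = [
    {"name": "INSTR_EVENT_0", "ph": "B", "ts": 10, "pid": 0, "tid": 0},
    {"name": "INSTR_EVENT_0", "ph": "E", "ts": 25, "pid": 0, "tid": 0},
    {"name": "LOCK_STALL", "ph": "B", "ts": 30, "pid": 0, "tid": 1},
    {"name": "LOCK_STALL", "ph": "E", "ts": 34, "pid": 0, "tid": 1},
    {"name": "INSTR_EVENT_0", "ph": "B", "ts": 40, "pid": 0, "tid": 0},
    {"name": "INSTR_EVENT_0", "ph": "E", "ts": 48, "pid": 0, "tid": 0},
]


def load_events(path: str) -> List[dict]:
    """Load events from a Chrome Trace Event JSON file (array or object form)."""
    with open(path) as f:
        data = json.load(f)
    return data["traceEvents"] if isinstance(data, dict) else data


def summarize(events: List[dict]) -> Dict[str, dict]:
    """Pair each 'E' with the latest open 'B' on the same (pid, tid); total durations per name."""
    stacks: Dict[tuple, List[dict]] = defaultdict(list)
    stats: Dict[str, dict] = defaultdict(lambda: {"count": 0, "total": 0})
    begin_end = [e for e in events if e.get("ph") in ("B", "E")]
    for ev in sorted(begin_end, key=lambda e: e["ts"]):  # stable sort keeps file order on ties
        key = (ev.get("pid"), ev.get("tid"))
        if ev["ph"] == "B":
            stacks[key].append(ev)
        elif stacks[key]:  # ignore an 'E' with no open 'B'
            begin = stacks[key].pop()
            stats[begin["name"]]["count"] += 1
            stats[begin["name"]]["total"] += ev["ts"] - begin["ts"]
    return {name: dict(s) for name, s in stats.items()}


if __name__ == "__main__":
    # Use summarize(load_events("trace_4b.json")) to analyze a real capture.
    for name, s in sorted(summarize(SAMPLE_EVENTS).items()):
        print(f"{name}: {s['count']} slices, {s['total']} time units in total")
```

Per-event totals like these make it easier to spot, for example, a large share of stall events on the Core timeline.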
This tutorial demonstrates how to use the “IRON API and MLIR-based AI Engine Toolchain” to select the tiles to trace, enable trace, and analyze the trace. Future projects will detail how to use the trace to improve the performance of the application.