This tutorial uses a data pass-through example to explain the data flow in the AMD Ryzen AI Phoenix NPU using the AIE dialects.
Requirements
- AMD Ryzen AI Phoenix
- Linux®-based development environment
- Python® (for test automation and result validation)
- IRON API and MLIR-based AI Engine Toolchain
Project Brief
The SoC is designed to accelerate AIE-ML algorithms and deliver high performance. The NPU complex has:
- 16 AI Cores for computation
- 4 Memory Tiles for fast memory access
- 4 SHIM DMAs to move data between L3 memory and the NPU complex

Note: This project is customized for Phoenix.

The tutorial covers the following test cases:
- One SHIM DMA pass through
- Two SHIM DMA pass through
- Four SHIM DMA pass through
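The tile inventory above, and the tiles each test case uses, can be modeled in a few lines of Python. This is a sketch for orientation only: it assumes the Phoenix array exposes four columns with the SHIM tile in row 0, the MEM Tile in row 1, and four compute cores in rows 2 to 5, and it uses (column, row) coordinates as in the test case descriptions. `ColumnTiles`, `array_inventory`, and `tiles_for_test_case` are hypothetical names, not part of the IRON API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed Phoenix layout: 4 columns; row 0 = SHIM tile, row 1 = MEM Tile,
# rows 2-5 = compute cores. Coordinates are (column, row).
NUM_COLUMNS = 4
SHIM_ROW = 0
MEM_ROW = 1
CORE_ROWS = range(2, 6)


@dataclass(frozen=True)
class ColumnTiles:
    """The three tiles one column contributes to a pass-through test case."""
    shim: Tuple[int, int]
    mem: Tuple[int, int]
    core: Tuple[int, int]


def array_inventory() -> Dict[str, int]:
    """Count each tile type in the assumed 4-column array."""
    return {
        "shim_dma": NUM_COLUMNS,
        "mem_tiles": NUM_COLUMNS,
        "cores": NUM_COLUMNS * len(CORE_ROWS),
    }


def tiles_for_test_case(num_shim_dmas: int) -> List[ColumnTiles]:
    """Tiles used by the 1-, 2- or 4-SHIM DMA case: columns 0..n-1, first core (row 2) in each."""
    if num_shim_dmas not in (1, 2, 4):
        raise ValueError("the tutorial covers 1, 2 or 4 SHIM DMAs")
    return [
        ColumnTiles(shim=(col, SHIM_ROW), mem=(col, MEM_ROW), core=(col, CORE_ROWS[0]))
        for col in range(num_shim_dmas)
    ]


if __name__ == "__main__":
    print(array_inventory())
    for n in (1, 2, 4):
        print(n, [t.core for t in tiles_for_test_case(n)])
```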
One SHIM DMA pass through
In this section, SHIM DMA (0, 0), MEM Tile (0, 1), and Core (0, 2) of column 0 are used. A predefined set of data stored in L3 memory is streamed into the NPU complex. The data is routed from the SHIM DMA to the Core through MEM Tile memory and then routed back. The received output stream is captured and compared with the reference.
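The single-column flow and its reference check can be modeled on the host without hardware by treating each hop as a copy. The sketch below is a plain-Python stand-in, not the AIE design: it assumes the return path mirrors the forward path through MEM Tile (0, 1), and the test pattern and the function names (`make_reference`, `run_pass_through`, `mismatches`) are hypothetical.

```python
from typing import List, Sequence

# Assumed hops for the single-column case; the return path mirrors the forward path.
PATH = [
    "L3",
    "SHIM DMA (0, 0)",
    "MEM Tile (0, 1)",
    "Core (0, 2)",
    "MEM Tile (0, 1)",
    "SHIM DMA (0, 0)",
    "L3",
]


def make_reference(length: int) -> List[int]:
    """Hypothetical predefined pattern: 0, 1, 2, ... masked to 32 bits."""
    return [i & 0xFFFFFFFF for i in range(length)]


def run_pass_through(data: Sequence[int]) -> List[int]:
    """Copy the buffer once per hop; a pass-through never modifies the values."""
    buffer = list(data)
    for _ in range(len(PATH) - 1):
        # Each hop (e.g. SHIM DMA -> MEM Tile) is modeled as a plain copy.
        buffer = list(buffer)
    return buffer


def mismatches(output: Sequence[int], reference: Sequence[int]) -> List[int]:
    """Indices where output differs from reference; a length mismatch flags the tail."""
    diffs = [i for i, (o, r) in enumerate(zip(output, reference)) if o != r]
    shorter = min(len(output), len(reference))
    diffs.extend(range(shorter, max(len(output), len(reference))))
    return diffs


if __name__ == "__main__":
    reference = make_reference(1024)
    output = run_pass_through(reference)
    bad = mismatches(output, reference)
    print("PASS" if not bad else f"FAIL: {len(bad)} mismatches")
```

Returning mismatch indices rather than a single boolean makes it easier to see whether a failure is a dropped tail (short transfer) or corrupted values.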
Two SHIM DMA pass through
In this section, SHIM DMAs (0, 0) and (1, 0), MEM Tiles (0, 1) and (1, 1), and Cores (0, 2) and (1, 2) of columns 0 and 1 are used. A predefined set of data stored in L3 memory is streamed into the NPU complex. The data is routed from the SHIM DMAs to the Cores through MEM Tile memory and then routed back. The received output stream is captured and compared with the reference.
Four SHIM DMA pass through
In this section, SHIM DMAs (0, 0), (1, 0), (2, 0), and (3, 0), MEM Tiles (0, 1), (1, 1), (2, 1), and (3, 1), and Cores (0, 2), (1, 2), (2, 2), and (3, 2) of columns 0, 1, 2, and 3 are used. A predefined set of data stored in L3 memory is streamed into the NPU complex. The data is routed from the SHIM DMAs to the Cores through MEM Tile memory and then routed back. The received output stream is captured and compared with the reference.
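For the two- and four-column cases, one way to picture the data flow is to split the input into equal chunks, send each chunk through its own column, and concatenate the results in column order before comparing with the reference. The even, contiguous split is an assumption made for this sketch; the actual designs may partition the data differently. All function names are hypothetical.

```python
from typing import List, Sequence

SUPPORTED_COLUMNS = (1, 2, 4)


def split_across_columns(data: Sequence[int], num_columns: int) -> List[List[int]]:
    """Assumed partitioning: equal contiguous chunks, chunk c goes to column c."""
    if num_columns not in SUPPORTED_COLUMNS:
        raise ValueError(f"num_columns must be one of {SUPPORTED_COLUMNS}")
    if len(data) % num_columns:
        raise ValueError("input length must be a multiple of num_columns")
    size = len(data) // num_columns
    return [list(data[c * size:(c + 1) * size]) for c in range(num_columns)]


def column_pass_through(chunk: List[int]) -> List[int]:
    """Stand-in for SHIM DMA (c, 0) -> MEM Tile (c, 1) -> Core (c, 2) and back."""
    return list(chunk)


def multi_column_pass_through(data: Sequence[int], num_columns: int) -> List[int]:
    """Send each chunk through its own column, then rejoin the outputs in column order."""
    outputs = [column_pass_through(chunk) for chunk in split_across_columns(data, num_columns)]
    return [value for chunk in outputs for value in chunk]


if __name__ == "__main__":
    reference = list(range(4096))
    for n in SUPPORTED_COLUMNS:
        ok = multi_column_pass_through(reference, n) == reference
        print(f"{n} column(s): {'PASS' if ok else 'FAIL'}")
```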
Data Flow
Toolchain installation and execution: https://www.hackster.io/541340/amd-ryzen-ai-npu-tool-chain-installation-and-execution-b252fa
Build and Run
Navigate to one of the test case folders and build the AIE design with the make command:
env use_placed=1 make
After a successful build, compile and run the host application to execute the design on the Ryzen AI NPU:
make run NPU1=1
This triggers the MLIR-AIE runtime, which offloads the work to the NPU. The accelerated results are visible immediately.
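Since the requirements list Python for test automation, the two make commands can be wrapped in a small script that builds and runs each test case folder in turn. This is a minimal sketch using only the standard `subprocess` module; the folder arguments, the script name, and the `build_and_run` helper are hypothetical, and `dry_run=True` prints the commands instead of executing them.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# The two steps from the Build and Run section, as argument lists.
BUILD_CMD = ["env", "use_placed=1", "make"]
RUN_CMD = ["make", "run", "NPU1=1"]


def build_and_run(test_case_dir: Path, dry_run: bool = False) -> List[List[str]]:
    """Build, then run, one test case folder; returns the commands in order."""
    commands = [list(BUILD_CMD), list(RUN_CMD)]
    for cmd in commands:
        if dry_run:
            print(f"[{test_case_dir}] $ {' '.join(cmd)}")
            continue
        # check=True stops at the first failing step with CalledProcessError.
        subprocess.run(cmd, cwd=test_case_dir, check=True)
    return commands


if __name__ == "__main__":
    # Usage: python run_tests.py <test_case_folder> [<test_case_folder> ...]
    for folder in sys.argv[1:]:
        build_and_run(Path(folder))
```

Running the build before the run in the same loop means a failed build never reaches `make run`, because `check=True` raises before the second command starts.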
Conclusion
These tutorials demonstrate how to use the IRON API and MLIR-based AI Engine Toolchain to perform a data pass-through. They will be extended to characterize data throughput with a single SHIM DMA and with multiple SHIM DMAs operating in parallel.
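For the planned throughput characterization, the core arithmetic is bytes moved divided by elapsed time, summed over the SHIM DMAs that run concurrently. The helpers below are a hypothetical sketch of that calculation. Host-side timing with `time.perf_counter` includes runtime overhead, so a throughput computed this way is a lower bound on the NPU's actual data-movement rate rather than an exact figure.

```python
import time
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def throughput_gbps(num_bytes: int, elapsed_s: float) -> float:
    """Throughput in GB/s, using 1 GB = 10**9 bytes."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes / elapsed_s / 1e9


def aggregate_throughput_gbps(bytes_per_dma: Sequence[int], elapsed_s: float) -> float:
    """Combined throughput of several SHIM DMAs active over the same wall-clock window."""
    return throughput_gbps(sum(bytes_per_dma), elapsed_s)


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn once and return (result, elapsed seconds) measured on the host."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```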