This session demonstrates how to use the Trace feature to efficiently implement the ‘negative’ of a colour image on the AMD Ryzen AI Phoenix NPU using the AIE dialects and the AIE API.
Requirements
- AMD Ryzen AI Phoenix
- Linux® based development environment
- Python® (for test automation and result validation)
- IRON API and MLIR-based AI Engine Toolchain
- OpenCV (Open Source Computer Vision Library)
Project Brief
The SoC is designed to accelerate AIE-ML algorithms and deliver exceptional performance. The NPU complex has:
- 16 AI Cores for computation
- 4 Memory Tiles for fast memory access
- 4 shim DMAs to move data in and out of L3 memory
Note: This Session is customized for Phoenix.
Features covered in this session
- How to optimize the “AMD Ryzen AI NPU Color Image to Negative Image Conversion”
Problem statement
In the 3-core design, the cores spent most of their time in “Lock Stall”, wasting precious computational power.
Trace 3-Core Design
3-core design:
- Lock Stall duration: 1286000 ns
- Core execution: 156000 ns
- Total time to process one row: 1442000 ns
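As a quick check on the figures above, the short sketch below adds the two trace durations and works out what share of the row time is spent stalled. The variable names are illustrative and not taken from the project source; the numbers are the ones reported in the trace.

```python
# Trace figures for the 3-core design, in nanoseconds (from the list above).
lock_stall_ns = 1_286_000
core_exec_ns = 156_000

# Stall time plus execution time accounts for the whole row.
row_total_ns = lock_stall_ns + core_exec_ns

# Share of the row time in which the core is waiting on a lock.
stall_share = lock_stall_ns / row_total_ns

print(f"row total  : {row_total_ns} ns")
print(f"stall share: {stall_share:.1%}")
```

The two durations sum to the reported 1442000 ns, and the core waits on a lock for roughly 89% of each row.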
After analysing the trace, it was observed that:
- The processing of the R, G and B channels can be combined in a single core.
- The ping-pong buffers for input and output can be replaced by a single input buffer and a single output buffer, reducing the overall memory footprint.
- Only one S2MM and one MM2S channel are needed, combined with a buffer descriptor (BD) chain and locks to regulate the data flow and core execution.
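The single-buffer scheme relies on locks to pass ownership of one buffer back and forth between the DMA and the core. The following is a minimal host-side analogy in plain Python, not AIE or IRON code: two semaphores stand in for the hardware locks, one thread for the S2MM channel and one for the core. All names and sizes are hypothetical.

```python
import threading

NUM_ROWS = 4    # hypothetical number of rows to stream
ROW_WIDTH = 8   # hypothetical row width in bytes

# One shared input buffer guarded by a pair of semaphores
# (an analogy for the hardware locks, not the real mechanism).
in_buf = bytearray(ROW_WIDTH)
in_free = threading.Semaphore(1)  # buffer may be written by the DMA
in_full = threading.Semaphore(0)  # buffer holds a row for the core

results = []

def dma_in():
    """Stands in for the S2MM channel filling the single input buffer."""
    for row in range(NUM_ROWS):
        in_free.acquire()                     # wait until the core is done
        in_buf[:] = bytes([row] * ROW_WIDTH)  # write the next row
        in_full.release()                     # hand the buffer to the core

def core():
    """Stands in for the core consuming one row at a time."""
    for _ in range(NUM_ROWS):
        in_full.acquire()                     # wait for a full buffer
        results.append(bytes(255 - p for p in in_buf))
        in_free.release()                     # give the buffer back

threads = [threading.Thread(target=dma_in), threading.Thread(target=core)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results), "rows processed")
```

With a single buffer the producer and consumer can never work on the buffer at the same time, so the scheme gives up the overlap that ping-pong buffers allow in exchange for half the buffer memory.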
How to optimize the “AMD Ryzen AI NPU Color Image to Negative Image Conversion”
Architecture
Negative of a colour image
https://www.hackster.io/542861/amd-ryzen-ai-npu-color-image-to-negative-image-conversion-5afb77
Data Flow for Computation
- Lock Stall duration: 320000 ns for a single component
- Core execution: 190000 ns for a single component
- Execution time for the three rows (R, G and B): 1528000 ns, which is close to the 1442000 ns that the 3-core design takes for a single row
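To put these numbers side by side with the 3-core trace, the sketch below recomputes the per-component total, the stall share, and the gap to the 3-core single-row time. Variable names are illustrative; only the four durations come from the traces.

```python
# Single-core trace figures per colour component, in nanoseconds.
lock_stall_ns = 320_000
core_exec_ns = 190_000
reported_three_rows_ns = 1_528_000  # reported time for the R, G and B rows
three_core_row_ns = 1_442_000       # reported single-row time, 3-core design

# Stall plus execution for one component, and the estimate for all three.
per_component_ns = lock_stall_ns + core_exec_ns
estimate_three_rows_ns = 3 * per_component_ns

# Share of the time the single core still spends waiting on a lock.
stall_share = lock_stall_ns / per_component_ns

# Relative difference between the single-core and 3-core timings.
overhead_vs_three_core = reported_three_rows_ns / three_core_row_ns - 1

print(f"per component : {per_component_ns} ns")
print(f"3x estimate   : {estimate_three_rows_ns} ns")
print(f"stall share   : {stall_share:.1%}")
print(f"vs 3-core row : {overhead_vs_three_core:+.1%}")
```

The 3x estimate (1530000 ns) is slightly above the reported 1528000 ns, presumably because the per-component figures are rounded. The stall share falls to about 63%, and one core handles all three components in about 6% more time than the three cores needed for a single row.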
The core then calculates the negative of each pixel and routes the converted pixels back to L3 memory.
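For reference, the per-pixel operation can be sketched on the host in plain Python. This assumes 8-bit channel values and the usual definition of a negative (255 minus the value); the function name and the sample data are hypothetical, and this is not the kernel code that runs on the core.

```python
def negative_row(row: bytes) -> bytes:
    """Return the negative of a row of 8-bit channel values."""
    return bytes(255 - value for value in row)

# Hypothetical sample data: two pixels stored as R, G, B.
row = bytes([0, 128, 255, 10, 20, 30])
out = negative_row(row)
print(list(out))  # [255, 127, 0, 245, 235, 225]
```

Applying the function twice returns the original row, which makes a convenient sanity check when validating the NPU output.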
Input images
Output images
How to Install and Set Up the Environment?
https://www.hackster.io/541340/amd-ryzen-ai-npu-tool-chain-installation-and-execution-b252fa
Conclusion
This session demonstrates how to use the Trace feature to analyze an implementation and optimize it.
A brief comparison of the two implementations:
* Only the parameters pertaining to the AIE cores are captured.








