This project is part of a series on the subject of deploying the MediaPipe models to the edge on embedded platforms.
If you have not already read part 1 of this series, I urge you to start here:
In this project, I start by giving a recap of the challenges that can be expected when deploying the MediaPipe models, specifically for the Qualcomm Dragonwing QCS6490.
Then I will address these challenges one by one, before deploying the models with the QAI Hub Workbench.
Finally, I will perform profiling to determine if our goal of acceleration was achieved.
Qualcomm AI Hub Workbench
Qualcomm provides an on-line suite allowing users to deploy models to their silicon devices.
The QAI Hub Workbench allows users to bring their own model (BYOM) and datasets (BYOD), in order to compile, quantize, and optimize for deployment on Qualcomm devices.
The quantization step is optional, and depends on the targeted device.
In this project, we will be specifically targeting the QCS6490 device’s NPU, so quantization will be required.
The QAI Hub Workbench supports the following model formats as input:
- PyTorch
- ONNX
Other frameworks are indirectly supported by exporting to the ONNX format. In our case, we will be converting the MediaPipe models from TFLite to ONNX, using the tf2onnx utility.
The deployment involves the following tasks, called jobs in QAI Hub Workbench:
- Quantize
- Compile
- Validate
- Profile
The Quantize job uses the AIMET framework under the hood to perform quantization of the model. Quantization requires a subset of the training dataset for calibration, typically on the order of several thousand samples.
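To make the role of the calibration data concrete, here is a minimal sketch of asymmetric per-tensor uint8 quantization, the general idea behind what a quantize job does with calibration samples (illustrative only; AIMET's actual algorithms are more sophisticated):

```python
import numpy as np

def calibrate_uint8(samples: np.ndarray):
    """Derive asymmetric uint8 quantization parameters from calibration data."""
    lo, hi = float(samples.min()), float(samples.max())
    lo, hi = min(lo, 0.0), max(hi, 0.0)        # the range must contain zero
    scale = (hi - lo) / 255.0                  # assumes a non-degenerate range
    zero_point = int(round(-lo / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# Example: activations observed in [-1, 3] during calibration
calib = np.array([-1.0, 0.0, 1.5, 3.0], dtype=np.float32)
scale, zp = calibrate_uint8(calib)
roundtrip = dequantize(quantize(calib, scale, zp), scale, zp)
print(scale, zp)                        # quantization parameters
print(np.abs(roundtrip - calib).max()) # quantization error, bounded by ~scale
```

A poorly chosen calibration set skews the min/max estimate, which directly degrades accuracy; this is why the quality of the dataset we build later matters.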
The Compile job can target several runtime targets for deployment:
- TFLite runtime (QNN Delegate)
- ONNX runtime
- Qualcomm AI Runtime (QAIRT)
In this project, I explore and compare targeting to TFLite runtime (using QNN Delegate) and QAIRT.
My understanding is that deployment with ONNX runtime is only available on Windows, and not embedded Linux.
The Validate and Profile jobs can be used to perform inference and/or profiling on the actual targeted device (in our case, the QCS6490). These jobs run in the cloud on real devices; I have seen cases where they timed out due to unavailable proxy devices.
In this project, I used version 2026.01.05.0 of the QAI Hub Workbench, which includes:
- AI Hub Workbench : aihub-2026.01.05.0
- QAIRT : 2.40.0.251030114326_189385
The Qualcomm flow and documentation change frequently, with the user experience improving at each iteration. The latest information on using the Qualcomm AI Hub Workbench can be found on Qualcomm's AI Hub:
Starting with the QIRP 1.6 image on the Vision AI-KIT 6490
This project was tested on the Vision AI-KIT 6490, using the QIRP 1.6 image. If you are targeting a different board, or using a different QIRP version, the instructions in the other sections may need to be adapted.
Use the following Startup Guide to program the QIRP 1.6 image to the QCS6490 Vision AI Kit:
This provides instructions on how to program the latest version of the QIRP 1.6 image (visionai_6490_qirp_1.6_v4.zip):
After booting the Vision AI-KIT 6490 with the QIRP 1.6 image, you can perform a sanity check with the Out of Box demo:
Notice the System Thermals and System Utilization graphs on the bottom. We will make use of these during our exploration.
Installing QAI Hub on the Vision AI-KIT 6490
First, make certain that changes on the Vision AI-KIT 6490 will be persistent:
mount -o remount,rw /usr
The QAI Hub client can be installed with pip. We will also install the support for PyTorch and TFLite:
pip3 install qai-hub
pip3 install 'qai-hub[torch]'
pip3 install 'qai-hub[tflite]'
I tried to install the 'qai-hub[onnx]' package, but it failed to install, which seems to confirm my understanding that deployment with the ONNX runtime is only available on Windows, and not embedded Linux.
We can also install the QAI Hub model zoo, as follows:
pip3 install qai_hub_models
Before using the QAI Hub client, you will need to set up an account:
With an account setup, you will find your API token here:
Then enter your credentials once using the following command:
qai-hub configure --api_token {API_TOKEN}
As a sanity check, you can list the devices supported by the QAI Hub Workbench:
qai-hub list-devices
MediaPipe models on QAI Hub
If you have read the Qualcomm documentation, you will have noticed that they already have MediaPipe models on their QAI Hub.
SO... WHY ARE WE RE-INVENTING THE WHEEL?
This is not only an excellent question, but a very important point to highlight.
First, at the time I am writing this article, only one of the following MediaPipe models was supported on the Qualcomm QCS6490:
- mediapipe_hand => NOT supported on QCS6490
- mediapipe_face => supported on QCS6490
- mediapipe_pose => NOT supported on QCS6490
Second, Qualcomm chose to support an older version of the MediaPipe models (v0.07), instead of the most recent (v0.10).
This is VERY IMPORTANT to highlight, since major updates were made after v0.07 to the palm detection and hand landmark models, specifically for use with gesture and sign recognition:
Qualcomm, in fact, chose to support a version of the models that was converted to PyTorch by the open-source community:
- [Vidur Satija] BlazePalm : vidursatija/BlazePalm
- [Matthijs Hollemans] BlazeFace-PyTorch : hollance/BlazeFace-PyTorch
- [Zak Murez] MediaPipePyTorch : zmurez/MediaPipePytorch
Although zmurez does not divulge the conversion scripts that were used to generate the PyTorch versions of the models, vidursatija and hollance, whose work zmurez builds on, do provide their conversion scripts in the form of Jupyter notebooks.
Unfortunately, these conversion scripts/notebooks only work for the v0.07 version, and not the subsequent versions (believe me, I tried...).
We can observe the reference to the zmurez/MediaPipePyTorch repository when we run the supported mediapipe_face models on our QCS6490 board:
root@qcs6490-visionai-kit:~# python -m qai_hub_models.models.mediapipe_face.demo
Note: This demo is running through torch, and not meant to be real-time without dedicated ML hardware.
Use Ctrl+C in your terminal to exit.
mediapipe_pytorch requires repository https://github.com/zmurez/MediaPipePyTorch. Ok to clone? [Y/n] Y
Cloning https://github.com/zmurez/MediaPipePyTorch to /root/.qaihm/models/mediapipe_pytorch/v1/zmurez_MediaPipePyTorch_git...
Done
...
The choice of this outdated model does not make sense to me, other than, perhaps, that only PyTorch was supported by the Qualcomm AI Stack at the time of integration.
I found myself in the same situation when I deployed these models to AMD/Xilinx Vitis-AI.
Regardless of the reason, I see an opportunity to take the support for MediaPipe one step further. Since we can convert the TFLite models to ONNX, I propose the following updated flow for the MediaPipe models on QCS6490:
The first challenge that I encountered, in part 1, was the reality that the performance of the MediaPipe models significantly degrades when run on embedded platforms, compared to modern computers. This is the reason I am attempting to accelerate the models with the QAI Hub Workbench.
The second challenge is the fact that Google does not provide the dataset that was used to train the MediaPipe models. Since quantization requires a subset of this training data, this presents us with the challenge of coming up with this data ourselves.
In order to tackle these challenges, we will clone the following repository (blaze_tutorial), which will be used to quantize, compile, and profile the models in the cloud with the QAI Hub Workbench:
git clone --branch qcs6490 https://github.com/AlbertaBeef/blaze_tutorial
Creating a Calibration Dataset for Quantization
As described previously in the "QAI Hub Workbench Overview" section, the quantization phase requires several hundred to several thousand data samples, ideally a subset of the training data. Since we do not have access to the training dataset, we need to come up with this data ourselves.
We can generate the calibration dataset using a modified version of the blaze_app_python.py script, as follows:
For each input image that contains at least one hand, we want to generate:
- palm detection input images : the image resized and padded to the model's input size
- hand landmarks input images : a cropped image of each hand, resized to the model's input size
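The first of these is a "letterbox" operation: scale the frame to fit the model input, then pad the remainder. A minimal numpy sketch (nearest-neighbor resize for brevity; the actual scripts use OpenCV):

```python
import numpy as np

def letterbox(img: np.ndarray, size: int) -> np.ndarray:
    """Resize an HxWx3 uint8 image to fit in size x size, pad the rest with zeros."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbor resize via index sampling
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[rows[:, None], cols[None, :]]
    # center the resized image in a zero-padded square canvas
    out = np.zeros((size, size, 3), dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
print(letterbox(frame, 192).shape)  # (192, 192, 3)
```

Keeping the aspect ratio (rather than stretching) matters, since the detector was trained on undistorted hands.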
Two possible sources for input images are the following:
- Kaggle : many datasets exist, and may be reused
- Pixabay : contains several interesting videos, from which images can be extracted
For the case of Kaggle, if we take an existing dataset such as the following:
We can create a modified version of the blaze_detect_live.py script (from the blaze_app_python repository) that will scan all the images and generate a NumPy-specific binary format (*.npy) file containing our calibration data for the quantization step:
To run this script, navigate to the "blaze_app_python/calib_dataset_kaggle" directory, download the kaggle dataset to this sub-directory, and launch the script as follows:
$ python3 gen_calib_hand_dataset.py
[INFO] 2167 images found in kaggle_hand_gestures_dataset
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
[INFO] calib_palm_detection_192_dataset shape = (1871, 192, 192, 3) uint8 0 255
[INFO] calib_palm_detection_256_dataset shape = (1871, 256, 256, 3) uint8 0 255
[INFO] calib_hand_landmark_224_dataset shape = (1880, 224, 224, 3) uint8 0 255
[INFO] calib_hand_landmark_256_dataset shape = (1880, 256, 256, 3) uint8 0 255
This will create the following calibration data for the 0.10 versions of the palm detection and hand landmarks models:
- calib_palm_detection_192_dataset.npy : 1871 samples of 192x192 RGB images
- calib_hand_landmark_224_dataset.npy : 1880 samples of 224x224 RGB images
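Since the .npy files are plain NumPy arrays, it is worth sanity-checking them before uploading. A quick sketch (using a small stand-in array here so the snippet is self-contained; in practice you would load the file generated above):

```python
import os
import tempfile

import numpy as np

# Stand-in for calib_palm_detection_192_dataset.npy (8 samples instead of 1871)
path = os.path.join(tempfile.gettempdir(), "calib_palm_detection_192_dataset.npy")
np.save(path, np.random.randint(0, 256, (8, 192, 192, 3), dtype=np.uint8))

data = np.load(path)
print(data.shape, data.dtype, data.min(), data.max())

# The quantizer expects NHWC uint8 samples matching the model's input size
assert data.ndim == 4 and data.shape[1:] == (192, 192, 3)
assert data.dtype == np.uint8
```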
I ultimately decided not to use this dataset, but documented the process for reference, since it can be applied to any other Kaggle dataset.
If we take the case of Pixabay, we can use several videos as sources, such as the following:
- https://pixabay.com/videos/pixabay-sign-language-people-inclusion-58301/
- https://pixabay.com/videos/pixabay-sign-language-people-inclusion-58302/
- https://pixabay.com/videos/pixabay-man-living-room-faces-expression-13625/
- https://pixabay.com/videos/pixabay-man-face-expression-irritated-182353.mp4
- https://pixabay.com/videos/pixabay-hands-good-accept-vote-ok-gesture-168344/
- https://pixabay.com/videos/pixabay-girl-heart-gesture-symbol-cute-129421/
Once again, we can create a modified version of the blaze_detect_live.py script (from the blaze_app_python repository) that will scan through the videos and generate a NumPy-specific binary format (*.npy) file containing our calibration data for the quantization step:
To run this script, navigate to the "blaze_app_python/calib_dataset_pixabay" directory, download the PixaBay videos in a "videos" sub-sub-directory, and launch the script as follows:
$ python3 gen_calib_hand_dataset.py
[INFO] Start of video ./videos/pixabay-sign-language-people-inclusion-58301.mp4
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
[INFO] End of video ./videos/pixabay-sign-language-people-inclusion-58301.mp4
[INFO] Collected 228 images for calib_palm_detection_192_dataset
[INFO] Collected 228 images for calib_palm_detection_256_dataset
[INFO] Collected 336 images for calib_hand_landmark_224_dataset
[INFO] Collected 336 images for calib_hand_landmark_256_dataset
[INFO] Start of video ./videos/pixabay-sign-language-people-inclusion-58302.mp4
[INFO] End of video ./videos/pixabay-sign-language-people-inclusion-58302.mp4
[INFO] Collected 694 images for calib_palm_detection_192_dataset
[INFO] Collected 694 images for calib_palm_detection_256_dataset
[INFO] Collected 1127 images for calib_hand_landmark_224_dataset
[INFO] Collected 1127 images for calib_hand_landmark_256_dataset
[INFO] Start of video ./videos/pixabay-man-living-room-faces-expression-136253.mp4
[INFO] End of video ./videos/pixabay-man-living-room-faces-expression-136253.mp4
[INFO] Collected 1041 images for calib_palm_detection_192_dataset
[INFO] Collected 1041 images for calib_palm_detection_256_dataset
[INFO] Collected 1818 images for calib_hand_landmark_224_dataset
[INFO] Collected 1818 images for calib_hand_landmark_256_dataset
[INFO] Start of video ./videos/pixabay-man-face-expression-irritated-182353.mp4
[INFO] End of video ./videos/pixabay-man-face-expression-irritated-182353.mp4
[INFO] Collected 1138 images for calib_palm_detection_192_dataset
[INFO] Collected 1138 images for calib_palm_detection_256_dataset
[INFO] Collected 1933 images for calib_hand_landmark_224_dataset
[INFO] Collected 1933 images for calib_hand_landmark_256_dataset
[INFO] Start of video ./videos/pixabay-hands-good-accept-vote-ok-gesture-168344.mp4
[INFO] End of video ./videos/pixabay-hands-good-accept-vote-ok-gesture-168344.mp4
[INFO] Collected 1361 images for calib_palm_detection_192_dataset
[INFO] Collected 1361 images for calib_palm_detection_256_dataset
[INFO] Collected 2379 images for calib_hand_landmark_224_dataset
[INFO] Collected 2379 images for calib_hand_landmark_256_dataset
[INFO] Start of video ./videos/pixabay-girl-heart-gesture-symbol-asian-129421.mp4
[INFO] End of video ./videos/pixabay-girl-heart-gesture-symbol-asian-129421.mp4
[INFO] Collected 1577 images for calib_palm_detection_192_dataset
[INFO] Collected 1577 images for calib_palm_detection_256_dataset
[INFO] Collected 2595 images for calib_hand_landmark_224_dataset
[INFO] Collected 2595 images for calib_hand_landmark_256_dataset
[INFO] calib_palm_detection_192_dataset shape = (1577, 192, 192, 3) uint8 0 255
[INFO] calib_palm_detection_256_dataset shape = (1577, 256, 256, 3) uint8 0 255
[INFO] calib_hand_landmark_224_dataset shape = (2595, 224, 224, 3) uint8 0 255
[INFO] calib_hand_landmark_256_dataset shape = (2595, 256, 256, 3) uint8 0 255
This will create the following calibration data for the 0.10 versions of the palm detection and hand landmarks models:
- calib_palm_detection_192_dataset.npy : 1577 samples of 192x192 RGB images
- calib_hand_landmark_224_dataset.npy : 2595 samples of 224x224 RGB images
You are free to use either source described above, or your own source, for the quantization data.
I have archived my exploration on this sub-topic (creating hand/face/pose datasets for various versions of models) in the following two archives:
- Kaggle : calib_dataset_kaggle.zip
- Pixabay : calib_dataset_pixabay.zip
For the purpose of this exploration, I have prepared the calibration data (from Pixabay), which can be downloaded and extracted as follows:
cd blaze_tutorial/qcs6490
wget https://github.com/AlbertaBeef/blaze_tutorial/releases/download/qcs6490_version_1/blaze_calibration_data.zip
unzip blaze_calibration_data.zip
Model Conversion
The second step to prepare for deployment with the QAI Hub Workbench is to download the TFLite models and convert them to ONNX with the tf2onnx utility:
- get_tflite_models.sh : download TFLite models from Google
- convert_models.sh : convert models to ONNX format with the tf2onnx utility
Use the following commands to download and convert the MediaPipe models to ONNX:
cd blaze_tutorial/qcs6490
cd models
source ./get_tflite_models.sh
pip3 install tf2onnx
source ./convert_models.sh
Model Deployment
Now that we have our calibration data and our models converted to ONNX, we can perform the model quantization, profiling, and compilation using the AI Hub Workbench.
I have prepared a script for this purpose:
This script takes three (3) arguments when invoked:
- name : model name (e.g. palm_detection_lite)
- model : model file (e.g. models/palm_detection_lite.onnx)
- resolution : input size (e.g. 256)
The name argument indicates which model we are deploying, such as palm_detection_lite or palm_detection_full for the palm detector, or hand_landmark_lite or hand_landmark_full for the hand landmark models. The resolution indicates the input size to the model.
These two arguments will determine which calibration dataset to use for the quantization. For example:
- name=palm_detection_lite, size=192 => calib_palm_detection_192_dataset.npy
- name=hand_landmark_lite, size=224 => calib_hand_landmark_224_dataset.npy
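The mapping from the name/resolution arguments to a calibration file can be sketched as follows (a hypothetical helper for illustration; the real script's logic may differ in details):

```python
def calib_dataset_path(name: str, resolution: int) -> str:
    """Map a model name and input resolution to its calibration .npy file."""
    if name.startswith("palm_detection"):
        family = "palm_detection"
    elif name.startswith("hand_landmark"):
        family = "hand_landmark"
    else:
        raise ValueError(f"no calibration dataset for model '{name}'")
    return f"calib_{family}_{resolution}_dataset.npy"

print(calib_dataset_path("palm_detection_lite", 192))  # calib_palm_detection_192_dataset.npy
print(calib_dataset_path("hand_landmark_lite", 224))   # calib_hand_landmark_224_dataset.npy
```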
The script will generate output artifacts for the following target runtimes:
- tflite (*.tflite)
- onnx (*.onnx.zip)
- qnn_dlc (*.dlc)
- qnn_context_binary (*.bin)
- precompiled_qnn_onnx (*.onnx.zip)
I will only test inference with the following two target runtimes:
- tflite (*.tflite) => using TFLite (with QNN Delegate)
- qnn_context_binary (*.bin) => using QAIRT
I have provided a second script which will call the qai_hub_workbench_flow.py script to quantize, compile, and profile the models:
You will want to modify the following list before execution:
- model_list : specify which model(s) you want to deploy
Below is a modified version of the script that will deploy the 0.10 versions of the palm detection and hand landmarks models.
# ONNX models
model_palm_detector_v0_07=("palm_detection_v0_07","models/palm_detection_v0_07.onnx",256)
model_hand_landmark_v0_07=("hand_landmark_v0_07","models/hand_landmark_v0_07.onnx",256)
model_palm_detector_v0_10_lite=("palm_detection_lite","models/palm_detection_lite.onnx",192)
model_palm_detector_v0_10_full=("palm_detection_full","models/palm_detection_full.onnx",192)
model_hand_landmark_v0_10_lite=("hand_landmark_lite","models/hand_landmark_lite.onnx",224)
model_hand_landmark_v0_10_full=("hand_landmark_full","models/hand_landmark_full.onnx",224)
model_face_detector_v0_10_short=("face_detection_short_range","models/face_detection_short_range.onnx",128)
model_face_detector_v0_10_full=("face_detection_full_range","models/face_detection_full_range.onnx",192)
model_face_landmark_v0_10=("face_landmark","models/face_landmark.onnx",192)
model_pose_detector_v0_10=("pose_detection","models/pose_detection.onnx",224)
model_pose_landmark_v0_10_lite=("pose_landmark_lite","models/pose_landmark_lite.onnx",256)
model_pose_landmark_v0_10_full=("pose_landmark_full","models/pose_landmark_full.onnx",256)
model_pose_landmark_v0_10_heavy=("pose_landmark_heavy","models/pose_landmark_heavy.onnx",256)
model_list=(
model_palm_detector_v0_10_full[@]
model_hand_landmark_v0_10_full[@]
model_palm_detector_v0_10_lite[@]
model_hand_landmark_v0_10_lite[@]
)
model_count=${#model_list[@]}
#echo $model_count
for ((i=0; i<$model_count; i++))
do
model=${!model_list[i]}
model_array=(${model//,/ })
model_name=${model_array[0]}
model_file=${model_array[1]}
input_resolution=${model_array[2]}
echo python3 qai_hub_workbench_flow.py --name ${model_name} --model ${model_file} --resolution ${input_resolution}
python3 qai_hub_workbench_flow.py --name ${model_name} --model ${model_file} --resolution ${input_resolution} | tee deploy_${model_name}.log
done
This script can be executed as follows:
cd blaze_tutorial/qcs6490
source ./deploy_models_qai_hub_workbench.sh
When complete, the following compiled models will be located in the current directory:
- palm_detection_full.tflite, palm_detection_full.bin, ...
- hand_landmark_full.tflite, hand_landmark_full.bin, ...
- palm_detection_lite.tflite, palm_detection_lite.bin, ...
- hand_landmark_lite.tflite, hand_landmark_lite.bin, ...
For convenience, I have archived the compiled models for QCS6490 in the following archives:
- TFLite models (*.tflite) : blaze_tflite_qnn_models_qcs6490.zip
- QAIRT models (*.bin) : blaze_qairt_models_qcs6490.zip
All of the results will be located in your QAI Hub Workbench account on-line:
On the "Jobs" page, if we click on the "Profile" tab, we can see our profiling results for each of the models:
If we compare the unquantized ONNX model running on CPU with the quantized models running on NPU:
- palm_detection_full (QAIRT version) : 66.4 msec => 1.3 msec
- hand_landmarks_full (QAIRT version) : 47.4 msec => 1 msec
Similarly, for the lite versions of the models:
- palm_detection_lite (QAIRT version) : 54.7 msec => 1.2 msec
- hand_landmarks_lite (QAIRT version) : 29.9 msec => 0.7 msec
This is significant acceleration (between 30X and 60X)!
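The speedup factors follow directly from the profiled latencies quoted above:

```python
# (model, unquantized CPU latency, quantized NPU latency), in msec,
# taken from the QAIRT profile jobs above
results = [
    ("palm_detection_full", 66.4, 1.3),
    ("hand_landmarks_full", 47.4, 1.0),
    ("palm_detection_lite", 54.7, 1.2),
    ("hand_landmarks_lite", 29.9, 0.7),
]
for name, cpu_ms, npu_ms in results:
    print(f"{name}: {cpu_ms / npu_ms:.0f}x speedup")
# speedups: 51x, 47x, 46x, 43x
```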
There is one anomaly in the profile results for the quantized models:
- palm_detection_full (TFLite version) : 66.4 msec => 31.0 msec
If we click on the job to understand what is happening, we can see that the model is only partially accelerated on the NPU, with 150 layers still being executed on the CPU.
All of the other jobs have a clean NPU implementation, including the QAIRT version of the palm_detection_full model:
If we scroll down and select the "Runtime Layer Analysis" section, we can click on the "VIEW OPTRACE" button to get a detailed layer by layer profile:
If we look at the number of layers (CPU, NPU, GPU) for each model, we get the following reported layers in the profile jobs:
The "ONNX Layers" corresponds to the floating-point ONNX model that we used as input for the QAI Hub Workbench.
The "TFLite Layers" corresponds to the quantized model, targetted to the TFLite runtime.
The "QAIRT Layers" corresponds to the quantized model, targetted to "qnn_compiled_binary", which can be used with the Qualcomm AI runtime.
Model Accuracy
If we look at model accuracy, we get the following results in the quantize jobs:
We can see that the model accuracy still needs work. This may be related to the calibration data, but I have not investigated further.
The best accuracy is being achieved with version 0.07 of the palm detection model.
The worst accuracy is being achieved with version 0.07 of the hand landmarks model. This can be seen with version 0.07 of the pipeline, where the hands are detected correctly, but the landmarks are not very accurate.
Unfortunately, the PSNR calculation for the v0.10 hand landmark models failed, so I do not know what the metrics are; the results definitely look accurate, however, so I will assume they are above 30 dB.
The exception is the handedness output, which is wrong (always reporting "left" hands).
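For reference, the PSNR metric reported by the quantize jobs compares the quantized model's outputs against the float model's outputs, with values above roughly 30 dB usually considered acceptable. A minimal numpy version of the metric (the exact peak/reference convention QAI Hub uses is an assumption on my part):

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak=None) -> float:
    """Peak signal-to-noise ratio, in dB, between two output tensors."""
    if peak is None:
        peak = float(np.abs(reference).max())   # assumed peak convention
    mse = float(np.mean((reference - test) ** 2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * float(np.log10(peak ** 2 / mse))

# Example: a reference output vs. a slightly perturbed copy
ref = np.linspace(-1, 1, 1000, dtype=np.float32)
noisy = ref + np.float32(0.01) * np.sin(np.arange(1000, dtype=np.float32))
print(f"{psnr(ref, noisy):.1f} dB")
```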
Model Execution
In order to support the QCS6490 models, the "blaze_app_python" application was augmented with the following inference targets:
As can be seen, I already have support for the original TFLite models, as well as the PyTorch versions (v0.07) of the models.
We can also execute the unquantized ONNX models, but most importantly, support for execution on the NPU was added using the following two runtime targets:
- TFLite (using QNN Delegate)
- QAIRT
My final inference code for TFLite (with QNN Delegate) actually makes use of Google's AI Edge LiteRT (the latest evolution of TFLite), and can be found in the "blaze_app_python" repository, under the blaze_tflite_qnn sub-directory:
- blaze_app_python/blaze_tflite_qnn/blazedetector.py
- blaze_app_python/blaze_tflite_qnn/blazelandmark.py
We need to ensure that the required library is present on our board, which it is:
root@qcs6490-visionai-kit:~# ls /usr/lib/libQnnTFLiteDelegate.so
/usr/lib/libQnnTFLiteDelegate.so
My final inference code for QAIRT can be found in the "blaze_app_python" repository, under the blaze_qairt sub-directory:
We need to install the QAIRT SDK on our board, which can be done using the following instructions.
First, we download and install version 2.40 of the QAIRT SDK:
export PRODUCT_SOC=6490 DSP_ARCH=68
wget https://softwarecenter.qualcomm.com/api/download/software/sdks/Qualcomm_AI_Runtime_Community/All/2.40.0.251030/v2.40.0.251030.zip
unzip v2.40.0.251030.zip
cd qairt/2.40.0.251030
source bin/envsetup.sh
export ADSP_LIBRARY_PATH=$QNN_SDK_ROOT/lib/hexagon-v${DSP_ARCH}/unsigned
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$QNN_SDK_ROOT/lib/aarch64-oe-linux-gcc11.2
Then, we [optionally] clone, build, and install the QAI App Builder:
git clone --branch v2.40.0 --recursive https://github.com/quic/ai-engine-direct-helper
cd ai-engine-direct-helper
python3 setup.py bdist_wheel
pip3 install dist/qai_appbuilder-2.38.0-cp312-cp312-linux_aarch64.whl
Installing the python application on the Vision AI-KIT 6490
First, let's ensure that our changes are persistent:
mount -o remount,rw /usr
The python demo application requires certain packages, which can be installed as follows:
pip3 install ai-edge-litert
The python application can be accessed from the following github repository:
git clone --branch blaze_qnn https://github.com/AlbertaBeef/blaze_app_python
cd blaze_app_python
In order to successfully use the python demo with the original TFLite models, they need to be downloaded from the Google web site:
cd blaze_tflite/models
source ./get_tflite_models.sh
cd ../..
In order to successfully use the python demo with the QCS6490 models, they need to be downloaded as follows:
cd blaze_tflite_qnn/models
source ./get_qcs6490_models.sh
unzip blaze_tflite_qnn_models_qcs6490.zip
cd ../..
cd blaze_qairt/models
source ./get_qcs6490_models.sh
unzip blaze_qairt_models_qcs6490.zip
cd ../..
You are all set!
Launching the python application on the Vision AI-KIT 6490
As we already saw in part 1, the python application can launch many variations of the dual-inference pipeline, which can be filtered with the following arguments:
- --blaze : hand | face | pose
- --target : blaze_tflite |... | blaze_tflite_qnn | blaze_qairt
- --pipeline : specific name of pipeline (can be queried with --list argument)
In order to display the complete list of supported pipelines, launch the python script as follows:
root@zub1cg-sbc-2022-2:~/blaze_app_python# python3 blaze_detect_live.py --list
[INFO] user@hosthame : root@zub1cg-sbc-2022-2
[INFO] blaze_tflite supported ...
...
[INFO] blaze_qairt supported ...
...
Command line options:
--input :
--image : False
--blaze : hand,face,pose
--target : blaze_tflite,...,blaze_qairt
--pipeline : all
--list : True
--debug : False
--withoutview : False
--profilelog : False
--profileview : False
--fps : False
List of target pipelines:
...
## qairt_hand_v0_10_lite blaze_qairt/models/palm_detection_lite.bin
blaze_qairt/models/hand_landmark_lite.bin
## qairt_hand_v0_10_full blaze_qairt/models/palm_detection_full.bin
blaze_qairt/models/hand_landmark_full.bin
...
In order to launch the v0.10 lite version of the hand detection and landmarks pipeline, with the TFLite runtime and QNN delegate, use the python script as follows:
python3 blaze_detect_live.py --pipeline=tflqnn_hand_v0_10_lite --fps
This will launch the 0.10 (lite) version of the models, compiled for QCS6490, as shown below:
The previous video has not been accelerated. It shows a frame rate of approximately 30 fps when no hands are detected (one model running: palm detection), approximately 20 fps when one hand is detected (two models running: palm detection and hand landmarks), and approximately 15 fps when two hands are detected (three models running: palm detection and two hand landmarks).
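Those three frame rates let us back out an approximate per-inference cost, assuming the pipeline runs serially (a rough estimate that ignores per-frame overhead):

```python
# Observed frame rates for 0, 1, and 2 detected hands
fps = {0: 30.0, 1: 20.0, 2: 15.0}
frame_ms = {hands: 1000.0 / f for hands, f in fps.items()}

# Each detected hand adds one hand-landmark inference to the frame,
# so the slope gives the approximate cost of one landmark inference
per_landmark_ms = (frame_ms[2] - frame_ms[0]) / 2
print(f"~{per_landmark_ms:.1f} ms per hand-landmark inference")  # ~16.7 ms
```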
This is slightly worse than the original TFLite models running on the CPU, so I have to consider the use of TFLite with QNN delegate unsuccessful for this use case.
In order to launch the v0.10 lite version of the hand detection and landmarks pipeline, with the Qualcomm AI Runtime, use the python script as follows:
python3 blaze_detect_live.py --pipeline=qairt_hand_v0_10_lite --fps
This will launch the 0.10 (lite) version of the models, compiled for QCS6490, as shown below:
The previous video has not been accelerated. It shows the frame rate to be 30 fps when no hands are detected (one model running : palm detection), as well as when one hand has been detected (two models running : palm detection and hand landmarks), and when two hands have been detected (three models running : palm detection and 2 hand landmarks).
Contrary to the TFLite runtime with QNN delegate, the Qualcomm AI runtime achieves significant acceleration!
I have created a montage of the TFLite reference, the TFLite with QNN delegate, and QAIRT, for comparison:
In order to know the true performance of the models running with QAIRT, we will need to detach from the USB camera (which is determining the frame rate of 30fps). We will be doing this in the next section.
Benchmarking the models on the Vision AI-KIT 6490
In order to obtain stable profiling results, we use a test image (with two hands) that can be downloaded from Google as follows:
source ./get_test_images.sh
We can visualize the profiling results for the original TFLite models and the QAIRT-accelerated models using the following command:
python3 blaze_detect_live.py --testimage --pipeline=tfl_hand_v0_10_full,tfl_hand_v0_10_lite,qairt_hand_v0_10_full,qairt_hand_v0_10_lite --profileview
The following graphs will appear:
You may have noticed that the bars in the graphs have some jitter, so we will capture a series of values to a CSV file, then average the results, to get a better idea of the performance.
I do not have this automated, so it is a manual process of capture, process, visualize...
The following commands can be used to generate profile results for the qairt_hand_v0_10_lite pipeline using the QCS6490 models, and the test image:
rm blaze_detect_live.csv
python3 blaze_detect_live.py --testimage --withoutview --profilelog --pipeline=qairt_hand_v0_10_lite
mv blaze_detect_live.csv blaze_detect_live_qcs6490_qairt_hand_v0_10_lite.csv
The following commands can be used to generate profile results for the tfl_hand_v0_10_lite pipeline using the TFLite models, and the test image:
rm blaze_detect_live.csv
python3 blaze_detect_live.py --testimage --withoutview --profilelog --pipeline=tfl_hand_v0_10_lite
mv blaze_detect_live.csv blaze_detect_live_qcs6490_tfl_hand_v0_10_lite.csv
The same is done for the qairt_hand_v0_10_full and tfl_hand_v0_10_full models.
The results of all .csv files were averaged, then plotted using Excel.
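The averaging step can also be scripted; here is a stdlib sketch that averages every numeric column of a profile CSV (the column names shown are assumptions, as the real log's layout may differ):

```python
import csv
import io
from statistics import mean

def average_numeric_columns(csv_text: str) -> dict:
    """Average every CSV column that parses as a float, skipping the rest."""
    columns, non_numeric = {}, set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for key, value in row.items():
            if key in non_numeric:
                continue
            try:
                columns.setdefault(key, []).append(float(value))
            except (TypeError, ValueError):
                non_numeric.add(key)          # column is not numeric; drop it
                columns.pop(key, None)
    return {key: mean(values) for key, values in columns.items()}

# Hypothetical excerpt of blaze_detect_live.csv
log = ("pipeline,detector_ms,landmark_ms\n"
       "qairt_hand_v0_10_lite,1.2,0.7\n"
       "qairt_hand_v0_10_lite,1.4,0.9\n")
print(average_numeric_columns(log))
```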
Here are the profiling results for the models deployed with QAIRT, in comparison to the reference TFLite models:
Again, it is worth noting that these benchmarks were taken with a single-threaded python script. There is additional opportunity for acceleration with a multi-threaded implementation: while the graph runner is waiting for transfers from one model's sub-graphs, one or more other models could be launched in parallel.
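The multi-threaded idea can be sketched with concurrent.futures, with a dummy function standing in for a real landmark inference (in practice, the benefit depends on the runtime releasing the GIL while the NPU is busy):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def landmark_inference(roi_id: int) -> str:
    """Stand-in for one hand-landmark inference (real code would call the NPU)."""
    time.sleep(0.01)             # simulate inference latency
    return f"landmarks for roi {roi_id}"

rois = [0, 1]                    # two detected hands from the palm detector

# Serial: total time is roughly the sum of the latencies
start = time.perf_counter()
serial = [landmark_inference(r) for r in rois]
serial_s = time.perf_counter() - start

# Parallel: the inferences overlap, so total time approaches the max latency
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(rois)) as pool:
    parallel = list(pool.map(landmark_inference, rois))
parallel_s = time.perf_counter() - start

print(f"serial {serial_s * 1000:.1f} ms, parallel {parallel_s * 1000:.1f} ms")
```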
There is also an opportunity to accelerate the rest of the pipeline with C++ code...
Known Issues
Although I have quantized and deployed the v0.07 versions of the palm_detection and hand_landmarks models, the accuracy of the hand_landmarks model has degraded, so do not re-use it in your application.
For the v0.10 lite version of the hand landmarks model, the handedness does not seem to be handled correctly. The model always returns ~1.0, corresponding to a "left" hand.
In summary, only the v0.10 full version of the palm detection and hand landmark models are fully functional, so use those.
Conclusion
Although the on-line QAI Hub Workbench works well, I doubt that serious customers with real projects will agree to upload their proprietary models and datasets to the Qualcomm cloud.
Are you using Qualcomm Dragonwing devices in your projects ?
Let me know in the comments...
Acknowledgements
I need to acknowledge the extraordinary work of my colleagues, who made this project possible:
- Monica Houston : for her work on the quantized TFLite implementation, which I was able to quickly adapt for the TFLite with QNN Delegate implementation
- Maxim Saka : for the immense work involved in creating the QIRP images, and creating the GTK user interface for the OOB demo, which was very useful during this investigation
- 2026/01/19 - Initial Version
Accelerating MediaPipe (by Mario Bergeron):
- Hackster Series Part 1 : Blazing Fast Models
- Hackster Series Part 2 : Insightful Datasets for ASL recognition
- Hackster Series Part 3 : Accelerating the MediaPipe models with Vitis-AI 3.5
- Hackster Series Part 4 : Accelerating the MediaPipe models with Hailo-8
- Hackster Series Part 5 : Accelerating the MediaPipe models on RPI5 AI Kit
- Hackster Series Part 6 : Accelerating the MediaPipe models with MemryX
- Hackster Series Part 7 : Accelerating the MediaPipe models with Qualcomm
- Hackster Series Part 8 : Accelerating the MediaPipe models with AzurEngine
- Blaze Utility (python version) : blaze_app_python
- Blaze Utility (C++ version) : blaze_app_cpp
Qualcomm References:
- QAI Hub : https://aihub.qualcomm.com/get-started
- QAI Hub Workbench : https://aihub.qualcomm.com/get-started#workbench
- Sample python inference application using LiteRT QNN Delegate : https://github.com/quic/sample-apps-for-qualcomm-linux/blob/main/qualcomm-linux/applications/LiteRT/object_detection.py
- Sample python inference application using AI Engine Direct : https://github.com/quic/ai-engine-direct-helper/tree/main/samples/python/mediapipe_hand
MediaPipe resources
- [Google] MediaPipe Solutions Guide : https://ai.google.dev/edge/mediapipe/solutions/guide
- [Google] MediaPipe Source Code : https://github.com/google-ai-edge/mediapipe
- [Google] SignALL SDK : https://developers.googleblog.com/en/signall-sdk-sign-language-interface-using-mediapipe-is-now-available-for-developers/
Open-Source Porting Effort to PyTorch
- [Vidur Satija] BlazePalm : vidursatija/BlazePalm
- [Matthijs Hollemans] BlazeFace-PyTorch : hollance/BlazeFace-PyTorch
- [Zak Murez] MediaPipePyTorch : zmurez/MediaPipePytorch