This project is part of a series on the subject of deploying the MediaPipe models to the edge on embedded platforms.
If you have not already read part 1 of this series, I urge you to start here:
In this project, I start by giving a recap of the challenges that can be expected when deploying the MediaPipe models, specifically for the Qualcomm Dragonwing QCS6490.
Then I will address these challenges one by one, before deploying the models with the QAI Hub Workbench.
Finally, I will perform profiling to determine if our goal of acceleration was achieved.
Qualcomm AI Hub Workbench
Qualcomm provides an on-line suite allowing users to deploy models to their silicon devices.
The QAI Hub Workbench allows users to bring their own model (BYOM) and datasets (BYOD), in order to compile, quantize, and optimize for deployment on Qualcomm devices.
The quantization step is optional, and depends on the targeted device.
In this project, we will be specifically targeting the QCS6490 device’s NPU, so quantization will be required.
The QAI Hub Workbench supports the following model formats as input:
- PyTorch
- ONNX
Other frameworks are indirectly supported by exporting to the ONNX format. In our case, we will be converting the MediaPipe models from TFLite to ONNX, using the tf2onnx utility.
The deployment involves the following tasks, called jobs in QAI Hub Workbench:
- Quantize
- Compile
- Validate
- Profile
The Quantize job uses the AIMET framework under the hood to perform quantization of the model. Quantization requires a subset of the training dataset for calibration, typically on the order of several thousand samples.
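To make the role of the calibration data concrete, here is a minimal sketch of asymmetric per-tensor uint8 quantization, the general idea behind what a quantize job does with calibration samples (illustrative only; AIMET's actual algorithms are more sophisticated):

```python
import numpy as np

def calibrate_uint8(samples: np.ndarray):
    """Derive asymmetric uint8 quantization parameters from calibration data."""
    lo, hi = float(samples.min()), float(samples.max())
    lo, hi = min(lo, 0.0), max(hi, 0.0)        # the range must contain zero
    scale = (hi - lo) / 255.0                  # assumes a non-degenerate range
    zero_point = int(round(-lo / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# Example: activations observed in [-1, 3] during calibration
calib = np.array([-1.0, 0.0, 1.5, 3.0], dtype=np.float32)
scale, zp = calibrate_uint8(calib)
roundtrip = dequantize(quantize(calib, scale, zp), scale, zp)
print(scale, zp)                        # quantization parameters
print(np.abs(roundtrip - calib).max()) # quantization error, bounded by ~scale
```

A poorly chosen calibration set skews the min/max estimate, which directly degrades accuracy; this is why the quality of the dataset we build later matters.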
The Compile job can target several runtime targets for deployment:
- TFLite runtime (QNN Delegate)
- ONNX runtime
- Qualcomm AI Runtime (QAIRT)
In this project, I explore and compare targeting to TFLite runtime (using QNN Delegate) and QAIRT.
My understanding is that deployment with ONNX runtime is only available on Windows, and not embedded Linux.
The Validate and Profile jobs can be used to perform inference and/or profiling on the actual targeted device (in our case, the QCS6490). These jobs run in the cloud on real devices; I have seen cases where they timed out due to unavailable proxy devices.
In this project, I used version 2026.01.05.0 of the QAI Hub Workbench, which includes:
- AI Hub Workbench : aihub-2026.01.05.0
- QAIRT : 2.40.0.251030114326_189385
The Qualcomm flow and documentation change frequently, with the user experience improving at each iteration. The latest information on using the Qualcomm AI Hub Workbench can be found on Qualcomm's AI Hub:
Starting with the QIRP 1.6 image on the Vision AI-KIT 6490
This project was tested on the Vision AI-KIT 6490, using the QIRP 1.6 image. If you are targeting a different board, or using a different QIRP version, the instructions in the other sections may need to be adapted.
Use the following Startup Guide to program the QIRP 1.6 image to the QCS6490 Vision AI Kit:
This provides instructions on how to program the latest version of the QIRP 1.6 image (visionai_6490_qirp_1.6_v4.zip):
After booting the Vision AI-KIT 6490 with the QIRP 1.6 image, you can perform a sanity check with the Out of Box demo:
Notice the System Thermals and System Utilization graphs on the bottom. We will make use of these during our exploration.
Installing QAI Hub on the Vision AI-KIT 6490
First, make certain that changes on the Vision AI-KIT 6490 will be persistent:
mount -o remount,rw /usr
The QAI Hub client can be installed with pip. We will also install the support for PyTorch and TFLite:
pip3 install qai-hub
pip3 install 'qai-hub[torch]'
pip3 install 'qai-hub[tflite]'
I tried to install the 'qai-hub[onnx]' package, but it failed to install, which seems to confirm my understanding that deployment with the ONNX runtime is only available on Windows, and not embedded Linux.
We can also install the QAI Hub model zoo, as follows:
pip3 install qai_hub_models
Before using the QAI Hub client, you will need to set up an account:
With an account setup, you will find your API token here:
Then enter your credentials once using the following command:
qai-hub configure --api_token {API_TOKEN}
As a sanity check, you can list the devices supported by the QAI Hub Workbench:
qai-hub list-devices
MediaPipe models on QAI Hub
If you have read the Qualcomm documentation, you will have noticed that they already have MediaPipe models on their QAI Hub.
SO... WHY ARE WE RE-INVENTING THE WHEEL?
This is not only an excellent question, but a very important point to highlight.
First, at the time I am writing this article, only one of the following MediaPipe models was supported on the Qualcomm QCS6490:
- mediapipe_hand => NOT supported on QCS6490
- mediapipe_face => supported on QCS6490
- mediapipe_pose => NOT supported on QCS6490
Second, Qualcomm chose to support an older version of the MediaPipe models (v0.07), instead of the most recent (v0.10).
This is VERY IMPORTANT to highlight, since major updates were made after v0.07 to the palm detection and hand landmark models, specifically for use with gesture and sign recognition:
Qualcomm, in fact, chose to support a version of the models that was converted to PyTorch by the open-source community:
- [Vidur Satija] BlazePalm : vidursatija/BlazePalm
- [Matthijs Hollemans] BlazeFace-PyTorch : hollance/BlazeFace-PyTorch
- [Zak Murez] MediaPipePyTorch : zmurez/MediaPipePytorch
Although zmurez does not divulge the conversion scripts that were used to generate the PyTorch versions of the models, vidursatija and hollance, whose work zmurez builds on, do provide their conversion scripts in the form of Jupyter notebooks.
Unfortunately, these conversion scripts/notebooks only work for the v0.07 version, and not the subsequent versions (believe me, I tried...).
We can observe the reference to the zmurez/MediaPipePyTorch repository when we run the supported mediapipe_face models on our QCS6490 board:
root@qcs6490-visionai-kit:~# python -m qai_hub_models.models.mediapipe_face.demo
Note: This demo is running through torch, and not meant to be real-time without dedicated ML hardware.
Use Ctrl+C in your terminal to exit.
mediapipe_pytorch requires repository https://github.com/zmurez/MediaPipePyTorch. Ok to clone? [Y/n] Y
Cloning https://github.com/zmurez/MediaPipePyTorch to /root/.qaihm/models/mediapipe_pytorch/v1/zmurez_MediaPipePyTorch_git...
Done
...
The choice of this outdated model does not make sense to me, other than, perhaps, that only PyTorch was supported by the Qualcomm AI Stack at the time of integration.
I found myself in the same situation when I deployed these models to AMD/Xilinx Vitis-AI.
Regardless of the reason, I see an opportunity to take the support for MediaPipe one step further. Since we can convert the TFLite models to ONNX, I propose the following updated flow for the MediaPipe models on QCS6490:
The first challenge that I encountered, in part 1, was the reality that the performance of the MediaPipe models significantly degrades when run on embedded platforms, compared to modern computers. This is the reason I am attempting to accelerate the models with the QAI Hub Workbench.
The second challenge is the fact that Google does not provide the dataset that was used to train the MediaPipe models. Since quantization requires a subset of this training data, this presents us with the challenge of coming up with this data ourselves.
In order to tackle these challenges, we will clone the following repository (blaze_tutorial), which will be used to quantize, compile, and profile the models in the cloud with the QAI Hub Workbench:
git clone --branch qcs6490 https://github.com/AlbertaBeef/blaze_tutorial
Creating a Calibration Dataset for Quantization
As described previously in the "QAI Hub Workbench Overview" section, the quantization phase requires several hundred to several thousand data samples, ideally a subset of the training data. Since we do not have access to the training dataset, we need to come up with this data ourselves.
We can generate the calibration dataset using a modified version of the blaze_app_python.py script, as follows:
For each input image that contains at least one hand, we want to generate:
- palm detection input images : the image resized and padded to the model's input size
- hand landmarks input images : a cropped image of each hand, resized to the model's input size
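The first of these is a "letterbox" operation: scale the frame to fit the model input, then pad the remainder. A minimal numpy sketch (nearest-neighbor resize for brevity; the actual scripts use OpenCV):

```python
import numpy as np

def letterbox(img: np.ndarray, size: int) -> np.ndarray:
    """Resize an HxWx3 uint8 image to fit in size x size, pad the rest with zeros."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbor resize via index sampling
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[rows[:, None], cols[None, :]]
    # center the resized image in a zero-padded square canvas
    out = np.zeros((size, size, 3), dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
print(letterbox(frame, 192).shape)  # (192, 192, 3)
```

Keeping the aspect ratio (rather than stretching) matters, since the detector was trained on undistorted hands.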
Two possible sources for input images are the following:
- Kaggle : many datasets exist, and may be reused
- Pixabay : contains several interesting videos, from which images can be extracted
For the case of Kaggle, if we take an existing dataset such as the following:
We can create a modified version of the blaze_detect_live.py script (from the blaze_app_python repository) that will scan all the images and generate a NumPy-specific binary format (*.npy) file containing our calibration data for the quantization step:
To run this script, navigate to the "blaze_app_python/calib_dataset_kaggle" directory, download the kaggle dataset to this sub-directory, and launch the script as follows:
$ python3 gen_calib_hand_dataset.py
[INFO] 2167 images found in kaggle_hand_gestures_dataset
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
[INFO] calib_palm_detection_192_dataset shape = (1871, 192, 192, 3) uint8 0 255
[INFO] calib_palm_detection_256_dataset shape = (1871, 256, 256, 3) uint8 0 255
[INFO] calib_hand_landmark_224_dataset shape = (1880, 224, 224, 3) uint8 0 255
[INFO] calib_hand_landmark_256_dataset shape = (1880, 256, 256, 3) uint8 0 255
This will create the following calibration data for the 0.10 versions of the palm detection and hand landmarks models:
- calib_palm_detection_192_dataset.npy : 1871 samples of 192x192 RGB images
- calib_hand_landmark_224_dataset.npy : 1880 samples of 224x224 RGB images
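Since the .npy files are plain NumPy arrays, it is worth sanity-checking them before uploading. A quick sketch (using a small stand-in array here so the snippet is self-contained; in practice you would load the file generated above):

```python
import os
import tempfile

import numpy as np

# Stand-in for calib_palm_detection_192_dataset.npy (8 samples instead of 1871)
path = os.path.join(tempfile.gettempdir(), "calib_palm_detection_192_dataset.npy")
np.save(path, np.random.randint(0, 256, (8, 192, 192, 3), dtype=np.uint8))

data = np.load(path)
print(data.shape, data.dtype, data.min(), data.max())

# The quantizer expects NHWC uint8 samples matching the model's input size
assert data.ndim == 4 and data.shape[1:] == (192, 192, 3)
assert data.dtype == np.uint8
```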
I ultimately decided not to use this dataset, but documented the process for reference, since it can be applied to any other Kaggle dataset.
If we take the case of Pixabay, we can use several videos as sources, such as the following:
- https://pixabay.com/videos/pixabay-sign-language-people-inclusion-58301/
- https://pixabay.com/videos/pixabay-sign-language-people-inclusion-58302/
- https://pixabay.com/videos/pixabay-man-living-room-faces-expression-13625/
- https://pixabay.com/videos/pixabay-man-face-expression-irritated-182353.mp4
- https://pixabay.com/videos/pixabay-hands-good-accept-vote-ok-gesture-168344/
- https://pixabay.com/videos/pixabay-girl-heart-gesture-symbol-cute-129421/
Once again, we can create a modified version of the blaze_detect_live.py script (from the blaze_app_python repository) that will scan through the videos and generate a NumPy-specific binary format (*.npy) file containing our calibration data for the quantization step:
To run this script, navigate to the "blaze_app_python/calib_dataset_pixabay" directory, download the PixaBay videos in a "videos" sub-sub-directory, and launch the script as follows:
$ python3 gen_calib_hand_dataset.py
[INFO] Start of video ./videos/pixabay-sign-language-people-inclusion-58301.mp4
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
[INFO] End of video ./videos/pixabay-sign-language-people-inclusion-58301.mp4
[INFO] Collected 228 images for calib_palm_detection_192_dataset
[INFO] Collected 228 images for calib_palm_detection_256_dataset
[INFO] Collected 336 images for calib_hand_landmark_224_dataset
[INFO] Collected 336 images for calib_hand_landmark_256_dataset
[INFO] Start of video ./videos/pixabay-sign-language-people-inclusion-58302.mp4
[INFO] End of video ./videos/pixabay-sign-language-people-inclusion-58302.mp4
[INFO] Collected 694 images for calib_palm_detection_192_dataset
[INFO] Collected 694 images for calib_palm_detection_256_dataset
[INFO] Collected 1127 images for calib_hand_landmark_224_dataset
[INFO] Collected 1127 images for calib_hand_landmark_256_dataset
[INFO] Start of video ./videos/pixabay-man-living-room-faces-expression-136253.mp4
[INFO] End of video ./videos/pixabay-man-living-room-faces-expression-136253.mp4
[INFO] Collected 1041 images for calib_palm_detection_192_dataset
[INFO] Collected 1041 images for calib_palm_detection_256_dataset
[INFO] Collected 1818 images for calib_hand_landmark_224_dataset
[INFO] Collected 1818 images for calib_hand_landmark_256_dataset
[INFO] Start of video ./videos/pixabay-man-face-expression-irritated-182353.mp4
[INFO] End of video ./videos/pixabay-man-face-expression-irritated-182353.mp4
[INFO] Collected 1138 images for calib_palm_detection_192_dataset
[INFO] Collected 1138 images for calib_palm_detection_256_dataset
[INFO] Collected 1933 images for calib_hand_landmark_224_dataset
[INFO] Collected 1933 images for calib_hand_landmark_256_dataset
[INFO] Start of video ./videos/pixabay-hands-good-accept-vote-ok-gesture-168344.mp4
[INFO] End of video ./videos/pixabay-hands-good-accept-vote-ok-gesture-168344.mp4
[INFO] Collected 1361 images for calib_palm_detection_192_dataset
[INFO] Collected 1361 images for calib_palm_detection_256_dataset
[INFO] Collected 2379 images for calib_hand_landmark_224_dataset
[INFO] Collected 2379 images for calib_hand_landmark_256_dataset
[INFO] Start of video ./videos/pixabay-girl-heart-gesture-symbol-asian-129421.mp4
[INFO] End of video ./videos/pixabay-girl-heart-gesture-symbol-asian-129421.mp4
[INFO] Collected 1577 images for calib_palm_detection_192_dataset
[INFO] Collected 1577 images for calib_palm_detection_256_dataset
[INFO] Collected 2595 images for calib_hand_landmark_224_dataset
[INFO] Collected 2595 images for calib_hand_landmark_256_dataset
[INFO] calib_palm_detection_192_dataset shape = (1577, 192, 192, 3) uint8 0 255
[INFO] calib_palm_detection_256_dataset shape = (1577, 256, 256, 3) uint8 0 255
[INFO] calib_hand_landmark_224_dataset shape = (2595, 224, 224, 3) uint8 0 255
[INFO] calib_hand_landmark_256_dataset shape = (2595, 256, 256, 3) uint8 0 255
This will create the following calibration data for the 0.10 versions of the palm detection and hand landmarks models:
- calib_palm_detection_192_dataset.npy : 1577 samples of 192x192 RGB images
- calib_hand_landmark_224_dataset.npy : 2595 samples of 224x224 RGB images
You are free to use either source described above, or your own source, for the quantization data.
I have archived my exploration on this sub-topic (creating hand/face/pose datasets for various versions of models) in the following two archives:
- Kaggle : calib_dataset_kaggle.zip
- Pixabay : calib_dataset_pixabay.zip
For the purpose of this exploration, I have prepared the calibration data (from Pixabay), which can be downloaded and extracted as follows:
cd blaze_tutorial/qcs6490
wget https://github.com/AlbertaBeef/blaze_tutorial/releases/download/qcs6490_version_1/blaze_calibration_data.zip
unzip blaze_calibration_data.zip
Model Conversion
The second step to prepare for deployment with the QAI Hub Workbench is to download the TFLite models and convert them to ONNX with the tf2onnx utility:
- get_tflite_models.sh : download TFLite models from Google
- convert_models.sh : convert models to ONNX format with the tf2onnx utility
Use the following commands to download and convert the MediaPipe models to ONNX:
cd blaze_tutorial/qcs6490
cd models
source ./get_tflite_models.sh
pip3 install tf2onnx
source ./convert_models.sh
Model Deployment
Now that we have our calibration data and our models converted to ONNX, we can perform the model quantization, profiling, and compilation using the AI Hub Workbench.
I have prepared a script for this purpose:
This script takes three (3) arguments when invoked:
- name : model name (e.g. palm_detection_lite)
- model : model file (e.g. models/palm_detection_lite.onnx)
- resolution : input size (e.g. 256)
The name argument indicates which model we are deploying, such as palm_detection_lite or palm_detection_full for the palm detector, or hand_landmark_lite or hand_landmark_full for the hand landmark models. The resolution indicates the input size to the model.
These two arguments will determine which calibration dataset to use for the quantization. For example:
- name=palm_detection_lite, size=192 => calib_palm_detection_192_dataset.npy
- name=hand_landmark_lite, size=224 => calib_hand_landmark_224_dataset.npy
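The mapping from the name/resolution arguments to a calibration file can be sketched as follows (a hypothetical helper for illustration; the real script's logic may differ in details):

```python
def calib_dataset_path(name: str, resolution: int) -> str:
    """Map a model name and input resolution to its calibration .npy file."""
    if name.startswith("palm_detection"):
        family = "palm_detection"
    elif name.startswith("hand_landmark"):
        family = "hand_landmark"
    else:
        raise ValueError(f"no calibration dataset for model '{name}'")
    return f"calib_{family}_{resolution}_dataset.npy"

print(calib_dataset_path("palm_detection_lite", 192))  # calib_palm_detection_192_dataset.npy
print(calib_dataset_path("hand_landmark_lite", 224))   # calib_hand_landmark_224_dataset.npy
```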
The script will generate output artifacts for the following target runtimes:
- tflite (*.tflite)
- onnx (*.onnx.zip)
- qnn_dlc (*.dlc)
- qnn_context_binary (*.bin)
- precompiled_qnn_onnx (*.onnx.zip)
I will only test inference with the following two target runtimes:
- tflite (*.tflite) => using TFLite (with QNN Delegate)
- qnn_context_binary (*.bin) => using QAIRT
I have provided a second script which will call the qai_hub_workbench_flow.py script to quantize, compile, and profile the models:
You will want to modify the following list before execution:
- model_list : specify which model(s) you want to deploy
Below is a modified version of the script that will deploy the 0.10 versions of the palm detection and hand landmarks models.
# ONNX models
model_palm_detector_v0_07=("palm_detection_v0_07","models/palm_detection_v0_07.onnx",256)
model_hand_landmark_v0_07=("hand_landmark_v0_07","models/hand_landmark_v0_07.onnx",256)
model_palm_detector_v0_10_lite=("palm_detection_lite","models/palm_detection_lite.onnx",192)
model_palm_detector_v0_10_full=("palm_detection_full","models/palm_detection_full.onnx",192)
model_hand_landmark_v0_10_lite=("hand_landmark_lite","models/hand_landmark_lite.onnx",224)
model_hand_landmark_v0_10_full=("hand_landmark_full","models/hand_landmark_full.onnx",224)
model_face_detector_v0_10_short=("face_detection_short_range","models/face_detection_short_range.onnx",128)
model_face_detector_v0_10_full=("face_detection_full_range","models/face_detection_full_range.onnx",192)
model_face_landmark_v0_10=("face_landmark","models/face_landmark.onnx",192)
model_pose_detector_v0_10=("pose_detection","models/pose_detection.onnx",224)
model_pose_landmark_v0_10_lite=("pose_landmark_lite","models/pose_landmark_lite.onnx",256)
model_pose_landmark_v0_10_full=("pose_landmark_full","models/pose_landmark_full.onnx",256)
model_pose_landmark_v0_10_heavy=("pose_landmark_heavy","models/pose_landmark_heavy.onnx",256)
model_list=(
model_palm_detector_v0_10_full[@]
model_hand_landmark_v0_10_full[@]
model_palm_detector_v0_10_lite[@]
model_hand_landmark_v0_10_lite[@]
)
model_count=${#model_list[@]}
#echo $model_count
for ((i=0; i<$model_count; i++))
do
model=${!model_list[i]}
model_array=(${model//,/ })
model_name=${model_array[0]}
model_file=${model_array[1]}
input_resolution=${model_array[2]}
echo python3 qai_hub_workbench_flow.py --name ${model_name} --model ${model_file} --resolution ${input_resolution}
python3 qai_hub_workbench_flow.py --name ${model_name} --model ${model_file} --resolution ${input_resolution} | tee deploy_${model_name}.log
done
This script can be executed as follows:
cd blaze_tutorial/qcs6490
source ./deploy_models_qai_hub_workbench.sh
When complete, the following compiled models will be located in the current directory:
- palm_detection_full.tflite, palm_detection_full.bin, ...
- hand_landmark_full.tflite, hand_landmark_full.bin, ...
- palm_detection_lite.tflite, palm_detection_lite.bin, ...
- hand_landmark_lite.tflite, hand_landmark_lite.bin, ...
For convenience, I have archived the compiled models for QCS6490 in the following archives:
- TFLite models (*.tflite) : blaze_tflite_qnn_models_qcs6490.zip
- QAIRT models (*.bin) : blaze_qairt_models_qcs6490.zip
All of the results will be located in your QAI Hub Workbench account on-line:
On the "Jobs" page, if we click on the "Profile" tab, we can see our profiling results for each of the models:
If we compare the unquantized ONNX model running on CPU with the quantized models running on NPU:
- palm_detection_full (QAIRT version) : 66.4 msec => 1.3 msec
- hand_landmarks_full (QAIRT version) : 47.4 msec => 1 msec
Similarly, for the lite versions of the models:
- palm_detection_lite (QAIRT version) : 54.7 msec => 1.2 msec
- hand_landmarks_lite (QAIRT version) : 29.9 msec => 0.7 msec
This is significant acceleration (between 30X and 60X)!
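The speedup factors follow directly from the profiled latencies quoted above:

```python
# (model, unquantized CPU latency, quantized NPU latency), in msec,
# taken from the QAIRT profile jobs above
results = [
    ("palm_detection_full", 66.4, 1.3),
    ("hand_landmarks_full", 47.4, 1.0),
    ("palm_detection_lite", 54.7, 1.2),
    ("hand_landmarks_lite", 29.9, 0.7),
]
for name, cpu_ms, npu_ms in results:
    print(f"{name}: {cpu_ms / npu_ms:.0f}x speedup")
# speedups: 51x, 47x, 46x, 43x
```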
There is one anomaly in the profile results for the quantized models:
- palm_detection_full (TFLite version) : 66.4 msec => 31.0 msec
If we click on the job to understand what is happening, we can see that the model is only partially accelerated on the NPU, with 150 layers still being executed on the CPU.
All of the other jobs have a clean NPU implementation, including the QAIRT version of the palm_detection_full model:
If we scroll down and select the "Runtime Layer Analysis" section, we can click on the "VIEW OPTRACE" button to get a detailed layer by layer profile:
If we look at the number of layers (CPU, NPU, GPU) for each model, we get the following reported layers in the profile jobs:
The "ONNX Layers" corresponds to the floating-point ONNX model that we used as input for the QAI Hub Workbench.
The "TFLite Layers" corresponds to the quantized model, targetted to the TFLite runtime.
The "QAIRT Layers" corresponds to the quantized model, targetted to "qnn_compiled_binary", which can be used with the Qualcomm AI runtime.
Model Accuracy
If we look at model accuracy, we get the following results in the quantize jobs:
We can see that the model accuracy still needs work. This may be related to the calibration data, but I have not investigated further.
The best accuracy is being achieved with version 0.07 of the palm detection model.
The worst accuracy is being achieved with version 0.07 of the hand landmarks model. This can be seen with version 0.07 of the pipeline, where the hands are detected correctly, but the landmarks are not very accurate.
Unfortunately, the PSNR calculation for the v0.10 hand landmark models failed, so I do not know what the metrics are; the results definitely look accurate, however, so I will assume they are above 30 dB.
The exception is the handedness output, which is wrong (always reporting "left" hands).
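For reference, the PSNR metric reported by the quantize jobs compares the quantized model's outputs against the float model's outputs, with values above roughly 30 dB usually considered acceptable. A minimal numpy version of the metric (the exact peak/reference convention QAI Hub uses is an assumption on my part):

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak=None) -> float:
    """Peak signal-to-noise ratio, in dB, between two output tensors."""
    if peak is None:
        peak = float(np.abs(reference).max())   # assumed peak convention
    mse = float(np.mean((reference - test) ** 2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * float(np.log10(peak ** 2 / mse))

# Example: a reference output vs. a slightly perturbed copy
ref = np.linspace(-1, 1, 1000, dtype=np.float32)
noisy = ref + np.float32(0.01) * np.sin(np.arange(1000, dtype=np.float32))
print(f"{psnr(ref, noisy):.1f} dB")
```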
Model Execution
In order to support the QCS6490 models, the "blaze_app_python" application was augmented with the following inference targets:
As can be seen, I already have support for the original TFLite models, as well as the PyTorch versions (v0.07) of the models.
We can also execute the unquantized ONNX models, but most importantly, support for execution on the NPU was added using the following two runtime targets:
- TFLite (using QNN Delegate)
- QAIRT
My final inference code for TFLite (with QNN Delegate) actually makes use of Google's AI Edge LiteRT (the latest evolution of TFLite), and can be found in the "blaze_app_python" repository, under the blaze_tflite_qnn sub-directory:
- blaze_app_python/blaze_tflite_qnn/blazedetector.py
- blaze_app_python/blaze_tflite_qnn/blazelandmark.py
We need to ensure that the required library is present on our board, which it is:
root@qcs6490-visionai-kit:~# ls /usr/lib/libQnnTFLiteDelegate.so
/usr/lib/libQnnTFLiteDelegate.so
My final inference code for QAIRT can be found in the "blaze_app_python" repository, under the blaze_qairt sub-directory:
We need to install the QAIRT SDK on our board, which can be done using the following instructions.
First, we download and install version 2.40 of the QAIRT SDK:
export PRODUCT_SOC=6490 DSP_ARCH=68
wget https://softwarecenter.qualcomm.com/api/download/software/sdks/Qualcomm_AI_Runtime_Community/All/2.40.0.251030/v2.40.0.251030.zip
unzip v2.40.0.251030.zip
cd qairt/2.40.0.251030
source bin/envsetup.sh
export ADSP_LIBRARY_PATH=$QNN_SDK_ROOT/lib/hexagon-v${DSP_ARCH}/unsigned
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$QNN_SDK_ROOT/lib/aarch64-oe-linux-gcc11.2
Then, we [optionally] clone, build, and install the QAI App Builder:
git clone --branch v2.40.0 --recursive https://github.com/quic/ai-engine-direct-helper
cd ai-engine-direct-helper
python3 setup.py bdist_wheel
pip3 install dist/qai_appbuilder-2.38.0-cp312-cp312-linux_aarch64.whl
Installing the python application on the Vision AI-KIT 6490
First, let's ensure that our changes are persistent:
mount -o remount,rw /usr
The python demo application requires certain packages, which can be installed as follows:
pip3 install ai-edge-litert
The python application can be accessed from the following github repository:
git clone --branch blaze_qnn https://github.com/AlbertaBeef/blaze_app_python
cd blaze_app_python
In order to successfully use the python demo with the original TFLite models, they need to be downloaded from the Google web site:
cd blaze_tflite/models
source ./get_tflite_models.sh
cd ../..
In order to successfully use the python demo with the QCS6490 models, they need to be downloaded as follows:
cd blaze_tflite_qnn/models
source ./get_qcs6490_models.sh
unzip blaze_tflite_qnn_models_qcs6490.zip
cd ../..
cd blaze_qairt/models
source ./get_qcs6490_models.sh
unzip blaze_qairt_models_qcs6490.zip
cd ../..
You are all set!
Launching the python application on the Vision AI-KIT 6490
As we already saw in part 1, the python application can launch many variations of the dual-inference pipeline, which can be filtered with the following arguments:
- --blaze : hand | face | pose
- --target : blaze_tflite |... | blaze_tflite_qnn | blaze_qairt
- --pipeline : specific name of pipeline (can be queried with --list argument)
In order to display the complete list of supported pipelines, launch the python script as follows:
root@zub1cg-sbc-2022-2:~/blaze_app_python# python3 blaze_detect_live.py --list
[INFO] user@hosthame : root@zub1cg-sbc-2022-2
[INFO] blaze_tflite supported ...
...
[INFO] blaze_qairt supported ...
...
Command line options:
--input :
--image : False
--blaze : hand,face,pose
--target : blaze_tflite,...,blaze_qairt
--pipeline : all
--list : True
--debug : False
--withoutview : False
--profilelog : False
--profileview : False
--fps : False
List of target pipelines:
...
## qairt_hand_v0_10_lite blaze_qairt/models/palm_detection_lite.bin
blaze_qairt/models/hand_landmark_lite.bin
## qairt_hand_v0_10_full blaze_qairt/models/palm_detection_full.bin
blaze_qairt/models/hand_landmark_full.bin
...
In order to launch the v0.10 lite version of the hand detection and landmarks pipeline, with the TFLite runtime and QNN delegate, use the python script as follows:
python3 blaze_detect_live.py --pipeline=tflqnn_hand_v0_10_lite --fps
This will launch the 0.10 (lite) version of the models, compiled for QCS6490, as shown below:
The previous video has not been accelerated. It shows a frame rate of approximately 30 fps when no hands are detected (one model running: palm detection), approximately 20 fps when one hand is detected (two models running: palm detection and hand landmarks), and approximately 15 fps when two hands are detected (three models running: palm detection and two hand landmarks).
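Those three frame rates let us back out an approximate per-inference cost, assuming the pipeline runs serially (a rough estimate that ignores per-frame overhead):

```python
# Observed frame rates for 0, 1, and 2 detected hands
fps = {0: 30.0, 1: 20.0, 2: 15.0}
frame_ms = {hands: 1000.0 / f for hands, f in fps.items()}

# Each detected hand adds one hand-landmark inference to the frame,
# so the slope gives the approximate cost of one landmark inference
per_landmark_ms = (frame_ms[2] - frame_ms[0]) / 2
print(f"~{per_landmark_ms:.1f} ms per hand-landmark inference")  # ~16.7 ms
```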
This is slightly worse than the original TFLite models running on the CPU, so I have to consider the use of TFLite with QNN delegate unsuccessful for this use case.
In order to launch the v0.10 lite version of the hand detection and landmarks pipeline, with the Qualcomm AI Runtime, use the python script as follows:
python3 blaze_detect_live.py --pipeline=qairt_hand_v0_10_lite --fps
This will launch the 0.10 (lite) version of the models, compiled for QCS6490, as shown below:
The previous video has not been accelerated. It shows the frame rate to be 30 fps when no hands are detected (one model running : palm detection), as well as when one hand has been detected (two models running : palm detection and hand landmarks), and when two hands have been detected (three models running : palm detection and 2 hand landmarks).
Contrary to the TFLite runtime with QNN delegate, the Qualcomm AI runtime achieves significant acceleration!
I have created a montage of the TFLite reference, the TFLite with QNN delegate, and QAIRT, for comparison:
In order to know the true performance of the models running with QAIRT, we will need to detach from the USB camera (which is determining the frame rate of 30fps). We will be doing this in the next section.
Benchmarking the models on the Vision AI-KIT 6490
In order to obtain stable profiling results, we use a test image (with two hands) that can be downloaded from Google as follows:
source ./get_test_images.sh
We can visualize the profiling results for the original TFLite models and the QAIRT-accelerated models using the following command:
python3 blaze_detect_live.py --testimage --pipeline=tfl_hand_v0_10_full,tfl_hand_v0_10_lite,qairt_hand_v0_10_full,qairt_hand_v0_10_lite --profileview
The following graphs will appear:
You may have noticed that the bars in the graphs have some jitter, so we will capture a series of values to a CSV file, then average the results, to get a better idea of the performance.
I do not have this automated, so it is a manual process of capture, process, visualize...
The following commands can be used to generate profile results for the qairt_hand_v0_10_lite pipeline using the QCS6490 models, and the test image:
rm blaze_detect_live.csv
python3 blaze_detect_live.py --testimage --withoutview --profilelog --pipeline=qairt_hand_v0_10_lite
mv blaze_detect_live.csv blaze_detect_live_qcs6490_qairt_hand_v0_10_lite.csv
The following commands can be used to generate profile results for the tfl_hand_v0_10_lite pipeline using the TFLite models, and the test image:
rm blaze_detect_live.csv
python3 blaze_detect_live.py --testimage --withoutview --profilelog --pipeline=tfl_hand_v0_10_lite
mv blaze_detect_live.csv blaze_detect_live_qcs6490_tfl_hand_v0_10_lite.csv
The same is done for the qairt_hand_v0_10_full and tfl_hand_v0_10_full models.
The results of all .csv files were averaged, then plotted using Excel.
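The averaging step can also be scripted; here is a stdlib sketch that averages every numeric column of a profile CSV (the column names shown are assumptions, as the real log's layout may differ):

```python
import csv
import io
from statistics import mean

def average_numeric_columns(csv_text: str) -> dict:
    """Average every CSV column that parses as a float, skipping the rest."""
    columns, non_numeric = {}, set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for key, value in row.items():
            if key in non_numeric:
                continue
            try:
                columns.setdefault(key, []).append(float(value))
            except (TypeError, ValueError):
                non_numeric.add(key)          # column is not numeric; drop it
                columns.pop(key, None)
    return {key: mean(values) for key, values in columns.items()}

# Hypothetical excerpt of blaze_detect_live.csv
log = ("pipeline,detector_ms,landmark_ms\n"
       "qairt_hand_v0_10_lite,1.2,0.7\n"
       "qairt_hand_v0_10_lite,1.4,0.9\n")
print(average_numeric_columns(log))
```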
Here are the profiling results for the models deployed with QAIRT, in comparison to the reference TFLite models:
Again, it is worth noting that these benchmarks were taken with a single-threaded python script. There is additional opportunity for acceleration with a multi-threaded implementation: while the graph runner is waiting for transfers from one model's sub-graphs, one or more other models could be launched in parallel.
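The multi-threaded idea can be sketched with concurrent.futures, with a dummy function standing in for a real landmark inference (in practice, the benefit depends on the runtime releasing the GIL while the NPU is busy):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def landmark_inference(roi_id: int) -> str:
    """Stand-in for one hand-landmark inference (real code would call the NPU)."""
    time.sleep(0.01)             # simulate inference latency
    return f"landmarks for roi {roi_id}"

rois = [0, 1]                    # two detected hands from the palm detector

# Serial: total time is roughly the sum of the latencies
start = time.perf_counter()
serial = [landmark_inference(r) for r in rois]
serial_s = time.perf_counter() - start

# Parallel: the inferences overlap, so total time approaches the max latency
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(rois)) as pool:
    parallel = list(pool.map(landmark_inference, rois))
parallel_s = time.perf_counter() - start

print(f"serial {serial_s * 1000:.1f} ms, parallel {parallel_s * 1000:.1f} ms")
```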
There is also an opportunity to accelerate the rest of the pipeline with C++ code...
Known Issues
Although I have quantized and deployed the v0.07 versions of the palm_detection and hand_landmarks models, the accuracy of the hand_landmarks model has degraded, so do not re-use it in your application.
For the v0.10 lite version of the hand landmarks model, the handedness does not seem to be handled correctly. The model always returns ~1.0, corresponding to a "left" hand.
In summary, only the v0.10 full version of the palm detection and hand landmark models are fully functional, so use those.
Conclusion
Although the on-line QAI Hub Workbench works well, I doubt that serious customers with real projects will agree to upload their proprietary models and datasets to the Qualcomm cloud.
Are you using Qualcomm Dragonwing devices in your projects ?
Let me know in the comments...
Acknowledgements
I need to acknowledge the extraordinary work of my colleagues, who made this project possible:
- Monica Houston : for her work on the quantized TFLite implementation, which I was able to quickly adapt for the TFLite with QNN Delegate implementation
- Maxim Saka : for the immense work involved in creating the QIRP images, and creating the GTK user interface for the OOB demo, which was very useful during this investigation
- 2026/01/19 - Initial Version
Accelerating MediaPipe (by Mario Bergeron):
- Hackster Series Part 1 : Blazing Fast Models
- Hackster Series Part 2 : Insightful Datasets for ASL recognition
- Hackster Series Part 3 : Accelerating the MediaPipe models with Vitis-AI 3.5
- Hackster Series Part 4 : Accelerating the MediaPipe models with Hailo-8
- Hackster Series Part 5 : Accelerating the MediaPipe models on RPI5 AI Kit
- Hackster Series Part 6 : Accelerating the MediaPipe models with MemryX
- Hackster Series Part 7 : Accelerating the MediaPipe models with Qualcomm
- Hackster Series Part 8 : Accelerating the MediaPipe models with AzurEngine
- Blaze Utility (python version) : blaze_app_python
- Blaze Utility (C++ version) : blaze_app_cpp
Qualcomm References:
- QAI Hub : https://aihub.qualcomm.com/get-started
- QAI Hub Workbench : https://aihub.qualcomm.com/get-started#workbench
- Sample python inference application using LiteRT QNN Delegate : https://github.com/quic/sample-apps-for-qualcomm-linux/blob/main/qualcomm-linux/applications/LiteRT/object_detection.py
- Sample python inference application using AI Engine Direct : https://github.com/quic/ai-engine-direct-helper/tree/main/samples/python/mediapipe_hand
MediaPipe resources
- [Google] MediaPipe Solutions Guide : https://ai.google.dev/edge/mediapipe/solutions/guide
- [Google] MediaPipe Source Code : https://github.com/google-ai-edge/mediapipe
- [Google] SignALL SDK : https://developers.googleblog.com/en/signall-sdk-sign-language-interface-using-mediapipe-is-now-available-for-developers/
Open-Source Porting Effort to PyTorch
- [Vidur Satija] BlazePalm : vidursatija/BlazePalm
- [Matthijs Hollemans] BlazeFace-PyTorch : hollance/BlazeFace-PyTorch
- [Zak Murez] MediaPipePyTorch : zmurez/MediaPipePytorch