Optical Character Recognition (OCR) is one of my favorite use cases for computer vision. There is something that fascinates me about a machine "reading" the physical world. When I saw this Edge Impulse project using Model Cascading to enable OCR in a Python application, I couldn't resist porting it to the Arduino UNO Q.
On the Arduino UNO Q, a single monolithic OCR model can be slow and resource-heavy. By using Model Cascading, we can optimize the process: a lightweight "detector" model finds text first, and the "heavy" reader model is invoked to recognize it only when necessary. This isn't just a demo; it's a blueprint for efficient Edge AI that can be applied to many other use cases.
Model Cascading Explained

The logic follows a simple conditional execution flow:
- Inference A (Detector): Runs at high FPS. Output is a bounding box.
- If confidence is high enough, proceed to Inference B.
- Inference B (Recognizer): Runs only on the image cropped to the detected bounding box.
By reducing the input size for the second model, we significantly decrease the number of operations (FLOPs), making the recognition phase much faster than if we scanned the entire high-res image.
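In code, the cascade boils down to something like this (a minimal sketch; `run_detector`, `run_recognizer`, and the threshold are illustrative placeholders, not the project's actual functions):

```
# Minimal sketch of the cascade. The function names and threshold are
# illustrative placeholders, not taken from the project code.
DETECTION_THRESHOLD = 0.60

def run_ocr_cascade(frame, run_detector, run_recognizer):
    """Run the cheap detector first; call the heavy recognizer only
    on high-confidence bounding boxes."""
    results = []
    for box in run_detector(frame):            # Inference A: fast, low cost
        if box["confidence"] < DETECTION_THRESHOLD:
            continue                           # skip low-confidence regions
        x, y, w, h = box["x"], box["y"], box["width"], box["height"]
        crop = frame[y:y + h, x:x + w]         # much smaller input -> far fewer FLOPs
        text = run_recognizer(crop)            # Inference B: heavy, runs rarely
        results.append((box, text))
    return results
```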
That said, even with cascading, the Arduino UNO Q's Qualcomm Dragonwing QRB2210 is likely too slow to handle this heavy inference pipeline in real time.
Building the Models with Edge Impulse

Note: Prebuilt models for Apple-silicon macOS, aarch64 Linux boards, and aarch64 Linux boards with Qualcomm QNN optimizations (e.g. Rubik Pi, RB3 Gen 2 Vision Kit) are in models/. If you want to create your own models for this project instead of using the ones in the repository, follow the instructions below.
We used Edge Impulse Studio to deploy the Hugging Face PaddleOCR ONNX models to the Arduino UNO Q.
We pulled the PaddleOCR Detector model with the pretrained weights from Hugging Face and imported them into Edge Impulse using Bring Your Own Model (BYOM).
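If you prefer fetching the weights from a script rather than the browser, something like this works with the `huggingface_hub` package (the repo id and filename below are placeholders, not the actual repository name):

```
# Sketch of pulling the ONNX weights with huggingface_hub. The repo id and
# filename are placeholders -- substitute the actual PaddleOCR ONNX repo.
from huggingface_hub import hf_hub_download

det_path = hf_hub_download(
    repo_id="<paddleocr-onnx-repo>",  # placeholder: the HF repo hosting the models
    filename="det.onnx",              # detector weights
)
print("Downloaded to:", det_path)
```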
Once you have the `onnx` file on your computer, go to Edge Impulse Studio and create a new project for the PaddleOCR detector. Then, on the Dashboard, click `Upload Your Model`.
On the 'Step 1: Upload pretrained model' screen, select the detection `onnx` model that you downloaded.
Under "Set input shape for ONNX file" set 1, 3, 480, 640. You can change the height and width if you want a higher or lower resolution; I also tested 320x240.
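If you want to double-check the tensor shape before uploading, here is a quick sketch using the `onnx` package (assuming the detector file is saved as `det.onnx`):

```
# Quick sanity check of the ONNX model's input shape before uploading.
# Assumes the detector file is saved locally as det.onnx.
import onnx

model = onnx.load("det.onnx")
for inp in model.graph.input:
    dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)  # expect something like [1, 3, 480, 640] (NCHW);
                           # dynamic dimensions show up as 0
```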
Optional (to quantize the model): Under "Upload representative features" select source_models/repr_dataset_480_640.npy (from this repo). If you need another resolution, you will need to run:

```
# 1) create a new venv and install the dependencies in source_models/requirements.txt
# e.g. on macOS/Linux: 'cd source_models && python3 -m venv .venv && source .venv/bin/activate && pip3 install -r requirements.txt && cd ..'
# 2) download an OpenImages subset
oi_download_images --base_dir=source_models/openimages --labels Car --limit 200
# 3) create a representative dataset from the OpenImages 'Car' class, scaled -1..1
python3 source_models/create_representative_dataset.py --height 480 --width 640 --limit 30
```
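For reference, the general idea behind that script is roughly the following (a simplified sketch; the repository script is authoritative, and details such as channel order may differ):

```
# Simplified sketch of building a representative .npy dataset: load a few
# images, resize to the model input, scale pixels to -1..1, save as float32.
import glob
import numpy as np
from PIL import Image

HEIGHT, WIDTH, LIMIT = 480, 640, 30
samples = []
for path in sorted(glob.glob("source_models/openimages/**/*.jpg", recursive=True))[:LIMIT]:
    img = Image.open(path).convert("RGB").resize((WIDTH, HEIGHT))
    arr = np.asarray(img, dtype=np.float32) / 127.5 - 1.0  # map 0..255 -> -1..1
    samples.append(arr)

np.save(f"source_models/repr_dataset_{HEIGHT}_{WIDTH}.npy", np.stack(samples))
```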
On the 'Step 2: Process "det.onnx"' screen, select the Model input `Image` and the scale 'Pixels range -1..1 (not normalized)'.
Under "Model output" select 'Object detection'.
Under "Output layer" select 'PaddleOCR detector'.
You can now upload an image under 'Check model behavior' and optionally tune the thresholds to match your text (the defaults should be pretty good). Then click Save Model.
Then go to Deployment and deploy the model, selecting your target hardware. In my case I selected `Arduino UNO Q`.
Running the OCR project on the Arduino UNO Q

Clone this repository on your Arduino UNO Q (or your Linux / Mac device).
Change into the folder where you cloned the project and prepare to deploy. I recommend setting up a virtual environment. You will need at least Python 3.10.
```
python -m venv .venv
source .venv/bin/activate
```

And now install the dependencies:
```
pip install --upgrade pip pyaudio six
pip install -r requirements.txt
```

Run the application; the web UI will be accessible from a browser at the device's local IP address on port 5000.
```
python web_inference.py \
    --detect-file ./models/arduino-uno-q/detector-linux-aarch64.eim \
    --predict-file ./models/arduino-uno-q/recognizer-linux-aarch64.eim \
    --dict-file source_models/rec_en_dict.txt
```

You should see output similar to:
```
(.venv) arduino@Arduino:~/ocr-linux-python$ python3 web_inference.py --detect-file ./models/arduino-uno-q/detector-linux-aarch64.eim --predict-file ./models/arduino-uno-q/recognizer-linux-aarch64.eim --dict-file source_models/rec_en_dict.txt
Loading character dictionary...
Dictionary loaded: 437 characters
Initializing detector model...
Initializing predictor model...
Models loaded:
  Detector: Marc using Arduino UNO Q / PaddleOCR detector - pretrained (v2)
  Predictor: Marc using Arduino UNO Q / PaddleOCR recognizer - pretrained (v2)
============================================================
Edge Impulse OCR Web Interface
============================================================
Web UI available at:
  - http://localhost:5000
  - http://<device-ip>:5000
Open the web UI, select a camera, and click "Start Inference"
Press Ctrl+C to stop the server
============================================================
 * Serving Flask app 'web_inference'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://192.168.1.131:5000
Press CTRL+C to quit
```

Open a browser tab with the local IP address of the Arduino UNO Q on port 5000. Select the camera (in my case `/dev/video2`) and you should be able to see:
Looking back at the results, inference speed is still very slow; however, model cascading clearly improves the OCR process compared to what a monolithic OCR model could achieve, if it would run at all.
Take this project as an example of a model cascading pipeline that can be applied to any computer vision project.
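To make that concrete, here is a condensed sketch of how the two `.eim` models can be chained with the Edge Impulse Linux Python SDK (the threshold and crop handling are illustrative; `web_inference.py` contains the full logic):

```
# Condensed sketch of chaining the two .eim models with the Edge Impulse
# Linux Python SDK; the threshold and crop handling are illustrative only.
import cv2
from edge_impulse_linux.image import ImageImpulseRunner

with ImageImpulseRunner("models/arduino-uno-q/detector-linux-aarch64.eim") as detector, \
     ImageImpulseRunner("models/arduino-uno-q/recognizer-linux-aarch64.eim") as recognizer:
    detector.init()
    recognizer.init()

    # Load a test frame and convert BGR (OpenCV default) to RGB for the SDK
    frame = cv2.cvtColor(cv2.imread("sample.jpg"), cv2.COLOR_BGR2RGB)
    features, cropped = detector.get_features_from_image(frame)
    res = detector.classify(features)

    for bb in res["result"].get("bounding_boxes", []):
        if bb["value"] < 0.6:              # skip low-confidence detections
            continue
        crop = cropped[bb["y"]:bb["y"] + bb["height"], bb["x"]:bb["x"] + bb["width"]]
        rec_features, _ = recognizer.get_features_from_image(crop)
        print(recognizer.classify(rec_features))  # raw recognizer output
```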
And this is just a starting point on the Arduino UNO Q: imagine adding a bridge connection between the CPU and the MCU to write the recognized text from the image onto a display, or whatever other innovative ideas you may have.
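For instance, a plain serial link is one simple way to push recognized text from the Linux side to an MCU (a hypothetical sketch using `pyserial`; the device path, baud rate, and helper name are assumptions, and the MCU-side code that draws to a display is not shown):

```
# Hypothetical sketch: send recognized text over a serial link so an MCU
# sketch could show it on a display. The device path and baud rate are
# assumptions, not part of this project.
import serial

def send_to_mcu(text, port="/dev/ttyACM0", baud=115200):
    with serial.Serial(port, baud, timeout=1) as link:
        link.write((text + "\n").encode("utf-8"))  # newline-delimited messages

send_to_mcu("HELLO WORLD")
```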
If you have any questions, don't hesitate to ask in the Edge Impulse forums or on the Edge Impulse Discord server.
Disclaimer

This project is intended for educational and experimental purposes only. It is not hardened for production use. Do not deploy in industrial or safety-critical environments without proper security, testing, and validation.
Attribution

This project is based on the Two-stage OCR for Linux (Edge Impulse) project, which is a Python application.
Big kudos to Ivan!