Overview
Eye State Classification is a crucial task in industries like photography, automotive safety, and surveillance. Photographers often discard images with closed eyes manually, while driver drowsiness poses severe risks, potentially leading to accidents. Despite the critical need, real-time systems for classifying eye states are limited.
Motivation
Traditional algorithmic approaches that rely on landmark detection struggle with variability in demographics, viewing angles, and lighting. This project proposes an automated, real-time solution, deployed on Ryzen AI, capable of reliably classifying eye states across diverse conditions.
Impact
Accurately classifying eye states can help prevent accidents due to driver drowsiness (or falling asleep) and save photographers time, allowing them to work on their pictures rather than sorting through them.
Data and Preprocessing
Dataset Trimming
The OACE dataset contains images of open and closed eyes from multiple subjects, captured under different angles and lighting conditions. The dataset also includes extra images of a single subject (identifiable by their unique UUID naming, which differs from the other images) that contain distortions. These images were removed, reducing the dataset size by half. Next, 10,000 random images (5,000 open and 5,000 closed) were selected and used for the project.
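The trimming step can be sketched as follows; the folder layout ("open"/"closed" subfolders) and the UUID filename check are assumptions for illustration, not taken from the project's scripts.
import os, re, random, shutil

UUID_RE = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I)

def trim_class(src_dir, dst_dir, n_samples=5000, seed=42):
    # drop the distorted extra images, identified here by their UUID file names
    keep = [f for f in os.listdir(src_dir) if not UUID_RE.match(os.path.splitext(f)[0])]
    random.Random(seed).shuffle(keep)
    os.makedirs(dst_dir, exist_ok=True)
    for f in keep[:n_samples]:  # keep 5,000 random images per class
        shutil.copy(os.path.join(src_dir, f), os.path.join(dst_dir, f))

trim_class("OACE/open", "data/open")
trim_class("OACE/closed", "data/closed")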
Preprocessing
To ensure that the model generalizes and performs well in real-world scenarios, several preprocessing and augmentation steps were applied.
The images are resized to 224x224 to match the input size used by the pretrained model's ImageNet weights:
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomRotation(10),                # rotate images
    transforms.ColorJitter(0.2, 0.2, 0.2, 0.1),   # change lighting
    transforms.GaussianBlur(1),                   # blur images
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
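For validation and inference the augmentations are normally dropped; a plausible companion transform (an assumption, not shown in the original code) would be:
val_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])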
Find the original dataset here. Find the trimmed dataset here.
PyTorch Models
Two models, MobileNetV2 and MobileNetV3 Large, were utilized for this task (their performance is compared later). Both models were pretrained on ImageNet1K_V2 and sourced from PyTorch. The training strategy involved freezing the base layers and fine-tuning only the classifier layers, as shown below:
for name, param in model.named_parameters():
    if "classifier" in name:
        param.requires_grad = True
    else:
        param.requires_grad = False
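Because the ImageNet-pretrained classifier outputs 1,000 classes, the head also needs to be adapted to the two eye states. A minimal sketch for MobileNetV2 (the exact replacement used in the project may differ; MobileNetV3 Large is analogous with its own classifier layout):
import torch.nn as nn
from torchvision import models

model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V2)
# swap the final 1,000-class linear layer for a 2-class head (open / closed)
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 2)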
Training
The models were fine-tuned on the trimmed dataset for 10 epochs using the following hyperparameters:
criterion = nn.CrossEntropyLoss() # to learn both classes
optimizer = optim.SGD(model.parameters(), lr=0.0005, momentum=0.9, weight_decay=1e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer=optimizer, step_size=0.3, gamma=0.1)
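A bare-bones training loop consistent with this setup (the loader name and device handling are assumptions) might look like:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

for epoch in range(10):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # decay the learning rate on the StepLR schedule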
Results
After training, both models were validated on a separate test set ('test_loader'); a minimal evaluation sketch follows the results below.
- MobileNetV2 achieved an accuracy of 96.50%
- MobileNetV3 achieved an accuracy of 97%
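The accuracies above come from a standard evaluation pass over 'test_loader'; a minimal sketch of such a pass (names assumed from the text) is:
import torch

def evaluate(model, loader, device="cpu"):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total

print(f"accuracy: {evaluate(model, test_loader):.2f}%")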
ONNX and Quantization
After training, the models were converted to the ONNX format with opset version 13 and quantized to int8 using the Vitis AI quantizer with the QDQ format. Calibration was performed using 500 images; such a small calibration set can skew the estimated activation ranges, which may explain part of the accuracy shifts observed after quantization.
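The export and quantization described above roughly correspond to the sketch below; the file names and the calibration data reader are placeholders, and the quantize_static arguments follow the Vitis AI ONNX quantizer documented for Ryzen AI rather than the project's exact call.
import torch
import vai_q_onnx

# export the fine-tuned PyTorch model to ONNX with opset 13
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "eye_state.onnx", opset_version=13)

# static int8 quantization in QDQ format, calibrated on ~500 preprocessed images
vai_q_onnx.quantize_static(
    "eye_state.onnx",
    "eye_state_int8.onnx",
    calibration_data_reader,  # a CalibrationDataReader over the calibration images
    quant_format=vai_q_onnx.QuantFormat.QDQ,
    activation_type=vai_q_onnx.QuantType.QUInt8,
    weight_type=vai_q_onnx.QuantType.QInt8,
)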
Results
After quantization, both models were validated on a separate test set ('ipu_test_loader'):
- Quantized MobileNetV2 achieved an accuracy of 99%
- Quantized MobileNetV3 achieved an accuracy of 55%
Note: More sophisticated methods of quantization might be required for the MobileNetV3 model since linear quantization results in a steep drop in accuracy.
Inference
Finally, the quantized models are ready for real-time inference via webcam, or with static images.
Note that this project targets Ryzen AI 1.1. Before running inference, you must install the NPU drivers and set up your conda environment.
Follow the instructions to set up the NPU here (only up to the "Install NPU Drivers" step). Next, move into ryzen-ai-sw-1.1 and run the command
install.bat
to install the requisite packages and create your conda environment.
Run Inference Out-Of-The-Box
1. Clone the repository
git clone https://github.com/SrivathsanSivakumar/Eye-State-Detection-with-RyzenAI
2. Open Anaconda Prompt and move into the repository. Make sure to activate your conda env!
3. Run the following command to install the necessary packages and to get the dataset
pip install -r requirements.txt
Then
python get_dataset.py
4. Next make sure to validate the accuracy of the quantized model with:
python quantize_model.py --test_only
5. Run the following command for real-time webcam inference:
python webcam_inference.py
This runs the ONNX model that was quantized and converted to run on AMD's Ryzen AI chip; a sketch of how the inference session is created appears after this list.
6. If you do not have access to a webcam, or you wish to run inference using static images, you can do so with:
python static_images_inference.py
To test with a custom image, use the command
python static_images_inference.py --custom <your image path>
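Under the hood, the inference scripts run the quantized ONNX model through ONNX Runtime with the Vitis AI execution provider. A simplified sketch (model file name, config file name, and class ordering are assumptions):
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "eye_state_int8.onnx",
    providers=["VitisAIExecutionProvider"],
    provider_options=[{"config_file": "vaip_config.json"}],
)

# a dummy 224x224 RGB input, normalized the same way as during training
frame = np.random.rand(1, 3, 224, 224).astype(np.float32)
logits = session.run(None, {session.get_inputs()[0].name: frame})[0]
print("open" if logits.argmax() == 1 else "closed")  # class index mapping is assumed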
Detailed Guide with Command Options for Full Project
This section describes each file, its default command, and any extra command options, in that order.
Make sure to install necessary packages with:
pip install -r requirements.txt
get_dataset.py
Description:
This script fetches the dataset from Google Drive and extracts it to the "data" folder. There is only one command for this file.
Command:
python get_dataset.py
There are no other command options for this file.
prepare_model_and_data.py
Description:
This file allows you to either load a fine-tuned PyTorch model or initialize a fresh model and train it. In either case, the model is then tested and converted to ONNX format.
Default Command:
python prepare_model_and_data.py
By default, this command retrieves the dataset (if it is not already present), loads the fine-tuned MobileNetV2 model, and tests it to report accuracy and loss. The model is then exported to ONNX format.
Command Options:
-model
Specify which model you want, "mobilenetv2" or "mobilenetv3". MobileNetV2 is selected by default.
-train
Use this flag if you want to train a freshly loaded model.
--num_epochs
Specify the number of epochs you want to train the model for. 1 epoch is set by default.
Example Command With Full Options
python prepare_model_and_data.py -model mobilenetv3 -train --num_epochs 10
quantize_model.py
Description:
This file uses Vitis AI to statically quantize the model to the QDQ format with QUInt8 activations; the quantized model is then tested to report accuracy.
Default Command:
python quantize_model.py
Command Options:
-model
Specify which model you chose in the previous script, "mobilenetv2" or "mobilenetv3". MobileNetV2 is selected by default.
--test_only
Pass this argument to only validate the quantized model.
Example Command with Full Options
python quantize_model.py -model mobilenetv3 --test_only
webcam_inference.py, static_images_inference.py
Description:
webcam_inference.py runs real-time inference with the quantized model via webcam, and static_images_inference.py runs the quantized model on static images (some sample images are already included with the repository).
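A stripped-down version of the webcam loop could look like the sketch below; it reuses the 'session' object from the earlier Vitis AI execution provider sketch, and the preprocessing details and class ordering are assumptions rather than the script's exact code.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    img = cv2.cvtColor(cv2.resize(frame, (224, 224)), cv2.COLOR_BGR2RGB)
    img = ((img / 255.0 - mean) / std).astype(np.float32)
    inp = img.transpose(2, 0, 1)[None]  # HWC -> NCHW batch of one
    logits = session.run(None, {session.get_inputs()[0].name: inp})[0]
    label = "open" if logits.argmax() == 1 else "closed"  # class order is assumed
    cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("eye state", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()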
Default Command:
python webcam_inference.py
or
python static_images_inference.py
Command Options:
-model
Specify which model you chose in the previous script, "mobilenetv2" or "mobilenetv3". MobileNetV2 is selected by default.
For static_images_inference.py there is an extra option:
--image
Run inference using a custom image.
Example Command with Full Options
python webcam_inference.py -model mobilenetv3
or
python static_images_inference.py -model mobilenetv3 --image <path to image>
Conclusion
This project showcases the effectiveness of lightweight CNNs for eye state classification and demonstrates the capabilities of the Ryzen AI chip for real-time applications.