Overview
Eye State Classification is a crucial task in industries like photography, automotive safety, and surveillance. Photographers often discard images with closed eyes manually, while driver drowsiness poses severe risks, potentially leading to accidents. Despite the critical need, real-time systems for classifying eye states are limited.
Motivation
Traditional algorithmic approaches that rely on landmark detection struggle with variability in demographics, viewing angles, and lighting. This project proposes an automated, real-time solution, deployed on Ryzen AI, capable of reliably classifying eye states across diverse conditions.
Impact
Accurately classifying eye states can help prevent accidents due to driver drowsiness (or falling asleep) and save photographers time, allowing them to work on their pictures rather than sorting through them.
Data and Preprocessing
Dataset Trimming
The OACE dataset contains images of open and closed eyes from multiple subjects, captured under different angles and lighting conditions. The dataset also includes extra images of a single subject (identifiable by their unique UUID naming, which differs from the other images) that contain distortions. These images were removed, reducing the dataset size by half. Next, 10,000 random images (5,000 open and 5,000 closed) were selected and used for the project.
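The trimming step can be sketched as follows; the folder layout ("open"/"closed" subfolders) and the UUID filename check are assumptions for illustration, not taken from the project's scripts.
import os, re, random, shutil

UUID_RE = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I)

def trim_class(src_dir, dst_dir, n_samples=5000, seed=42):
    # drop the distorted extra images, identified here by their UUID file names
    keep = [f for f in os.listdir(src_dir) if not UUID_RE.match(os.path.splitext(f)[0])]
    random.Random(seed).shuffle(keep)
    os.makedirs(dst_dir, exist_ok=True)
    for f in keep[:n_samples]:  # keep 5,000 random images per class
        shutil.copy(os.path.join(src_dir, f), os.path.join(dst_dir, f))

trim_class("OACE/open", "data/open")
trim_class("OACE/closed", "data/closed")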
Preprocessing
To ensure that the model generalizes and performs well in real-world scenarios, several preprocessing and augmentation steps were applied.
The images are resized to 224x224 to match the input size used by the pretrained model's ImageNet weights:
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomRotation(10),                # rotate images
    transforms.ColorJitter(0.2, 0.2, 0.2, 0.1),   # change lighting
    transforms.GaussianBlur(1),                   # blur images
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
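For validation and inference the augmentations are normally dropped; a plausible companion transform (an assumption, not shown in the original code) would be:
val_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])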
Find the original dataset here. Find the trimmed dataset here.
PyTorch Models
Two models, MobileNetV2 and MobileNetV3 Large, were utilized for this task (their performance is compared later). Both models were pretrained on ImageNet1K_V2 and sourced from PyTorch. The training strategy involved freezing the base layers and fine-tuning only the classifier layers, as shown below:
for name, param in model.named_parameters():
    if "classifier" in name:
        param.requires_grad = True
    else:
        param.requires_grad = False
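Because the ImageNet-pretrained classifier outputs 1,000 classes, the head also needs to be adapted to the two eye states. A minimal sketch for MobileNetV2 (the exact replacement used in the project may differ; MobileNetV3 Large is analogous with its own classifier layout):
import torch.nn as nn
from torchvision import models

model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V2)
# swap the final 1,000-class linear layer for a 2-class head (open / closed)
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 2)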
Training
The models were fine-tuned on the trimmed dataset for 10 epochs using the following hyperparameters:
criterion = nn.CrossEntropyLoss() # to learn both classes
optimizer = optim.SGD(model.parameters(), lr=0.0005, momentum=0.9, weight_decay=1e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer=optimizer, step_size=0.3, gamma=0.1)
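A bare-bones training loop consistent with this setup (the loader name and device handling are assumptions) might look like:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

for epoch in range(10):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # decay the learning rate on the StepLR schedule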
Results
After training, both models were validated on a separate test set ('test_loader'); a minimal evaluation sketch follows the results below.
- MobileNetV2 achieved an accuracy of 96.50%
- MobileNetV3 achieved an accuracy of 97%
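The accuracies above come from a standard evaluation pass over 'test_loader'; a minimal sketch of such a pass (names assumed from the text) is:
import torch

def evaluate(model, loader, device="cpu"):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total

print(f"accuracy: {evaluate(model, test_loader):.2f}%")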
ONNX and Quantization
After training, the models were converted to the ONNX format with opset version 13 and quantized to int8 using the Vitis AI quantizer with the QDQ format. Calibration was performed using 500 images; such a small calibration set can skew the estimated activation ranges, which may explain part of the accuracy shifts observed after quantization.
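The export and quantization described above roughly correspond to the sketch below; the file names and the calibration data reader are placeholders, and the quantize_static arguments follow the Vitis AI ONNX quantizer documented for Ryzen AI rather than the project's exact call.
import torch
import vai_q_onnx

# export the fine-tuned PyTorch model to ONNX with opset 13
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "eye_state.onnx", opset_version=13)

# static int8 quantization in QDQ format, calibrated on ~500 preprocessed images
vai_q_onnx.quantize_static(
    "eye_state.onnx",
    "eye_state_int8.onnx",
    calibration_data_reader,  # a CalibrationDataReader over the calibration images
    quant_format=vai_q_onnx.QuantFormat.QDQ,
    activation_type=vai_q_onnx.QuantType.QUInt8,
    weight_type=vai_q_onnx.QuantType.QInt8,
)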
Results
After quantization, both models were validated on a separate test set ('ipu_test_loader'):
- Quantized MobileNetV2 achieved an accuracy of 99%
- Quantized MobileNetV3 achieved an accuracy of 55%
Note: More sophisticated methods of quantization might be required for the MobileNetV3 model since linear quantization results in a steep drop in accuracy.
Inference
Finally, the quantized models are ready for real-time inference via webcam, or with static images.
Note that this project targets Ryzen AI 1.1. Before running inference, you must install the NPU drivers and set up your conda environment.
Follow the instructions to set up the NPU here (only up to the "Install NPU Drivers" step). Next, move into ryzen-ai-sw-1.1 and run the command
install.bat
to install the requisite packages and create your conda environment.
Run Inference Out-Of-The-Box
1. Clone the repository
git clone https://github.com/SrivathsanSivakumar/Eye-State-Detection-with-RyzenAI
2. Open Anaconda Prompt and move into the repository. Make sure to activate your conda env!
3. Run the following command to install the necessary packages and to get the dataset
pip install -r requirements.txt
Then
python get_dataset.py
4. Next make sure to validate the accuracy of the quantized model with:
python quantize_model.py --test_only
5. Run the following command for real-time webcam inference:
python webcam_inference.py
This runs the ONNX model that was quantized and converted to run on AMD's Ryzen AI chip; a sketch of how the inference session is created appears after this list.
6. If you do not have access to a webcam, or you wish to run inference using static images, you can do so with:
python static_images_inference.py
To test with a custom image, use the command
python static_images_inference.py --custom <your image path>
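Under the hood, the inference scripts run the quantized ONNX model through ONNX Runtime with the Vitis AI execution provider. A simplified sketch (model file name, config file name, and class ordering are assumptions):
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "eye_state_int8.onnx",
    providers=["VitisAIExecutionProvider"],
    provider_options=[{"config_file": "vaip_config.json"}],
)

# a dummy 224x224 RGB input, normalized the same way as during training
frame = np.random.rand(1, 3, 224, 224).astype(np.float32)
logits = session.run(None, {session.get_inputs()[0].name: frame})[0]
print("open" if logits.argmax() == 1 else "closed")  # class index mapping is assumed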
Detailed Guide with Command Options for Full Project
This section describes each file, its default command, and any extra command options, in that order.
Make sure to install necessary packages with:
pip install -r requirements.txt
get_dataset.py
Description:
This script fetches the dataset from Google Drive and extracts it to the "data" folder. There is only one command for this file.
Command:
python get_dataset.py
There are no other command options for this file.
prepare_model_and_data.py
Description:
This file allows you to either load a fine-tuned PyTorch model or initialize a fresh model and train it. In either case, the model is then tested and converted to ONNX format.
Default Command:
python prepare_model_and_data.py
By default, this command retrieves the dataset (if it is not already present), loads the fine-tuned MobileNetV2 model, and tests it to report accuracy and loss. The model is then exported to ONNX format.
Command Options:
-model
Specify which model you want, "mobilenetv2" or "mobilenetv3". MobileNetV2 is selected by default.
-train
Use this flag if you want to train a freshly loaded model.
--num_epochs
Specify the number of epochs you want to train the model for. 1 epoch is set by default.
Example Command With Full Options
python prepare_model_and_data.py -model mobilenetv3 -train --num_epochs 10
quantize_model.py
Description:
This file uses Vitis AI to statically quantize the model to the QDQ format with QUInt8 activations; the quantized model is then tested to report accuracy.
Default Command:
python quantize_model.py
Command Options:
-model
Specify which model you chose in the previous script, "mobilenetv2" or "mobilenetv3". MobileNetV2 is selected by default.
--test_only
Pass this argument to only validate the quantized model.
Example Command with Full Options
python quantize_model.py -model mobilenetv3 --test_only
webcam_inference.py, static_images_inference.py
Description:
webcam_inference.py runs real-time inference with the quantized model via webcam, and static_images_inference.py runs the quantized model on static images (some sample images are already included with the repository).
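A stripped-down version of the webcam loop could look like the sketch below; it reuses the 'session' object from the earlier Vitis AI execution provider sketch, and the preprocessing details and class ordering are assumptions rather than the script's exact code.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    img = cv2.cvtColor(cv2.resize(frame, (224, 224)), cv2.COLOR_BGR2RGB)
    img = ((img / 255.0 - mean) / std).astype(np.float32)
    inp = img.transpose(2, 0, 1)[None]  # HWC -> NCHW batch of one
    logits = session.run(None, {session.get_inputs()[0].name: inp})[0]
    label = "open" if logits.argmax() == 1 else "closed"  # class order is assumed
    cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("eye state", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()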
Default Command:
python webcam_inference.py
or
python static_images_inference.py
Command Options:
-model
Specify which model you chose in the previous script, "mobilenetv2" or "mobilenetv3". MobileNetV2 is selected by default.
For static_images_inference.py there is an extra option:
--image
Run inference using a custom image.
Example Command with Full Options
python webcam_inference.py -model mobilenetv3
or
python static_images_inference.py -model mobilenetv3 --image <path to image>
Conclusion
This project showcases the effectiveness of lightweight CNNs for eye state classification and demonstrates the capabilities of the Ryzen AI chip for real-time applications.