Mert Erbak
Published © Apache-2.0

Real-time Object Detection and Classification for Recycling

Ryzen AI-powered application that uses a camera to identify and classify various types of waste

Advanced · Work in progress · Over 8 days · 187

Things used in this project

Hardware components

Minisforum Venus UM790 Pro with AMD Ryzen™ 9
×1

Software apps and online services

AMD Ryzen™ AI
DroidCam
Connect your phone as a webcam for the camera feed (see the quick check below).
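
With DroidCam running, the phone appears as an extra webcam, so the live camera script can read it simply by changing the device index passed to OpenCV. A quick check, assuming the DroidCam feed shows up at index 1 (the actual index depends on your machine):

import cv2

# Index 1 is an assumption; index 0 is usually the built-in webcam
cap = cv2.VideoCapture(1)
ret, frame = cap.read()
if ret:
    print("Got a frame from the DroidCam device:", frame.shape)
else:
    print("No frame at index 1; try another device index")
cap.release()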

Story


Code

recycling_resnet.py

Python
Mask R-CNN network trained on the TACO dataset
import os
import argparse
import torch
import torchvision
from torchvision.models.detection import maskrcnn_resnet50_fpn
import torchvision.transforms.functional as F
from pycocotools.coco import COCO
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.optim.lr_scheduler import ReduceLROnPlateau
import random

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("-train", action='store_true')
    parser.add_argument("--num_epochs", type=int, default=10)
    parser.add_argument("--dataset_path", type=str, default="TACO")
    parser.add_argument("--batch_size", type=int, default=2)
    parser.add_argument("--learning_rate", type=float, default=0.00001)
    parser.add_argument("--split_number", type=int, default=0)
    args = parser.parse_args()
    return args

class TACODataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None, split='train', split_number=0):
        self.root = root
        self.transforms = transforms
        self.split = split
        self.split_number = split_number
        
        annotation_file = os.path.join(root, f'annotations_{split_number}_{split}.json')
        if not os.path.exists(annotation_file):
            raise FileNotFoundError(f"Annotation file not found at {annotation_file}")
        
        self.coco = COCO(annotation_file)
        self.ids = list(self.coco.imgs.keys())

        self.cat_ids = self.coco.getCatIds()
        self.categories = {cat['id']: cat['name'] for cat in self.coco.loadCats(self.cat_ids)}

    def __getitem__(self, index):
        img_id = self.ids[index]
        ann_ids = self.coco.getAnnIds(imgIds=img_id)
        coco_annotation = self.coco.loadAnns(ann_ids)
        
        path = self.coco.loadImgs(img_id)[0]['file_name']
        img_path = os.path.join(self.root, path)

        if not os.path.exists(img_path):
            raise FileNotFoundError(f"Image file not found: {img_path}")

        img = Image.open(img_path).convert("RGB")
        
        num_objs = len(coco_annotation)
        if num_objs == 0:
            return None

        boxes = []
        masks = []
        labels = []
        
        img_width, img_height = img.size

        for i in range(num_objs):
            xmin = coco_annotation[i]['bbox'][0]
            ymin = coco_annotation[i]['bbox'][1]
            xmax = xmin + coco_annotation[i]['bbox'][2]
            ymax = ymin + coco_annotation[i]['bbox'][3]
            boxes.append([xmin, ymin, xmax, ymax])
            labels.append(coco_annotation[i]['category_id'])
            mask = self.coco.annToMask(coco_annotation[i])
            mask = Image.fromarray(mask).resize((img_width, img_height), Image.NEAREST)
            masks.append(np.array(mask))

        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        labels = torch.as_tensor(labels, dtype=torch.int64)
        masks = torch.as_tensor(np.array(masks), dtype=torch.uint8)
        image_id = torch.tensor([img_id])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.ids)

class Compose:
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target

class ToTensor(object):
    def __call__(self, image, target):
        image = F.to_tensor(image)
        return image, target

class RandomHorizontalFlip(object):
    def __init__(self, prob):
        self.prob = prob

    def __call__(self, image, target):
        if random.random() < self.prob:
            height, width = image.shape[-2:]
            image = image.flip(-1)
            bbox = target["boxes"]
            bbox[:, [0, 2]] = width - bbox[:, [2, 0]]
            target["boxes"] = bbox
            if "masks" in target:
                target["masks"] = target["masks"].flip(-1)
        return image, target

class Resize(object):
    def __init__(self, size):
        self.size = size

    def __call__(self, image, target):
        # Rescale the boxes along with the image and masks so annotations stay aligned
        orig_h, orig_w = image.shape[-2:]
        image = F.resize(image, self.size)
        if "boxes" in target and len(target["boxes"]) > 0:
            target["boxes"][:, [0, 2]] *= self.size[1] / orig_w
            target["boxes"][:, [1, 3]] *= self.size[0] / orig_h
        if "masks" in target:
            target["masks"] = F.resize(target["masks"], self.size)
        return image, target

def get_transform(train):
    transforms = []
    transforms.append(ToTensor())
    transforms.append(Resize((800, 800)))
    if train:
        transforms.append(RandomHorizontalFlip(0.5))
    return Compose(transforms)

def collate_fn(batch):
    batch = [b for b in batch if b is not None]
    return tuple(zip(*batch))

def init_weights(m):
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

def train_model(model, data_loader_train, data_loader_val, device, num_epochs, lr):
    model.to(device)
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=3)

    for epoch in range(num_epochs):
        model.train()
        total_loss = 0
        for images, targets in data_loader_train:
            images = list(image.to(device) for image in images)
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

            # Debug: Print shapes of inputs and targets
            print(f"Images shape: {[img.shape for img in images]}")
            for target in targets:
                print(f"Target keys: {target.keys()}")
                for key, value in target.items():
                    print(f"Target {key} shape: {value.shape}")

            loss_dict = model(images, targets)

            if isinstance(loss_dict, dict):
                losses = sum(loss for loss in loss_dict.values())
            else:
                print("Loss dict is a list, check its content.")
                for i, loss in enumerate(loss_dict):
                    print(f"Loss {i} keys: {loss.keys()}")
                    for key, value in loss.items():
                        print(f"Loss {i} {key} shape: {value.shape}")
                losses = sum([sum(loss.values()) for loss in loss_dict])

            optimizer.zero_grad()
            losses.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            total_loss += losses.item()

        avg_loss = total_loss / len(data_loader_train)
        print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {avg_loss:.4f}")

        # Keep the model in train mode: torchvision detection models only return
        # a loss dict while training; gradients are disabled below with no_grad.
        val_loss = 0
        with torch.no_grad():
            for images, targets in data_loader_val:
                images = list(image.to(device) for image in images)
                targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

                # Debug: Print shapes of inputs and targets
                print(f"Validation Images shape: {[img.shape for img in images]}")
                for target in targets:
                    print(f"Validation Target keys: {target.keys()}")
                    for key, value in target.items():
                        print(f"Validation Target {key} shape: {value.shape}")

                loss_dict = model(images, targets)

                if isinstance(loss_dict, dict):
                    losses = sum(loss for loss in loss_dict.values())
                else:
                    print("Loss dict is a list, check its content.")
                    for i, loss in enumerate(loss_dict):
                        print(f"Validation Loss {i} keys: {loss.keys()}")
                        for key, value in loss.items():
                            print(f"Validation Loss {i} {key} shape: {value.shape}")
                    losses = sum([sum(loss.values()) for loss in loss_dict])

                val_loss += losses.item()

        avg_val_loss = val_loss / len(data_loader_val)
        print(f"Epoch {epoch+1}/{num_epochs}, Validation Loss: {avg_val_loss:.4f}")
        scheduler.step(avg_val_loss)

        checkpoint = {
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': avg_loss,
        }
        os.makedirs("model", exist_ok=True)  # Ensure the checkpoint directory exists
        torch.save(checkpoint, f"model/maskrcnn_taco_checkpoint_epoch_{epoch+1}.pt")

    print("Training complete")
    torch.save(model.state_dict(), "model/maskrcnn_taco_litter_detection_final.pt")

def visualize_prediction(image, prediction, coco):
    fig, ax = plt.subplots(1, figsize=(12, 8))
    image = image.cpu().permute(1, 2, 0).numpy()
    ax.imshow(image)
    
    for box, label, score in zip(prediction['boxes'], prediction['labels'], prediction['scores']):
        if score > 0.5:
            box = box.cpu().numpy()
            rect = patches.Rectangle((box[0], box[1]), box[2]-box[0], box[3]-box[1], 
                                     linewidth=2, edgecolor='r', facecolor='none')
            ax.add_patch(rect)
            class_name = coco.loadCats(label.cpu().item())[0]['name']
            ax.text(box[0], box[1], f"{class_name}: {score:.2f}", color='white', 
                    bbox=dict(facecolor='red', alpha=0.5))
    
    plt.show()

def test_model(model, data_loader, device, coco):
    model.to(device)
    model.eval()
    
    for images, _ in data_loader:
        images = list(img.to(device) for img in images)
        
        with torch.no_grad():
            predictions = model(images)
        
        for img, prediction in zip(images, predictions):
            visualize_prediction(img, prediction, coco)

def main():
    args = get_args()
    
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    
    dataset_train = TACODataset(root=os.path.join(args.dataset_path, 'data'),
                                transforms=get_transform(train=True),
                                split='train',
                                split_number=args.split_number)
    
    dataset_val = TACODataset(root=os.path.join(args.dataset_path, 'data'),
                              transforms=get_transform(train=False),
                              split='val',
                              split_number=args.split_number)
    
    dataset_test = TACODataset(root=os.path.join(args.dataset_path, 'data'),
                               transforms=get_transform(train=False),
                               split='test',
                               split_number=args.split_number)
    
    data_loader_train = DataLoader(dataset_train, batch_size=args.batch_size, shuffle=True, num_workers=4,
                                   collate_fn=collate_fn)
    
    data_loader_val = DataLoader(dataset_val, batch_size=1, shuffle=False, num_workers=4,
                                 collate_fn=collate_fn)
    
    data_loader_test = DataLoader(dataset_test, batch_size=1, shuffle=False, num_workers=4,
                                  collate_fn=collate_fn)
    
    coco = dataset_train.coco

    num_classes = len(dataset_train.categories) + 1  # +1 for background class

    if args.train:
        model = maskrcnn_resnet50_fpn(pretrained=True)
        in_features = model.roi_heads.box_predictor.cls_score.in_features
        model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes)
        model.roi_heads.mask_predictor = torchvision.models.detection.mask_rcnn.MaskRCNNPredictor(256, 256, num_classes)
        
        model.roi_heads.box_predictor.apply(init_weights)
        model.roi_heads.mask_predictor.apply(init_weights)
        
        print(f"Number of classes: {num_classes}")
        print(f"Model structure: {model}")
        
        train_model(model, data_loader_train, data_loader_val, device, args.num_epochs, args.learning_rate)
    else:
        model = maskrcnn_resnet50_fpn(num_classes=num_classes)
        model.load_state_dict(torch.load("model/maskrcnn_taco_litter_detection_final.pt", map_location=device))
        test_model(model, data_loader_test, device, coco)

if __name__ == "__main__":
    main()
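
Going by the argparse defaults above, training and evaluation would be launched roughly like this, assuming the TACO split files annotations_<split_number>_<split>.json already exist under TACO/data:

python recycling_resnet.py -train --dataset_path TACO --num_epochs 10 --batch_size 2
python recycling_resnet.py --dataset_path TACO --split_number 0

The second call skips -train, so the script loads model/maskrcnn_taco_litter_detection_final.pt and visualizes predictions on the test split.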

YOLO_train.py

Python
import os
HOME = os.getcwd()
print(HOME)
!pip install ultralytics==8.0.20

from IPython import display
display.clear_output()

import ultralytics
ultralytics.checks()
from ultralytics import YOLO
from IPython.display import display, Image


!mkdir {HOME}/datasets
%cd {HOME}/datasets

!pip install roboflow --quiet

from roboflow import Roboflow
rf = Roboflow(api_key="YOUR_ROBOFLOW_API_KEY")  # Use your own Roboflow API key here
project = rf.workspace("roboflow-universe-projects").project("taco-object-detection-kcxyn")
dataset = project.version(2).download("yolov8")

%cat {dataset.location}/data.yaml

%cd {HOME}

!yolo task=detect mode=train model=yolov8m.pt data={dataset.location}/data.yaml epochs=30 imgsz=640 plots=True

!ls {HOME}/runs/detect/train/

%cd {HOME}
Image(filename=f'{HOME}/runs/detect/train/confusion_matrix.png', width=600)

%cd {HOME}
Image(filename=f'{HOME}/runs/detect/train/results.png', width=600)

%cd {HOME}
Image(filename=f'{HOME}/runs/detect/train/val_batch0_pred.jpg', width=600)

%cd {HOME}

!yolo task=detect mode=val model={HOME}/runs/detect/train/weights/best.pt data={dataset.location}/data.yaml


%cd {HOME}
!yolo task=detect mode=predict model={HOME}/runs/detect/train/weights/best.pt conf=0.25 source={dataset.location}/test/images save=True


import glob
from IPython.display import Image, display

for image_path in glob.glob(f'{HOME}/runs/detect/predict/*.jpg')[:3]:
    display(Image(filename=image_path, width=600))
    print("\n")
%pwd
%ls

from ultralytics import YOLO

model = YOLO('runs/detect/train/weights/best.pt')

model.export(format='onnx')
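
The export above writes the ONNX file next to the PyTorch weights (by default runs/detect/train/weights/best.onnx), while recycling_quantize.py below expects it at models/best.onnx relative to the script, so copy it over before quantizing. A small sanity check of the exported model, with the path assumed to be the default Ultralytics location:

import onnxruntime as ort

# Adjust the path if your training run directory differs
session = ort.InferenceSession('runs/detect/train/weights/best.onnx',
                               providers=['CPUExecutionProvider'])
inp = session.get_inputs()[0]
print(inp.name, inp.shape)  # Expected roughly: 'images', [1, 3, 640, 640]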

recycling_quantize.py

Python
import argparse
import torch
import numpy as np
import onnx
import onnxruntime as ort
import vai_q_onnx
from onnxruntime.quantization import QuantFormat, QuantType, CalibrationDataReader
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
import cv2
from pathlib import Path
import os

# Define the paths
script_dir = os.path.dirname(os.path.abspath(__file__))
models_dir = os.path.join(script_dir, 'models')
onnx_model_path = os.path.join(models_dir, 'best.onnx')

class RecyclingDataset(Dataset):
    def __init__(self, data_dir, transform=None):
        self.data_dir = Path(data_dir)
        self.image_dir = self.data_dir / 'images'
        self.label_dir = self.data_dir / 'labels'
        self.transform = transform

        self.image_files = list(self.image_dir.glob('*.jpg'))  # Adjust file extension if needed
        print(f"Found {len(self.image_files)} images in {self.image_dir}")

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, idx):
        img_path = str(self.image_files[idx])
        label_path = str(self.label_dir / (self.image_files[idx].stem + '.txt'))

        image = cv2.imread(img_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        
        if self.transform:
            image = self.transform(image)
        
        # Read labels
        labels = self.read_labels(label_path, img_path)
        
        return image, labels
    
    def read_labels(self, label_path, img_path):
        if os.path.exists(label_path):
            try:
                with open(label_path, 'r') as f:
                    lines = f.readlines()
                    labels = []
                    for line in lines:
                        parts = list(map(float, line.strip().split()))
                        if len(parts) >= 6:
                            class_id = int(parts[0])
                            polygon = np.array(parts[1:]).reshape(-1, 2)
                            x_min, y_min = polygon.min(axis=0)
                            x_max, y_max = polygon.max(axis=0)
                            x_center = (x_min + x_max) / 2
                            y_center = (y_min + y_max) / 2
                            width = x_max - x_min
                            height = y_max - y_min
                            labels.append([class_id, x_center, y_center, width, height])
                        else:
                            print(f"Warning: Invalid label format in file {label_path}")
                    labels = np.array(labels)
            except Exception as e:
                print(f"Error reading label file {label_path}: {str(e)}")
                labels = np.zeros((1, 5))  # Placeholder label
        else:
            print(f"Warning: Label file not found for {img_path}")
            labels = np.zeros((1, 5))  # Placeholder label
        
        return labels

def prepare_dataset(data_dir, batch_size=1, quantization=False):
    transform = transforms.Compose([
        transforms.ToPILImage(),
        transforms.Resize((640, 640)),  # Adjust size as needed
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    actual_batch_size = 1 if quantization else batch_size
    
    # Use a smaller subset of the validation data for calibration
    subset = 'valid' if quantization else 'train'
    
    dataset_path = os.path.join(data_dir, subset)
    print(f"Loading dataset from: {dataset_path}")
    
    dataset = RecyclingDataset(dataset_path, transform=transform)
    
    if len(dataset) == 0:
        raise ValueError(f"No images found in {os.path.join(dataset_path, 'images')}")
    
    dataloader = DataLoader(dataset, batch_size=actual_batch_size, shuffle=not quantization)
    
    print(f"Created dataloader with {len(dataset)} images")
    print(f"Image directory: {dataset.image_dir}")
    print(f"Label directory: {dataset.label_dir}")
    
    # Print first few file names
    print("First few image files:")
    for img_file in dataset.image_files[:5]:
        print(f"  {img_file}")
    
    return dataloader

class YOLOCalibrationDataReader(CalibrationDataReader):
    def __init__(self, data_loader):
        super().__init__()
        self.data_loader = data_loader
        self.iterator = iter(self.data_loader)

    def get_next(self) -> dict:
        try:
            images, _ = next(self.iterator)
            return {"images": images[0].unsqueeze(0).numpy()}
        except StopIteration:
            return None

def yolo_calibration_reader(data_loader):
    return YOLOCalibrationDataReader(data_loader)

def quantize(quantize_loader, model_name):
    print(f"Quantizing {model_name}...")
    onnx_model_path = os.path.join(models_dir, 'best.onnx')
    onnx_model = onnx.load(onnx_model_path)
    onnx.checker.check_model(onnx_model)
    
    input_model_path = onnx_model_path
    output_model_path = os.path.join(models_dir, f"{model_name}_recycling_detection.qdq.U8S8.onnx")
    
    data_reader = yolo_calibration_reader(quantize_loader)
    
    try:
        vai_q_onnx.quantize_static(
            input_model_path,
            output_model_path,
            data_reader,
            quant_format=QuantFormat.QDQ,
            calibrate_method=vai_q_onnx.PowerOfTwoMethod.MinMSE,
            activation_type=QuantType.QUInt8,
            weight_type=QuantType.QInt8,
            enable_ipu_cnn=True,
            extra_options={'ActivationSymmetric': True}
        )
        print(f"Quantized Model Saved at {output_model_path}")
    except Exception as e:
        print(f"Error during quantization: {str(e)}")
        # Add more debug information
        print("Model input shape:", onnx_model.graph.input[0].type.tensor_type.shape)
        print("First batch from data_reader:")
        first_batch = data_reader.get_next()
        if first_batch:
            print("Shape:", first_batch['images'].shape)
            print("Type:", first_batch['images'].dtype)
        else:
            print("No data from data_reader")

def load_quantized_model(model_path):
    session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
    return session

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("-model", type=str, default='yolo')
    parser.add_argument("--data_dir", type=str, required=True, help="Path to the dataset directory")
    args = parser.parse_args()
    return args

def main():
    args = get_args()
    
    print(f"Data directory: {args.data_dir}")
    print(f"Model name: {args.model}")
    
    try:
        print("Preparing quantization dataset...")
        quantize_loader = prepare_dataset(args.data_dir, quantization=True)
        print("Quantizing model...")
        quantize(quantize_loader, args.model)
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        import traceback
        traceback.print_exc()

if __name__ == "__main__":
    main()
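
With the exported model in place at models/best.onnx, quantization would be invoked along these lines (the dataset directory is a placeholder and must contain valid/images and valid/labels in the YOLO layout produced by the Roboflow export):

python recycling_quantize.py -model yolo --data_dir path/to/taco-dataset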

live_camera_recycling.py

Python
Live camera inference with the quantized YOLO model
import cv2
import numpy as np
import onnx
import onnxruntime as ort
from pathlib import Path

# Load the quantized YOLO model
quantized_model_path = r'C:\Users\merte\OneDrive\Desktop\Resnet-v2\models\quantized_model.onnx'
model = onnx.load(quantized_model_path)

# Set up ONNX Runtime session
providers = ['CPUExecutionProvider']
provider_options = [{}]
session = ort.InferenceSession(model.SerializeToString(), providers=providers,
                               provider_options=provider_options)

# Class names for TACO dataset (59 classes)
class_names = ['Aerosol', 'Aluminium blister pack', 'Aluminium foil', 'Battery', 'Broken glass', 
               'Carded blister pack', 'Cigarette', 'Clear plastic bottle', 'Corrugated carton', 
               'Crisp packet', 'Disposable food container', 'Disposable plastic cup', 'Drink can', 
               'Drink carton', 'Egg carton', 'Foam cup', 'Foam food container', 'Food Can', 
               'Food waste', 'Garbage bag', 'Glass bottle', 'Glass cup', 'Glass jar', 
               'Magazine paper', 'Meal carton', 'Metal bottle cap', 'Metal lid', 'Normal paper', 
               'Other carton', 'Other plastic bottle', 'Other plastic container', 'Other plastic cup', 
               'Other plastic wrapper', 'Other plastic', 'Paper bag', 'Paper cup', 'Paper straw', 
               'Pizza box', 'Plastic bottle cap', 'Plastic film', 'Plastic glooves', 'Plastic lid', 
               'Plastic straw', 'Plastic utensils', 'Polypropylene bag', 'Pop tab', 'Rope & strings', 
               'Scrap metal', 'Shoe', 'Single-use carrier bag', 'Six pack rings', 'Spread tub', 
               'Squeezable tube', 'Styrofoam piece', 'Tissues', 'Toilet tube', 'Tupperware', 
               'Unlabeled litter', 'Wrapping paper']

def preprocess_image(image, input_size=(640, 640)):
    original_height, original_width = image.shape[:2]
    
    # Resize and pad image
    ratio = min(input_size[0] / original_width, input_size[1] / original_height)
    new_size = (int(original_width * ratio), int(original_height * ratio))
    resized = cv2.resize(image, new_size, interpolation=cv2.INTER_LINEAR)
    
    padded = np.full((input_size[0], input_size[1], 3), 114, dtype=np.uint8)
    padded[:new_size[1], :new_size[0]] = resized
    
    # Normalize and change to CHW format
    padded = padded.astype(np.float32) / 255.0
    padded = padded.transpose(2, 0, 1)
    
    return np.expand_dims(padded, axis=0), (original_height, original_width), new_size
def postprocess(output, orig_shape, new_size, conf_threshold=0.5, iou_threshold=0.45):
    predictions = np.squeeze(output[0])
    
    # Assuming the output is in the format [x, y, w, h, conf, class_scores]
    num_classes = min(predictions.shape[1] - 5, len(class_names))
    
    # Apply sigmoid to confidence scores and class scores
    scores = 1 / (1 + np.exp(-predictions[:, 4]))
    class_scores = 1 / (1 + np.exp(-predictions[:, 5:5+num_classes]))
    
    # Get predicted classes
    class_ids = np.argmax(class_scores, axis=1)
    
    # Filter based on confidence threshold and valid class range
    mask = (scores > conf_threshold) & (class_ids < num_classes)
    boxes = predictions[mask, :4]
    scores = scores[mask]
    class_ids = class_ids[mask]
    
    # Convert boxes from [x, y, w, h] to [x1, y1, x2, y2]
    boxes[:, 2] = boxes[:, 0] + boxes[:, 2]
    boxes[:, 3] = boxes[:, 1] + boxes[:, 3]
    
    # Apply NMS
    indices = cv2.dnn.NMSBoxes(boxes, scores, conf_threshold, iou_threshold)
    
    if len(indices) > 0:
        indices = indices.flatten()
        boxes = boxes[indices]
        scores = scores[indices]
        class_ids = class_ids[indices]
        
        # Scale boxes back to original image
        input_width, input_height = new_size  # new_size is (width, height)
        orig_height, orig_width = orig_shape
        
        scale = min(input_width / orig_width, input_height / orig_height)
        offset_x = (input_width - orig_width * scale) / 2
        offset_y = (input_height - orig_height * scale) / 2
        
        boxes[:, [0, 2]] = (boxes[:, [0, 2]] - offset_x) / scale
        boxes[:, [1, 3]] = (boxes[:, [1, 3]] - offset_y) / scale
        
        # Additional filtering based on box size (optional)
        valid_detections = ((boxes[:, 2] - boxes[:, 0]) > 20) & ((boxes[:, 3] - boxes[:, 1]) > 20)
        boxes = boxes[valid_detections]
        scores = scores[valid_detections]
        class_ids = class_ids[valid_detections]
        
        return boxes, scores, class_ids
    else:
        return [], [], []

# Update the draw_detections function to include more information:
def draw_detections(frame, boxes, scores, class_ids):
    for box, score, class_id in zip(boxes, scores, class_ids):
        x1, y1, x2, y2 = box.astype(int)
        
        # Debug print
        print(f"class_id: {class_id}, score: {score:.4f}, box: {box}")
        
        # Check if class_id is valid
        if 0 <= class_id < len(class_names):
            label = f'{class_names[class_id]}: {score:.2f}'
        else:
            label = f'Unknown ({class_id}): {score:.2f}'
        
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        
        # Draw the label background
        label_size, _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
        cv2.rectangle(frame, (x1, y1 - label_size[1] - 10), (x1 + label_size[0], y1), (0, 255, 0), -1)
        
        # Draw the label text
        cv2.putText(frame, label, (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1)

def main():
    cap = cv2.VideoCapture(0)  # Open the default camera 
    
    while True:
        ret, frame = cap.read()
        if not ret:
            print("Failed to capture frame")
            break

        # Preprocess the frame
        input_data, orig_shape, new_size = preprocess_image(frame)
        
        # Run inference
        try:
            outputs = session.run(None, {'images': input_data})
            
            print(f"Model output shape: {outputs[0].shape}")
            print(f"Model output type: {outputs[0].dtype}")
            print(f"Model output min: {outputs[0].min()}, max: {outputs[0].max()}")
            
            boxes, scores, class_ids = postprocess(outputs[0], orig_shape, new_size)
            
            print(f"Number of detections: {len(boxes)}")
            print(f"Class IDs: {class_ids}")
            print(f"Scores: {scores}")
            print(f"Boxes: {boxes}")
            
            draw_detections(frame, boxes, scores, class_ids)
        
        except Exception as e:
            print(f"Error during inference or postprocessing: {e}")
              
        # Display the frame
        cv2.imshow('Recycling Classification', frame)
        
        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    
    # Release the capture and close windows
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
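
The script above creates its session with CPUExecutionProvider. To actually run the quantized model on the Ryzen AI NPU, the session would instead use the Vitis AI execution provider from the Ryzen AI software stack. A minimal sketch, assuming the Ryzen AI SDK is installed and that vaip_config.json from that installation is reachable (both the model filename and the config path are assumptions):

import onnxruntime as ort

# Output filename produced by recycling_quantize.py with -model yolo
quantized_model_path = 'models/yolo_recycling_detection.qdq.U8S8.onnx'

# 'config_file' must point to the vaip_config.json shipped with the Ryzen AI software;
# its exact location depends on your installation.
session = ort.InferenceSession(
    quantized_model_path,
    providers=['VitisAIExecutionProvider'],
    provider_options=[{'config_file': 'vaip_config.json'}]
)
print(session.get_providers())  # Should list VitisAIExecutionProvider if the NPU is in use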

Credits

Mert Erbak
A forward-thinking and dynamic student with a deep-rooted passion for Artificial Intelligence, Finance, and future trends.
Thanks to babritb-bot.
