Background
Project Summary
Step One: Assemble Drone and Setup NavQ
Step Two: Download PyTorch and Stanford Drone Dataset
Step Three: Train RetinaNet on SDD
Step Four: Gather Validation Data
Step Five: Live Feed Detection
Acknowledgements

Created February 11, 2021 © MIT

Aerial Social Distancing Monitoring with Drones

As colleges and universities begin to reopen, monitoring social distancing during outdoor events is required.

Intermediate, full instructions provided, over 1 day

Bonus Prizes

NXP HoverGames Challenge 2: Help Drones, Help Others During Pandemics


Things used in this project

Hardware components

Turnigy 5000mAh 3S 40C Lipo Pack w/XT-90

NXP KIT-RDDRONEK66

NXP 8MMNavQ

Software apps and online services

Story

Background:

As universities and other institutions return to normal operation with vaccines and other measures in place, social distancing will still be required at sporting events and other outdoor gatherings. For the HoverGames drone competition, the provided drone and its onboard companion computer were used to build a system that monitors crowds and measures how well social distancing is being followed. Advances in AI and computer vision make it possible to process the complex scenes captured by a drone, detect the people in them, and estimate the distances between those people. This project applies those advances to aerial imagery for social distancing monitoring.

Project Summary:

Using the NXP HoverGames drone kit and the NavQ (8MMNavQ) companion computer provided for the competition, an application was created that uses machine learning to detect and classify objects in aerial images. First, the drone kit was assembled and the NavQ was set up. Next, the PyTorch implementation of RetinaNet was trained on the Stanford Drone Dataset (SDD). Small-scale flight tests were then conducted in adherence with social distancing guidelines and university aviation requirements, and videos from those tests were used to validate the network. This process allowed the HoverGames drone kit to be used both for gathering data and for building a proof-of-concept social distancing monitoring system.

Overview of the Aerial Detection Project

Step One: Assemble Drone and Setup NavQ

To assemble the HoverGames drone kit, please refer to the guide provided by NXP here. To set up the NavQ computer, refer to the guide here.

Step Two: Download PyTorch and Stanford Drone Dataset

For object detection and person recognition, install PyTorch by following the installation instructions on the PyTorch website, found here. To download the training, validation, and test sets for the Stanford Drone Dataset (SDD), please refer to this link. The dataset files were created by Priya Dwivedi for a Keras implementation of RetinaNet on the SDD. The GitHub repository for that project is found here.

Step Three: Train RetinaNet on SDD

RetinaNet was selected for detection for two reasons: others have successfully trained the network on the SDD, and the network can recognize features in complex scenes that might otherwise be missed. Further information about the RetinaNet architecture can be found here. The training script is included with this guide. A handful of modifications were made to RetinaNet to improve model accuracy on the SDD. First, the minimum bounding box (anchor) size was decreased from the original 32x32 pixels to 16x16 pixels. Likewise, the maximum bounding box size was decreased from 512x512 pixels to 256x256 pixels. These changes were implemented with the following lines:

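from torchvision import models
from torchvision.models.detection.anchor_utils import AnchorGenerator

model = models.detection.retinanet_resnet50_fpn(num_classes=7, pretrained=False, pretrained_backbone=True)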
#! Generate smaller anchors -- copied directly from model setup
anchor_sizes = tuple((x, int(x * 2 ** (1.0 / 3)), int(x * 2 ** (2.0 / 3))) for x in [16, 32, 64, 128, 256])
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)
#! Update the anchor generator inside the model
model.anchor_generator = anchor_generator
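
As a quick check of what the anchor expression above produces, the sketch below evaluates it in plain Python for the reduced base sizes used here and for the 32 to 512 pixel sizes that the text describes as the original setting. The helper name `anchor_sizes_for` is hypothetical and is not part of torchvision.

```python
def anchor_sizes_for(base_sizes):
    """Return three anchor scales per pyramid level, as in the expression above."""
    return tuple(
        (x, int(x * 2 ** (1.0 / 3)), int(x * 2 ** (2.0 / 3)))
        for x in base_sizes
    )

# Reduced sizes used in this project
small_anchors = anchor_sizes_for([16, 32, 64, 128, 256])
# Original sizes described in the text
default_anchors = anchor_sizes_for([32, 64, 128, 256, 512])

# Print the two settings side by side, one pyramid level per line
for small, default in zip(small_anchors, default_anchors):
    print(small, "instead of", default)
```

Both settings keep three scales per level, and the aspect ratios are unchanged, so the number of anchors per location stays at nine and only the anchor dimensions shrink.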

These changes were chosen after researching other implementations of RetinaNet on different datasets. For further changes to the RetinaNet architecture, the PyTorch source code documents the network's parameters and how they can be changed. Furthermore, depending on the accuracy required and the computational resources available, the model backbone can be reduced from ResNet-50 to ResNet-34 or ResNet-18 to decrease the number of trainable parameters. This change can be implemented with the following line of code:

model.backbone = resnet_fpn_backbone('resnet18', pretrained=True, returned_layers=[2, 3, 4], trainable_layers=0, extra_blocks=LastLevelP6P7(256, 256))

Finally, if finding the number of trainable parameters is required, then the following addition can be made to the training code:

pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("trainable: ", pytorch_total_params)

Training RetinaNet on the SDD required creating a custom dataset class and dataloader. The image paths, bounding boxes, and labels are imported from the respective .csv files. The implementation is included in the training code; please refer to the comments there for a detailed description of each step. Running the training script trains the model for the specified number of epochs and batch size. The dataset is very large, so training time and memory requirements will vary between computers. Depending on your computational resources, training could take anywhere from a couple of days to a week to reach a desirable accuracy.
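
The grouping step of such a dataset can be sketched with the standard library alone. The example below assumes annotation rows of the form path, x1, y1, x2, y2, category (the column order used by the training script) and that every row contains a box. The sample rows, file names, class ids, and the `group_annotations` helper are made up for illustration and are not taken from the SDD files.

```python
import csv
import io
from collections import OrderedDict

# Hypothetical annotation rows: path, x1, y1, x2, y2, category
SAMPLE_CSV = """images/frame_0001.jpg,10,20,30,60,Pedestrian
images/frame_0001.jpg,100,40,120,80,Biker
images/frame_0002.jpg,5,5,25,45,Pedestrian
"""

# Hypothetical label map: category name -> integer class id
LABELS = {"Biker": 0, "Pedestrian": 1}

def group_annotations(csv_text, labels):
    """Collect all boxes and labels that belong to the same image path."""
    grouped = OrderedDict()
    for path, x1, y1, x2, y2, category in csv.reader(io.StringIO(csv_text)):
        entry = grouped.setdefault(path, {"boxes": [], "labels": []})
        entry["boxes"].append([float(x1), float(y1), float(x2), float(y2)])
        entry["labels"].append(labels[category])
    return grouped

targets = group_annotations(SAMPLE_CSV, LABELS)
for path, target in targets.items():
    print(path, len(target["boxes"]), "boxes", target["labels"])
```

Each grouped entry can then be converted to tensors inside `__getitem__`, giving one `boxes` tensor of shape N x 4 and one `labels` tensor of length N per image, which is the target layout the torchvision detection models expect.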

Step Four: Gather Validation Data

Once model training is complete, the model can be validated and tested. The drone can be programmed in QGroundControl to follow a specific path and hover at a given altitude, and QGroundControl also reports the estimated flight time. Using this estimated flight time as the recording duration, a Python script can be run on the NavQ with the camera aimed downwards, as shown in this image:

Mounting of Google Coral Camera

To connect to the NavQ board, create a mobile hotspot on either a laptop or a mobile phone and connect both the NavQ and a laptop to it. SSH into the NavQ to execute the Python script. It is recommended to run the script inside tmux, since it allows processes to continue running even if the connection is broken or terminated. Instructions for using tmux can be found here.

Create a mission in QGroundControl following this guide and upload the mission to the drone. I achieved the best results when hovering between 15 and 30 m in the air. Before flying the drone, please check the FAA and local regulations for your area. When the script finishes, the video file will be saved to the NavQ. It is recommended to create a GitHub repository to help transfer the videos from the NavQ for analysis.

Once you have recorded video with people in the scene, run the following Python code to save each frame as a separate image for validating the network:

import cv2
path = "path_to_video_file"
out_dir = "directory_to_save_images/"  # must already exist; cv2.imwrite does not create it
vidcap = cv2.VideoCapture(path)
success, image = vidcap.read()
count = 0
while success:
    cv2.imwrite(out_dir + "frame%d.jpg" % count, image)     # save frame as JPEG file
    success, image = vidcap.read()
    print('Read a new frame: ', success)
    count += 1
# Release the video file once all frames have been read
vidcap.release()

The script reads the video and writes each frame to a separate image file.

Validation is performed in the included Jupyter Notebook. Running the notebook draws images from your test and displays the detections on each one. Below is a sample result from a personal validation image set:

People in red boxes, with the scores from RetinaNet shown above each box

Based on the accuracy that you achieve, you may need to adjust the network parameters, the number of epochs, or other training settings.

Step Five: Live Feed Detection

The final step in the project is to establish a live feed between the drone and a computer performing the analysis. From there, the distances between people can be calculated from the height of the drone and the camera characteristics. Due to unforeseen weather events over the past three weeks, the final stage of the project, which involved implementing the live stream and the detection of social distancing, was not fully implemented. This section of the project will be updated, weather permitting, over the following weeks to include the documentation needed to achieve real-time person detection and social distancing measurement. The equation for determining distance is included in the final cell of the Jupyter notebook for further implementation. The cover image was originally used as a sample image to determine the distances between people and verify that the equation functioned as expected.
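
The distance calculation described above can be sketched as follows, using the ground sampling distance (GSD) relation and the camera values listed in the final notebook cell. This is only an illustration under stated assumptions: the camera points straight down, the ground is flat, and the frame is at the listed image width. The function names, the example boxes, and the 1.83 m (about 6 ft) threshold are hypothetical choices rather than part of the project code.

```python
import math

def ground_sampling_distance(sensor_width_m, altitude_m, focal_length_m, image_width_px):
    """Meters on the ground covered by one pixel for a downward-facing camera."""
    return (sensor_width_m * altitude_m) / (focal_length_m * image_width_px)

def box_center(box):
    """Center (x, y) in pixels of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def ground_distance_m(box_a, box_b, gsd_m_per_px):
    """Approximate ground distance between two detections, in meters."""
    (ax, ay), (bx, by) = box_center(box_a), box_center(box_b)
    return math.hypot(ax - bx, ay - by) * gsd_m_per_px

# Camera values from the notebook cell, at a 15 m hover altitude
gsd = ground_sampling_distance(0.003629, 15.0, 2.5e-3, 2582)

# Hypothetical detections (x1, y1, x2, y2) in pixels
person_a = (100, 100, 140, 180)
person_b = (400, 100, 440, 180)

distance = ground_distance_m(person_a, person_b, gsd)
MIN_DISTANCE_M = 1.83  # assumed threshold (about 6 ft)
print("GSD: %.6f m/pixel" % gsd)
print("distance: %.2f m, too close: %s" % (distance, distance < MIN_DISTANCE_M))
```

If the detector runs on resized frames, the pixel coordinates need to be scaled back to the image width used in the GSD calculation before the distance is measured.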

Acknowledgements:

I would like to thank Liberty University and the Center for Research and Scholarship for support in this project. I would also like to thank my research advisor and TRACER lab director, Dr. Medina, for his guidance in the project.

# Package Imports
import numpy as np
import cv2 as cv
import time
# Set up capture object
cap = cv.VideoCapture('v4l2src ! video/x-raw,width=640,height=480 ! decodebin ! videoconvert ! appsink', cv.CAP_GSTREAMER)

# Define the codec and create VideoWriter object
fourcc = cv.VideoWriter_fourcc(*'XVID')
out = cv.VideoWriter('flight.avi', fourcc, 15.0, (640,  480))

# Define Flight Time in seconds
dur = 60*1.5

# Start Recording and Timer
print("Starting recording")
start = time.time()
while cap.isOpened() and time.time()-start < dur:
    ret, frame = cap.read()
    if not ret:
        print("Can't receive frame (stream end?). Exiting ...")
        break
    out.write(frame)
cap.release()
out.release()

# Package Imports
import torch
import torchvision as tv
import pandas as pd
import os
import numpy as np
from torchvision import models, io, transforms, utils
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import time
import copy
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.ops.feature_pyramid_network import LastLevelP6P7
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# Determine if CUDA device is available
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Define Custom Dataset
class StanfordDroneDataset():
    # Create initialization object
    def __init__(self, transform, type, model_dir):
        #* Import CSV
        self.model_dir = model_dir
        # Import annotations and bounding boxes
        filename = self.model_dir+'model_data/'+type+'_annotations.csv'
        data_df = pd.read_csv(filename, header=None, names=['path', 'x1', 'y1', 'x2', 'y2', 'category'])
        # Import labels
        label_file = self.model_dir+'model_data/labels.csv'
        labels_df = pd.read_csv(label_file, header = None, names = ['category', 'label'])
        labels_df = labels_df.set_index('category')
        # Input the full paths
        full_paths = data_df['path'].to_list()
        full_boxes = torch.tensor(data_df.loc[:, 'x1':'y2'].values)
        labels_list = list(map(lambda x: labels_df.loc[x,:].values[0], data_df['category']))
        full_labels = torch.tensor(labels_list).to(torch.long)
        self.paths = []
        self.targets = []
        # Group consecutive rows that share an image path, so each image appears
        # once with a list of per-box dictionaries for training the retinanet
        start = 0
        for i in range(1, len(full_paths) + 1):
            # A group ends at the last row or where the image path changes
            if i == len(full_paths) or full_paths[i] != full_paths[start]:
                target = [{"boxes": full_boxes[j], "labels": full_labels[j]} for j in range(start, i)]
                self.targets.append(target)
                self.paths.append(full_paths[start])
                start = i
        self.transform = transform
    
    # Define getitem 
    def __getitem__(self, idx):
        # Define Full path
        img_path = self.model_dir+self.paths[idx]
        
        # Read image
        img_pil = Image.open(img_path).convert("RGB")
        # Transform Image
        img = self.transform(img_pil)
        _,h,w = img.shape
        target = self.targets[idx]
        return img, target
    # Get length of dataset
    def __len__(self):
        return len(self.paths)

# Define transformation that needs to be applied to model
normalize = transforms.Compose([
                                transforms.ToTensor(), 
                                ])

phases = ['train']
#* Import Datasets
dataset = {x: StanfordDroneDataset(normalize, x, r'/home/carson/research/drone/aerial_image_recognition/') for x in phases}
#* Create DataLoader
dataloader = {x: torch.utils.data.DataLoader(dataset[x], batch_size=1, shuffle = True) for x in phases}
#* Get dataset sizes
dataset_sizes = {x: len(dataset[x]) for x in phases}
print('done loading data')

# Setup Model
model= models.detection.retinanet_resnet50_fpn(num_classes=7, pretrained=False, pretrained_backbone=True)
#! Generate smaller anchors -- copied directly from model setup
anchor_sizes = tuple((x, int(x * 2 ** (1.0 / 3)), int(x * 2 ** (2.0 / 3))) for x in [16, 32, 64, 128, 256])
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)
#! Update the anchor generator inside the model
model.anchor_generator = anchor_generator
model.backbone = resnet_fpn_backbone('resnet18', pretrained=True, returned_layers=[2, 3, 4], trainable_layers=0, extra_blocks=LastLevelP6P7(256, 256))
pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("trainable parameters: ", pytorch_total_params)
print("MODEL LOADED")

# Create optimizer and learning rate scheduler
optimizer = optim.Adam(model.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min')

# Define Hyperparameters
num_epochs = 10
batch_size = 1

###############3
# Add personal directory location for saving file
save_dir = ""

running_loss = []
phase = 'train'

# Begin Training model
for epoch in range(0,num_epochs):
    print('epoch {}/{}'.format(epoch, num_epochs - 1))
    print('-' * 10)
    model.train()
    tot_len = len(dataloader[phase])
    print("for loop len: ", tot_len)
    epoch_loss = 0.0   # summed loss over the epoch
    batch_loss = 0.0   # summed loss since the last optimizer step
    optimizer.zero_grad()
    # Train the model on the dataset
    for i, (images, targets) in enumerate(dataloader[phase], start=1):
        print("iteration: ", i, "Percent Complete: ", i/tot_len*100)
        # The dataset returns one dictionary per box; merge them into the single
        # per-image dictionary (boxes [N, 4], labels [N]) that RetinaNet expects
        target = {"boxes": torch.cat([t["boxes"] for t in targets]).float(),
                  "labels": torch.cat([t["labels"] for t in targets])}
        output = model(list(images), [target])
        loss = output["classification"] + output["bbox_regression"]
        # Gradients accumulate across iterations until the optimizer steps
        loss.backward()
        batch_loss += loss.item()
        epoch_loss += loss.item()
        # optimize after completing each batch
        if i % batch_size == 0:
            optimizer.step()
            optimizer.zero_grad()
            scheduler.step(batch_loss)
            # Save model after batch for verbose logging
            print("Saving model at iter: ", i, " of epoch: ", epoch, " with loss: ", batch_loss)
            torch.save(model.state_dict(), save_dir + "model_epoch_"+str(epoch)+"_iteration_"+str(i)+"_loss_"+str(batch_loss)+".zip")
            batch_loss = 0.0
    print('epoch loss: ', epoch_loss)
    running_loss.append(epoch_loss)
# Save the model after completing the training
torch.save(model.state_dict(), save_dir+"model_complete.zip")

{
 "metadata": {
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5-final"
  },
  "orig_nbformat": 2,
  "kernelspec": {
   "name": "python38664bit93e07db822b9475e8eac2c4d0dab897e",
   "display_name": "Python 3.8.6 64-bit",
   "language": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2,
 "cells": [
  {
   "source": [
    "import torch\n",
    "import torchvision as tv\n",
    "import pandas as pd\n",
    "import os\n",
    "import cv2\n",
    "import numpy as np\n",
    "from torchvision import models, io, transforms, utils\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.patches as patches\n",
    "from torch.utils.data import Dataset, DataLoader\n",
    "from PIL import Image\n",
    "import time\n",
    "import copy\n",
    "import torch.nn as nn\n",
    "import torch.optim as optim\n",
    "from torch.optim import lr_scheduler\n",
    "import torchvision\n",
    "from torchvision.models.detection.anchor_utils import AnchorGenerator\n",
    "from torchvision.ops.feature_pyramid_network import LastLevelP6P7\n",
    "from torchvision.models.detection.backbone_utils import resnet_fpn_backbone\n",
    "from os import listdir\n",
    "from os.path import isfile, join"
   ],
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "device = torch.device('cpu')"
   ]
  },
  {
   "source": [
    "class StanfordDroneDataset():\n",
    "    #* type = ['test', 'train', 'val']\n",
    "    def __init__(self, transform, type, device):\n",
    "        #* Import CSV\n",
    "        self.device = device\n",
    "        # import labels\n",
    "        label_file = '/path_to_labels/labels.csv'\n",
    "        labels_df = pd.read_csv(label_file, header = None, names = ['category', 'label'])\n",
    "        labels_df = labels_df.set_index('category')\n",
    "        # Save transformation\n",
    "        self.transform = transform\n",
    "\n",
    "        # Load file paths\n",
    "        mypath = r'/path_to_personal_image_files/'\n",
    "        onlyfiles = [join(mypath, f) for f in listdir(mypath) if isfile(join(mypath, f))]\n",
    "        self.paths = onlyfiles\n",
    "       \n",
    "    def __getitem__(self, idx):\n",
    "\n",
    "        img_path = self.paths[idx]\n",
    "        img_pil = Image.open(img_path).convert(\"RGB\")\n",
    "        img = self.transform(img_pil).to(self.device)\n",
    "        \n",
    "        return img\n",
    "    \n",
    "    def __len__(self):\n",
    "        return len(self.paths)"
   ],
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": []
  },
  {
   "source": [
    "# Model Setup"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model= models.detection.retinanet_resnet50_fpn(num_classes=7, pretrained=False, pretrained_backbone=True)\n",
    "#! Generate smaller anchors -- copied directly from model setup\n",
    "anchor_sizes = tuple((x, int(x * 2 ** (1.0 / 3)), int(x * 2 ** (2.0 / 3))) for x in [16, 32, 64, 128, 256])\n",
    "aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)\n",
    "anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)\n",
    "#! Update the anchor generator inside the model\n",
    "model.anchor_generator = anchor_generator\n",
    "model.load_state_dict(torch.load('path_to_final_model'))\n",
    "model.to(device)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "normalize = transforms.Compose([\n",
    "                                transforms.ToTensor(), \n",
    "                                ])\n",
    "phases = ['personal']\n",
    "#* Import Datasets\n",
    "dataset = {x: StanfordDroneDataset(normalize, x, device) for x in phases}\n",
    "dataloader = {x: torch.utils.data.DataLoader(dataset[x], batch_size=1, shuffle = True) for x in phases}\n",
    "#* Get dataset sizes\n",
    "dataset_sizes = {x: len(dataset[x]) for x in phases}\n",
    "print('done loading data')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "phase = 'personal'\n",
    "model.eval()\n",
    "for image in dataloader[phase]:\n",
    "        output = model(image)\n",
    "        fig, ax = plt.subplots(1)\n",
    "\n",
    "        ax.imshow(transforms.ToPILImage()(image[0]), interpolation=\"bicubic\")\n",
    "        output = output[0]\n",
    "        # Perform NMS on the bounding boxes to remove duplicates. \n",
    "        idxs = torchvision.ops.nms(output['boxes'], output['scores'], 0.01)\n",
    "        \n",
    "        boxes = output['boxes'][idxs]\n",
    "        labels = output['labels'][idxs]\n",
    "        scores = output['scores'][idxs]\n",
    "\n",
    "        mean_score = torch.mean(scores)\n",
    "        std_score = torch.std(scores)\n",
    "\n",
    "        for i in range(scores.shape[0]):\n",
    "            if labels[i] in [4,5]:\n",
    "                # Index the NMS-filtered boxes so they match labels and scores\n",
    "                box = boxes[i]\n",
    "                box = box.detach().numpy()\n",
    "                rect = patches.Rectangle((box[0], box[1]), box[2]-box[0], box[3]-box[1], linewidth = 2, edgecolor = 'r', fill = False)\n",
    "                ax.add_patch(rect)\n",
    "                ax.text(box[0], box[1], str(scores[i].detach().numpy()))\n",
    "\n",
    "        plt.show()"
   ]
  },
  {
   "source": [
    "GSD (cm/pixel) = (sensor_width \\* altitude \\* 100) / (focal_length \\* image_width), with all lengths in meters    \n",
    "focal_length = 2.5 mm    \n",
    "fov = 84 degrees    \n",
    "pixel size = 1.4 x 1.4 µm   \n",
    "sensor_width = 0.003629 m   \n",
    "image_width = 2582 pixels   \n",
    "altitude = 15 m   \n",
    "GSD = 0.003629 \\* 15 / 2.5e-3 / 2582 = 0.008433 m/pixel (0.8433 cm/pixel)\n"
   ],
   "cell_type": "markdown",
   "metadata": {}
  }
 ]
}

Credits

LUcfarmer6
