Sibtain Reza
Published © GPL3+

Media Control Using Hand Gestures

A system which operates some commands when a specific hand gesture is being detected.

Intermediate · Full instructions provided · 5 hours · 587 views

Things used in this project

Story


Schematics

Complete Circuit

This board shows the LED, buzzer, and webcam all wired to the Raspberry Pi.

Breadboard

Connection of the buzzer and LED to the breadboard.

Circuit Board

The board shown without the webcam attached, for a clearer view.
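
Before running the prediction script, the wiring can be sanity-checked with a short standalone script. This is a minimal sketch, assuming the LED and buzzer sit on physical pins 11 and 13 (the same BOARD-mode pins Prediction.py drives):

import RPi.GPIO as GPIO
from time import sleep

GPIO.setwarnings(False)
GPIO.setmode(GPIO.BOARD)   # physical pin numbering, as in Prediction.py
GPIO.setup(11, GPIO.OUT)
GPIO.setup(13, GPIO.OUT)

# alternate the two outputs a few times to confirm the wiring
for _ in range(3):
    GPIO.output(11, True)
    GPIO.output(13, False)
    sleep(0.5)
    GPIO.output(11, False)
    GPIO.output(13, True)
    sleep(0.5)

GPIO.cleanup()

If both components respond, the pins and ground connections match what the main script expects.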

Code

Prediction.py

Python
The main file, used for real-time prediction. Edit the video path and the saved-model path to match the locations on your own machine (a command-line alternative is sketched after the script).
import operator

import cv2
import tensorflow as tf
import vlc
import RPi.GPIO as GPIO

# GPIO setup: two outputs for the LED and buzzer, physical (BOARD) numbering
GPIO.setwarnings(False)
GPIO.setmode(GPIO.BOARD)
GPIO.setup(11, GPIO.OUT)
GPIO.setup(13, GPIO.OUT)


Instance = vlc.Instance()
player = Instance.media_player_new()
Media = Instance.media_new("C:/Users/MSIBT/Downloads/Video/Facebook Coding Interview Question and Answer #1- All Subsets of a Set.mp4")
player.set_media(Media)
player.play()

classifier = tf.keras.models.load_model('C:/Program Files/VideoLAN/VLC/model.h5')
cap = cv2.VideoCapture(0)
# flow_from_directory assigns class indices alphabetically by folder name,
# so the six model outputs correspond to '1'..'5' followed by 'None'
class_labels = ['ONE', 'TWO', 'THREE', 'FOUR', 'FIVE', 'NONE']
while True:
    ret, frame = cap.read()
    # flip horizontally to simulate a mirror image
    frame = cv2.flip(frame, 1)

    # getting the ROI (region of interest) containing the hand
    x1 = 400
    y1 = 50
    x2 = 600
    y2 = 300
    cv2.rectangle(frame , (x1-2,y1-2),(x2+2,y2+2),(0,255,0),2)

    # extracting the roi and converting it to gray
    roi = frame[y1:y2 , x1:x2]
    roi = cv2.resize(roi,(150,150))
    roi = cv2.cvtColor(roi,cv2.COLOR_BGR2GRAY)

    # applying a threshold to the region of interest
    ret,test_image = cv2.threshold(roi,127,255,cv2.THRESH_BINARY_INV)
    cv2.imshow('test',test_image)
    # rescale to [0, 1] to match the training-time preprocessing (rescale=1./255)
    result = classifier.predict(test_image.reshape(1, 150, 150, 1) / 255.0)
    prediction = {'ONE': result[0][0],
                  'TWO': result[0][1],
                  'THREE': result[0][2],
                  'FOUR': result[0][3],
                  'FIVE': result[0][4],
                  'NONE': result[0][5]}  # index 5 is the 'None' folder (alphabetical order)
    # sorting so the highest-scoring gesture comes first
    prediction = sorted(prediction.items(), key=operator.itemgetter(1), reverse=True)
    
    # displaying the top prediction on the frame
    cv2.putText(frame, prediction[0][0], (100, 450), cv2.FONT_HERSHEY_PLAIN, 4, (0, 0, 255), 4)
    print(prediction[0][0])
    if prediction[0][0] == "THREE":
        # gesture THREE: play the video and switch the two GPIO outputs
        player.play()
        GPIO.output(11, True)
        GPIO.output(13, False)
    elif prediction[0][0] == "FOUR":
        # gesture FOUR: stop the video and invert the two GPIO outputs
        player.stop()
        GPIO.output(11, False)
        GPIO.output(13, True)
        
    cv2.imshow("Frame", frame)
    
    interrupt = cv2.waitKey(1)
    if interrupt & 0xFF == 27: # esc key
        break
cap.release()
cv2.destroyAllWindows()
GPIO.cleanup()  # release the GPIO pins on exit
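
Rather than hard-coding the two paths, they can also be taken from the command line. A minimal sketch, assuming hypothetical --video and --model flags (not part of the original script):

import argparse

parser = argparse.ArgumentParser(description='Gesture-controlled media player')
parser.add_argument('--video', required=True, help='path to the video file to play')
parser.add_argument('--model', required=True, help='path to the saved model.h5')
args = parser.parse_args()

# then, in the script above:
# Media = Instance.media_new(args.video)
# classifier = tf.keras.models.load_model(args.model)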

collect_data.py

Python
Captures new images to use as training data. Run this script only if you want to build a custom dataset; otherwise it is not needed. (A sketch for splitting the captured images into train and test sets follows the script.)
import os

import cv2
import numpy as np

'''Data Collection Process'''
def collect_data():
    # creating a directory for the captured images
    if not os.path.exists("Images"):
        os.makedirs("Images/train")
        os.makedirs("Images/test")
        os.makedirs("Images/train/None")
        os.makedirs("Images/train/1")
        os.makedirs("Images/train/2")
        os.makedirs("Images/train/3")
        os.makedirs("Images/train/4")
        os.makedirs("Images/train/5")
        os.makedirs("Images/test/None")
        os.makedirs("Images/test/1")
        os.makedirs("Images/test/2")
        os.makedirs("Images/test/3")
        os.makedirs("Images/test/4")
        os.makedirs("Images/test/5")

    mode = 'train'
    directory = "Images/" + mode + "/"

    cap = cv2.VideoCapture(0) #capturing the video
    while True:
        ret,frame = cap.read()
        frame = cv2.flip(frame,1)

        # build a dict with the number of images already saved in each category
        count = {"None": len(os.listdir(directory + "None")),
                 "One": len(os.listdir(directory + "1")),
                 "Two": len(os.listdir(directory + "2")),
                 "Three": len(os.listdir(directory + "3")),
                 "Four": len(os.listdir(directory + "4")),
                 "Five": len(os.listdir(directory + "5"))
                 }

        # overlay the current image counts for each category on the video feed
        if ret == True:
            cv2.putText(frame,mode,(10,40),cv2.FONT_HERSHEY_COMPLEX,1,(0,0,255),1,cv2.LINE_AA)
            cv2.putText(frame,"One: "+str(count['One']),(10,100),cv2.FONT_HERSHEY_COMPLEX,1,(255,0,0),1,cv2.LINE_AA)
            cv2.putText(frame,"Two:"+str(count['Two']),(10,140),cv2.FONT_HERSHEY_COMPLEX,1,(255,0,0),1,cv2.LINE_AA)
            cv2.putText(frame,"Three:"+str(count['Three']),(10,180),cv2.FONT_HERSHEY_COMPLEX,1,(255,0,0),1,cv2.LINE_AA)
            cv2.putText(frame,"Four:"+str(count['Four']),(10,220),cv2.FONT_HERSHEY_COMPLEX,1,(255,0,0),1,cv2.LINE_AA)
            cv2.putText(frame,"Five:"+str(count['Five']),(10,260),cv2.FONT_HERSHEY_COMPLEX,1,(255,0,0),1,cv2.LINE_AA)
            
            #creating a roi
            x1 = 400
            y1 = 50
            x2 = 600
            y2 = 300
            cv2.rectangle(frame , (x1-2,y1-2),(x2+2,y2+2),(0,255,0),2)

            # extracting the roi and converting it to gray
            roi = frame[y1:y2 , x1:x2]
            roi = cv2.resize(roi,(150,150))
            roi = cv2.cvtColor(roi,cv2.COLOR_BGR2GRAY)
            # applying a threshold to the region of interest
            _, roi = cv2.threshold(roi, 127, 255, cv2.THRESH_BINARY_INV)

            # image processing: dilation, erosion, and median smoothing
            kernel = np.ones((2, 2), np.uint8)
            roi = cv2.dilate(roi, kernel=kernel, iterations=1)
            roi = cv2.erode(roi, kernel=kernel, iterations=1)
            #roi = cv2.bilateralFilter(roi, 9, 75, 75)
            roi = cv2.medianBlur(roi, 5)


            cv2.imshow('ROI',roi)
            cv2.imshow("frame",frame)

            # commands dependent on keys pressed
            k = cv2.waitKey(1)
            if k == 27:
                print("Escape closing camera")
                break
            elif k == ord('0'):
                cv2.imwrite(directory+"None/"+str(count['None'])+".png",roi)
                print("Picture labelled None saved to train!")
            elif k == ord('1'):
                cv2.imwrite(directory+"1/"+str(count['One'])+".png",roi)
                print("Picture labelled 1 saved to train!")
            elif k == ord('2'):
                cv2.imwrite(directory+"2/"+str(count['Two'])+".png",roi)
                print("Picture labelled 2 saved to train!")
            elif k == ord('3'):
                cv2.imwrite(directory+"3/"+str(count['Three'])+".png",roi)
                print("Picture labelled 3 saved to train!")
            elif k == ord('4'): 
                cv2.imwrite(directory+"4/"+str(count['Four'])+".png",roi)
                print("Picture labelled 4 saved to train!")
            elif k == ord('5'): 
                cv2.imwrite(directory+"5/"+str(count['Five'])+".png",roi)
                print("Picture labelled 5 saved to train!")
        else:
            break
    cap.release()
    cv2.destroyAllWindows()
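
The script above only saves into the train folders (mode = 'train'). One way to populate the test folders is to move a random fraction of each class across. A minimal sketch, assuming the Images/train and Images/test layout created above (make_test_split is a hypothetical helper, not part of the original project):

import os
import random
import shutil

def make_test_split(fraction=0.2):
    # move a random `fraction` of each class from train/ to test/
    for label in ["None", "1", "2", "3", "4", "5"]:
        train_dir = os.path.join("Images", "train", label)
        test_dir = os.path.join("Images", "test", label)
        files = os.listdir(train_dir)
        random.shuffle(files)
        for name in files[:int(len(files) * fraction)]:
            shutil.move(os.path.join(train_dir, name),
                        os.path.join(test_dir, name))

make_test_split()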

train.py

Python
Creates the CNN model, trains it on the collected images, and saves the trained model to your system. (A quick class-ordering check is sketched after the script.)
import tensorflow as tf

def train_data():
    # creating the CNN model
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32,(3,3),activation='relu',input_shape=(150,150,1)),
        tf.keras.layers.MaxPool2D(2,2),
        tf.keras.layers.Conv2D(32,(3,3),activation='relu'),
        tf.keras.layers.MaxPool2D(2,2),
        tf.keras.layers.Conv2D(32,(3,3),activation='relu'),
        tf.keras.layers.MaxPool2D(2,2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512,activation='relu'),
        tf.keras.layers.Dense(6,activation='softmax')
    ])
    
    model.summary()
    
    # compiling the model
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    
    # preparing the data and training the model
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    train_datagen = ImageDataGenerator(rescale=1./255)
    test_datagen = ImageDataGenerator(rescale=1./255)
    
    # stream training images from the train directory
    training_set = train_datagen.flow_from_directory(
        'Images/train',
        target_size=(150, 150),
        batch_size=5,
        color_mode='grayscale',
        class_mode='categorical')
    
    # stream test images from the test directory
    test_set = test_datagen.flow_from_directory(
        'Images/test',
        target_size=(150, 150),
        batch_size=5,
        color_mode='grayscale',
        class_mode='categorical')
    
    # fitting the model to the training data (fit_generator is deprecated in favor of fit)
    model.fit(
        training_set,
        steps_per_epoch=40,
        epochs=5,
        validation_data=test_set,
        # validation_steps=40,
        verbose=2
    )
    
    #saving the model
    model.save('model.h5')
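
Since Prediction.py maps output indices 0..4 to ONE..FIVE and index 5 to NONE, it is worth confirming the ordering flow_from_directory actually assigned (it sorts folder names alphabetically). A quick check that can be dropped into train_data() after the generators are created:

    # folder-name -> output-index mapping used by the generators;
    # expected: {'1': 0, '2': 1, '3': 2, '4': 3, '5': 4, 'None': 5}
    print(training_set.class_indices)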

Credits

Sibtain Reza
