This project demonstrates how to build and deploy a model that reliably recognizes simple hand gestures on an Edge AI device, the Google AIY Vision kit, by creating a training set of only 1,500 images from scratch, carefully selecting a search region and applying Transfer Learning. The fairly accurate model runs on the Google Vision box with a latency of 1-2 seconds and does not require any access to the Internet or the cloud. It can be used to control your mobile robot, replace your TV remote control, or for many other applications. The same approach (carefully selecting the search region, collecting a relatively small number of customized training images and re-training open-source deep learning models for a specific task) can be applied to create numerous and diverse applications, e.g. a model which controls access to facilities by recognizing the faces of a company's employees.
- Buy Google AIY Vision kit and assemble it following these instructions
- Power the assembled Google AIY Vision kit
- Start Dev terminal
- Stop and disable the joy_detection_demo application, which is set to start automatically after booting
    sudo systemctl stop joy_detection_demo.service
    sudo systemctl disable joy_detection_demo.service
- Update OS
    sudo apt-get update
    sudo apt-get upgrade
- Clone the GitHub repository with the hand gesture classifier and navigate to the project folder
    cd src/examples/vision
    git clone https://github.com/dvillevald/hand_gesture_classifier.git
    cd hand_gesture_classifier
    chmod +x hand_gesture_classifier.py
- Start the hand gesture classifier
    ./hand_gesture_classifier.py \
      --model_path ~/AIY-projects-python/src/examples/vision/hand_gesture_classifier/hand_gesture_classifier.binaryproto \
      --label_path ~/AIY-projects-python/src/examples/vision/hand_gesture_classifier/hand_gesture_labels.txt \
      --input_height 160 \
      --input_width 160 \
      --input_layer input \
      --output_layer final_result
Important: On some Google AIY Vision kits the logic of the GPIO pins appears to be inverted: pin.off() sets the pin HIGH and pin.on() sets it LOW. If your hand command classifier works but shows incorrect commands (e.g. displays right instead of left), add the following line to the command above:
Launching hand command classifier:
LED on the Kit Top is GREEN
Once you start the application, it launches the face/joy detector pre-installed on the Google AIY Vision kit, which tries to detect a human face and determine the size and location of the bounding box around it. During this step the LED on the top of the Google Vision box is GREEN.
Face detector determines the area where hand commands will be displayed
Once the face has been reliably detected in several dozen frames, the application uses the size and location of the face bounding box to determine the size and location of the chest area (called the hand box hereinafter) where the hand gestures are expected to be displayed:
There are several advantages of this approach:
- The search space is reduced to the chest area only, which significantly lowers the latency of the detector.
- Displaying hand gestures in the chest area improves the quality of the hand detector, because the user has a high degree of control over the image background (one's t-shirt) and because the number and diversity of possible backgrounds is greatly reduced (to the number of t-shirts and sweaters in the user's wardrobe). As a result, a fairly accurate model does not need a large data set, so it takes less time to collect the training data and to train your model.
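The face-to-chest mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Box` type, the function name `hand_box_from_face` and all three scale factors are hypothetical, and the values the project actually uses may differ.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    x: int  # left edge, pixels
    y: int  # top edge, pixels
    w: int  # width, pixels
    h: int  # height, pixels


def hand_box_from_face(face: Box, image_w: int, image_h: int,
                       width_scale: float = 3.0,
                       height_scale: float = 3.0,
                       gap_scale: float = 0.5) -> Optional[Box]:
    """Place a chest-area box below the face box.

    The scale factors are illustrative guesses, expressed in units of the
    face box size. Returns None when the chest box does not fit in the image.
    """
    w = int(face.w * width_scale)
    h = int(face.h * height_scale)
    # Centre the hand box horizontally under the face.
    x = face.x + face.w // 2 - w // 2
    # Start a little below the chin.
    y = face.y + face.h + int(face.h * gap_scale)
    if x < 0 or y < 0 or x + w > image_w or y + h > image_h:
        return None  # the user is probably too close to the camera
    return Box(x, y, w, h)
```

Returning `None` when the box falls outside the frame mirrors the situation where the face is detected but the chest area does not fit into the image.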
If face detection takes longer than 10-15 seconds:
- The face detector may be unable to detect your face if, for example, you wear a particular type of glasses. Before starting this application, make sure Google's face/joy detector can detect your face (e.g. reacts to your smile.)
- Make sure you stand still during this step – to better estimate the parameters of the face bounding box the face detector compares face box parameters on several frames and will only proceed to the next step when they are stable.
- Move further away from the camera – it is possible that you are too close and while your face is detected, your chest area does not fit into the image. The optimal distance from the Google Vision kit is about 10 feet (3 meters.)
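One way to picture the stability requirement from the list above is a sliding window over recent face boxes. The sketch below is an assumption about how such a check could work, not the project's code: the class name `FaceBoxStabilizer`, the window length and the tolerance are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class FaceBoxStabilizer:
    """Report an averaged face box once recent detections stop jittering."""

    def __init__(self, window=30, tolerance=0.05):
        self.window = window        # number of consecutive frames required
        self.tolerance = tolerance  # allowed jitter, as a fraction of face width
        self.boxes = deque(maxlen=window)

    def update(self, box):
        """Add an (x, y, w, h) box; pass None when no face was found.

        Returns the averaged box when stable, otherwise None.
        """
        if box is None:
            self.boxes.clear()  # a missed frame restarts the count
            return None
        self.boxes.append(box)
        if len(self.boxes) < self.window:
            return None
        # Use the mean face width as the yardstick for acceptable jitter.
        scale = mean([b[2] for b in self.boxes])
        for i in range(4):
            if pstdev([b[i] for b in self.boxes]) > self.tolerance * scale:
                return None
        return tuple(round(mean([b[i] for b in self.boxes])) for i in range(4))
```

With a check like this, moving around during face detection keeps the spread above the tolerance, so the application stays in the first step until you stand still.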
Once your face is reliably detected, the LED on the top of the Google Vision box turns PURPLE, face detection stops, and the hand gesture recognizer is loaded and ready to be activated.
LED on the Kit Top is PURPLE
To make sure the application does not react to noise, the hand gesture recognizer must be activated first. To activate the hand gesture recognizer, display one of these two hand commands in your chest area for 5-7 seconds.
Once the LED on the top of the Google Vision box turns RED, the hand gesture recognizer is activated and ready to accept your hand signals. If the hand gesture recognizer is not activated within 30 seconds, the application returns to face detection mode (Step 1 above.)
LED on the Kit Top is RED
Once the hand gesture recognizer is activated (LED is RED), you can start displaying your hand commands to control your external devices. It takes 1-2 seconds to detect a hand signal, so to avoid sending the wrong command, move your hands quickly from one signal to another or, better yet, put your hands down between signals so that no hand signal is detected.
The following hand gestures are available:
No hand command:
Once your hand command is detected, it is printed in the terminal.
It also changes the state of the GPIO pins on the back of the Google AIY Vision kit, which you can use to control your external devices. The following table shows the mapping of the hand commands above to the states of GPIO pins A, B and C of the Google AIY Vision kit.
Mapping Hand Commands to Google AIY Vision Kit GPIO pins:
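To illustrate how three pins can encode a set of commands, and how the inverted-logic case mentioned earlier can be handled, here is a small sketch. The command names and bit patterns in `EXAMPLE_MAPPING` are placeholders invented for this example and are not the project's actual mapping; the function `pin_levels` is likewise hypothetical.

```python
# Hypothetical command-to-bits table: names and bit patterns are
# placeholders for illustration, NOT the project's actual mapping.
EXAMPLE_MAPPING = {
    "no_hand": (0, 0, 0),
    "left":    (0, 0, 1),
    "right":   (0, 1, 0),
}


def pin_levels(command, mapping=EXAMPLE_MAPPING, inverted=False):
    """Return (A, B, C): 1 means call pin.on(), 0 means call pin.off().

    Set inverted=True on kits where pin.on() drives the pin LOW, so the
    electrical level seen by the external device stays the same.
    """
    bits = mapping[command]
    if inverted:
        bits = tuple(1 - b for b in bits)
    return bits
```

Three pins give eight distinct states, so a scheme of this kind can represent up to eight commands, including the "no hand" state.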
To close your hand command session, deactivate the recognizer so that it stops sending commands to your devices. To deactivate it, display one of the following two hand gestures for 5-7 seconds:
Once deactivated, the LED on the top of Google AIY Vision kit will turn off and the application will go into the face detection mode (Step 1 above.)
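The whole session (face detection, activation, commands, deactivation) behaves like a small state machine. The following Python sketch models it under stated assumptions: the class and method names, the gesture labels "activate", "deactivate" and "left", and the 5-second hold are all hypothetical, chosen only to match the timings described above.

```python
from enum import Enum


class Mode(Enum):
    FACE_DETECTION = 1       # LED GREEN in the description above
    AWAITING_ACTIVATION = 2  # LED PURPLE
    ACTIVE = 3               # LED RED


class GestureSession:
    HOLD_S = 5.0                 # assumed hold time for (de)activation
    ACTIVATION_TIMEOUT_S = 30.0  # fall back to face detection after this

    def __init__(self):
        self.mode = Mode.FACE_DETECTION
        self.mode_since = 0.0
        self.hold_since = None

    def _enter(self, mode, now):
        self.mode = mode
        self.mode_since = now
        self.hold_since = None

    def on_face_stable(self, now):
        """Call when the face box has been reliably detected."""
        if self.mode is Mode.FACE_DETECTION:
            self._enter(Mode.AWAITING_ACTIVATION, now)

    def on_frame(self, gesture, now):
        """Feed one classified frame; return a command to emit, or None."""
        if self.mode is Mode.FACE_DETECTION:
            return None
        awaiting = self.mode is Mode.AWAITING_ACTIVATION
        toggle = "activate" if awaiting else "deactivate"
        if gesture == toggle:
            if self.hold_since is None:
                self.hold_since = now  # start timing the hold
            elif now - self.hold_since >= self.HOLD_S:
                self._enter(Mode.ACTIVE if awaiting else Mode.FACE_DETECTION, now)
            return None
        self.hold_since = None  # the hold was interrupted
        if awaiting:
            if now - self.mode_since >= self.ACTIVATION_TIMEOUT_S:
                self._enter(Mode.FACE_DETECTION, now)
            return None
        return gesture  # ACTIVE: pass ordinary gestures through
```

Requiring a sustained gesture for both transitions is what keeps a brief, accidental pose from starting or ending a session.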
You can terminate the application and safely shut down your Google AIY Vision kit at any time by pushing the button on the top of the kit.
You can build a simple device which, once connected to the Google AIY Vision kit via GPIO pins, will display your hand commands. I built this display box from the following components:
- 5 x 330 Ohm resistors
- 5 x red LEDs
- 2 x PCB Boards
- Power supply for Arduino
- Cardboard box (from belt or shoes)
The Arduino UNO is a 5V device, but the GPIO pins of the Google AIY Vision kit output 3.3V signals. Because of that, you need a 3.3V-to-5V Logic Level Converter to step up the signals from the Raspberry Pi of the Google AIY Vision kit.
Assembling Display Box
- Assemble the Display Box following the images and schematic diagram above
- Connect the Display Box to the computer via a USB cord and load the sketch /arduino/hand_gesture_display_box.ino onto the Arduino UNO of the Display Box
- Power the Display Box with the Arduino power supply
- Connect the Display Box to the GPIO pins of the Google AIY Vision kit following this schematic:
Important: Make sure that the red wire (+5V) is connected to the rightmost GPIO pin and the black wire (GND) - to the leftmost GPIO pin of Google AIY Vision kit:
Connecting 6-pin Jumperwire from Display Box to Google AIY Vision Kit:
Once your Display Box is powered and connected to the Google AIY Vision kit's GPIO pins, you can start the hand gesture classifier following the steps described above.
Part II (Optional) Create your training dataset with Google AIY vision, train your model and deploy on Google AIY Vision kit.
The second part of this tutorial, Transfer Learning model on Google AIY Vision kit, explains how to:
- Collect a training dataset of 1,500 hand command images with the Google AIY Vision kit
- Use this dataset and Transfer Learning to build your own Hand Command Classifier by retraining the last layer of the MobileNet model, and deploy your Classifier on the Google AIY Vision kit