Many developers are curious about using machine learning techniques in their apps. Once you dig into the processes involved, it becomes clear that you first need to define the problem you're trying to solve and make sure that you need a machine learning model in the first place. For many visual identification tasks, machine learning ("ML") is obviously helpful, as it allows a computer to "learn without being explicitly programmed".
Rather than requiring a program to identify every bird by feeding an image of every existing creature into a database, we can build an ML model: an algorithm trained to recognize patterns and make 'educated guesses'. Fed plenty of good quality images of cardinals, for example, an ML model would presume that the red bird at the feeder is most likely a cardinal, and not a blue jay, based on patterns it has deduced in the images it has 'seen'.
For some projects, you might use a pretrained ML model. There are excellent cognitive services that provide APIs you can query to identify objects, but these models are often not very specific. For example, the closest this type of service can get to this majestic cardinal is "bird" - although it guesses, with some certainty, that it might be either a northern cardinal or a hummingbird.
For a very specific task, where you need to distinguish male, female, juvenile, breeding and nonbreeding birds, it's likely you will need to build your own custom machine learning model. So let's do that.
In this project, we're going to train an ML model on images gathered from the Cornell Ornithology Lab and run it on a Coral Board, Google's new Edge TPU device, fitted with a camera, to observe backyard birds.
There's nothing obliging you to train models exclusively to observe different songbirds in your backyard. In fact, I've used the techniques below to train quantization-aware custom TensorFlow Lite models on various types of cheese and on small cocktail bottles in minibars (in Boston, we call them 'nips', much to the amusement of my British friends). As long as you can gather a good set of images, you can build a good model with these techniques. I was inspired to observe birds by a depressing article in the New York Times about a great die-off of songbirds happening now... think Rachel Carson's Silent Spring. There are many citizen science projects, such as the Great Backyard Bird Count, that encourage everyone to count the birds at their bird feeders and report the data back. What if we could streamline the process a little by training a model that would identify the birds visiting our bird feeder as they are picked up by a camera?
The Coral team has its own birdfeeder project, which integrates the board within the bird feeder itself; it's a much more hardware-heavy project than this one, but definitely [worth checking out](https://coral.withgoogle.com/projects/bird-feeder/). They use a pretrained model to identify birds, but note that you are probably going to have to train something custom. Keep reading if you want to do just that!
There are several datasets of birds, but the best and most comprehensive one I've found for North America, where I live, is Cornell's. It features 400 species of birds and comes with 48,000 images. It's important to realize that birds of the same species come in different colors, sizes and shapes, so many species are split into 'male', 'female', and 'juvenile' since they often look so different. Thus, there are 550 categories across the 400 species in this dataset. It's really great to have data at this level of granularity (thanks, Cornell!) in beautiful color images.
When you download the NABirds dataset, you'll notice it comes bundled with several Python scripts that are worth exploring. We're going to concern ourselves with the folder labelled 'images'. In it are numbered folders filled with images of birds. Digging in, you find that #1001 is an American Goldfinch (Female/Nonbreeding Male) - and indeed, here she is, isn't she lovely:
For my purposes, knowing in general the types of birds that I have seen flying around my West-of-Boston suburban enclave, I picked through the dataset and chose the most common birds, since training on large datasets will tax my MacBook Pro considerably. My final set included 1854 images in total:
Before you can train a model to analyze your data, you need to get the data in shape. Zip all the named folders on which you want to train your model, name the file 'custom_photos.zip', and upload it to Dropbox. The scripts we're going to use presume that they can download a zipped set of folders, which they then process. Take note of the path provided by Dropbox.
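If you'd rather script the zipping step than do it by hand, something like the following should work. It's only a sketch: the function name and the folder layout (one subfolder per bird class, images directly inside) are my assumptions. Whether the training scripts expect an extra enclosing folder inside the archive isn't spelled out here, so check 'download_and_convert_custom.py' before relying on it.

```python
import zipfile
from pathlib import Path


def zip_class_folders(image_root: Path, archive_path: Path) -> int:
    """Zip every class folder under image_root; return the number of files added.

    Assumes one subfolder per class with image files directly inside it.
    """
    added = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for class_dir in sorted(p for p in image_root.iterdir() if p.is_dir()):
            for image in sorted(class_dir.iterdir()):
                if image.is_file():
                    # Store each file as "<class folder>/<file name>".
                    archive.write(image, arcname=f"{class_dir.name}/{image.name}")
                    added += 1
    return added


# Example call (both paths are hypothetical):
# zip_class_folders(Path("my_birds"), Path("custom_photos.zip"))
```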
Tip 💡 When you generate a link on Dropbox, it adds some unnecessary parts, like this: https://www.dropbox.com/s/9kcx3ukzk5sehym/custom_photos.zip?dl=0. You need to edit the link to look like this: https://dl.dropbox.com/s/9kcx3ukzk5sehym/custom_photos.zip to make the zip directly downloadable.
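The same edit can be done in a few lines of Python if you have several links to convert. This is a minimal sketch of the rewrite described in the tip; the function name is made up, and it only handles the link shape shown above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_direct_download(share_url: str) -> str:
    """Rewrite a Dropbox share link into the dl.dropbox.com form from the tip."""
    parts = urlsplit(share_url)
    host = parts.netloc
    if host in ("www.dropbox.com", "dropbox.com"):
        host = "dl.dropbox.com"
    # Drop the query string (such as ?dl=0) and any fragment.
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


print(to_direct_download(
    "https://www.dropbox.com/s/9kcx3ukzk5sehym/custom_photos.zip?dl=0"
))
# prints https://dl.dropbox.com/s/9kcx3ukzk5sehym/custom_photos.zip
```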
Once you have your images organized into folders and uploaded to Dropbox, you are ready to start your training. Before you can, though, you're going to have to have TensorFlow installed locally on your computer. To do that, follow the instructions here.
Why TensorFlow? TensorFlow is a platform built by Google that provides several APIs and many tools to help data scientists, machine learning professionals, and folks leveraging deep learning to create models to analyze data.
# Requires the latest pip
pip install --upgrade pip
# Current stable release for CPU-only
pip install tensorflow
Then, take a look at the repo that I have prepared by forking TensorFlow's directory of models and training techniques that use TensorFlow's TF-Slim high-level API. If you navigate into 'research/slim/scripts' you will find a series of named scripts referring to custom training. The names distinguish them from the training scripts provided by TensorFlow researchers, which show how to train on the standard Flowers dataset typical of ML demos.
Note that you need to have Python 3 installed on your local computer to continue. The scripts also assume that your computer is able to run Bash scripts. Open a terminal window on your computer and type 'python --version' (or 'python3 --version') to see which version you have, or whether you need to install a new version.
Before you start training, take a look around this repo. If you look at 'scripts/constants.sh', you can see the directories where the scripts will store the files they create, namely 'slim/custom'. If there's anything in that folder, delete it, and remember to clear it out again before each retraining.
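Clearing out that folder before each run is easy to forget, so here is a small helper that does it. Treat it as a sketch: the function name is made up, and the path in the usage comment assumes you run it from the 'research' directory of the repo. It deletes files permanently, so double-check the path first.

```python
import shutil
from pathlib import Path


def clear_directory(directory: Path) -> int:
    """Delete everything inside directory but keep the directory itself.

    Returns the number of top-level entries removed (0 if it does not exist).
    """
    if not directory.is_dir():
        return 0
    removed = 0
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a subfolder and its contents
        else:
            entry.unlink()  # remove a plain file or a symlink
        removed += 1
    return removed


# Example call (path is an assumption about where you run this from):
# clear_directory(Path("slim/custom"))
```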
Next, go to 'slim/datasets/custom.py' and edit lines 34 and 36 so that your dataset is divided sensibly between training and validation. For a dataset of 3000 images, for example, you can say that 2500 images will be used for training and 500 for validation.
Then edit the _NUM_CLASSES variable to reflect the number of classes in your set (e.g. the types of birds as reflected by your folders).
Take a look at a Python file in 'slim/datasets' called 'download_and_convert_custom.py'. In this file, you should set your _DATA_URL to your dataset's downloadable zip. Also in this file, edit _NUM_VALIDATION to reflect your validation number that you set above.
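If you'd rather not count folders and files by hand, a small helper can work out the numbers to type into those two files. This is only a sketch: the function names, the list of image extensions, and the one-in-six validation share (chosen to match the 2500/500 example) are assumptions, not something taken from the repo.

```python
from pathlib import Path

# File extensions counted as images (an assumption; adjust for your data).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def split_sizes(total_images: int, validation_fraction: float = 1 / 6):
    """Return (training_count, validation_count) for a dataset size."""
    num_validation = round(total_images * validation_fraction)
    return total_images - num_validation, num_validation


def summarize_dataset(image_root: Path, validation_fraction: float = 1 / 6):
    """Count class folders and images under image_root and suggest a split."""
    class_dirs = [p for p in image_root.iterdir() if p.is_dir()]
    total = sum(
        1
        for class_dir in class_dirs
        for f in class_dir.iterdir()
        if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
    )
    train, validation = split_sizes(total, validation_fraction)
    return {"num_classes": len(class_dirs), "train": train, "validation": validation}


print(split_sizes(3000))
# prints (2500, 500)
```

The 'num_classes' value is what goes into _NUM_CLASSES, and the 'validation' value is what goes into _NUM_VALIDATION.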
This is the first script to run when you're ready: type 'bash step1_prepare.sh' in the 'scripts' folder and you're off to the races! This script downloads your dataset and converts it to TFRecord format for training purposes.
What's a TFRecord? According to TensorFlow docs, "the TFRecord format is a simple format for storing a sequence of binary records." It's a handy formatting tool to get your images into a clean, standard structure.
The second script takes the longest, because this is where TensorFlow does its real work and trains the model. In this script, you can change the 'network_type' variable if you want to use a different network (I usually use mobilenet_v1). You can choose to train an entire model or just the last few layers. You can also tweak variables such as 'weight_decay' to get better results.
Once script 2 has completed, run script 3 to check the accuracy of your model. If it's not good enough, tweak the parameters in script 2 and retrain. My most recent training run for the birds model was 89% accurate - not bad!
Try for an accuracy above 80%
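To make that number concrete: I'm assuming here that accuracy means the share of validation images whose top prediction matches the true label, which is the usual definition but isn't something this post confirms about script 3. The function names and the bird labels below are made up for illustration.

```python
def top1_accuracy(predicted, actual):
    """Fraction of examples where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def good_enough(accuracy, threshold=0.80):
    """Apply the 'above 80%' rule of thumb."""
    return accuracy > threshold


# Ten made-up validation images, nine predicted correctly.
actual = ["cardinal"] * 5 + ["titmouse"] * 5
predicted = ["cardinal"] * 5 + ["titmouse"] * 4 + ["cardinal"]
score = top1_accuracy(predicted, actual)
print(score, good_enough(score))
# prints 0.9 True
```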
When your model is accurate enough, run the next script to convert your model to a quantized TensorFlow Lite model, complete with a labels.txt file that lists the classes. Now you're ready to use this model on your Coral Board!
Now you have a model, comprising two files in 'slim/custom/models': 'labels.txt' and 'graph.tflite' or some similarly-named file. This is a nicely quantized, reasonably accurate model that will allow your device to make educated guesses on what it 'sees' in nature. Now we need to load it onto a Coral Board that's configured with a camera.
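Before moving the files to the board, it's worth a quick check that the labels file lists the classes you expect. The sketch below assumes one label per line, optionally prefixed with a numeric index and a colon; that format is my assumption, so open your own 'labels.txt' and adjust if it looks different. The function name and sample labels are hypothetical.

```python
def parse_labels(text):
    """Map class index to label name, one label per non-empty line.

    Accepts both "0:Northern Cardinal" and plain "Northern Cardinal" lines.
    """
    labels = {}
    next_index = 0
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        head, sep, tail = line.partition(":")
        if sep and head.strip().isdigit():
            index, name = int(head), tail.strip()
        else:
            index, name = next_index, line
        labels[index] = name
        next_index = index + 1
    return labels


sample = "0:Northern Cardinal\n1:Tufted Titmouse\n"
print(parse_labels(sample))
# prints {0: 'Northern Cardinal', 1: 'Tufted Titmouse'}
```

The number of entries should match the _NUM_CLASSES value you set before training.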
From this point, I'm assuming that you have a Coral Board with a camera set up and running. Instructions on how to do so are here. You need to have the 'mdt' server toolkit installed and functioning, and the board powered up and connected to your computer with a data cable. Read through the Coral Board documentation on setting up the board if you get stuck.
We're going to use the system bundled with the Coral Board, which lets you run inference (analyzing images in real time) using custom-trained machine learning models. Right now, you can test the system by connecting to the board:
A demo will open if you navigate to http://192.168.100.2:4664/ on your computer. It shows inference occurring with a video of cars and trucks; you can see how fast inference can happen as the model identifies cars and trucks passing by:
Similarly, you can test the camera's inference on a pretrained object model or on a pretrained face detection model, which sometimes has fun results:
Once your camera is set up and performing inference, all you have to do is swap out the .tflite file and labels file for it to start 'watching' for songbirds. To do that, you can upload your .tflite and labels.txt files to Dropbox, as we did before for the raw images. Then, you can download those to your Coral Board:
wget -P models/ https://dl.dropbox.com/s/6xshh9eblfx05vy/graph.tflite
wget -P models/ https://dl.dropbox.com/s/y7x9rjok7scef2n/labels.txt
Then, restart the Edge TPU server so that it uses those files for inference on the camera feed, which it streams to a local site:
--model models/graph.tflite \
Test your model by pointing the camera at various images of birds. As you can see, the accuracy is pretty good! It's definitely determining that there are titmice and cardinals of potentially varying types in the image.
The real challenge comes when trying to get the camera to identify birds in low light. While titmice come most often to my feeder, the model has trouble 'seeing' in poor light through a window, as the installation is very simple - the camera sits inside the window so that it can stay dry and protected.
This titmouse loves sunflower seeds! But sadly, the light isn't good enough for good inference to happen in my current setup and the camera is just unsure of what it's 'seeing':
No wonder the model can't identify the bird if the human eye has difficulty as well (spoiler alert: it's a titmouse). But if you can set up your camera in a well-lit spot with a good variety of birdseed, you will have better luck with this model! I've uploaded it so that you can try it without having to train something new, if you don't want to.
A next step for this project would be to enter the data into a database and do a true count of the birds, although it's quite difficult to differentiate individuals (all the titmice look the same to me!). What will you build? Let me know in the comments below.
Worst-case scenario, your cat will be amused: