Published September 1, 2018 © GPL3+

AI Digit Recognition with PiCamera

Recognize digits with Raspberry Pi, Pi Camera, OpenCV, and TensorFlow.

Beginner · Full instructions provided · 20 hours

Things used in this project

Hardware components

Raspberry Pi 3 Model B

Raspberry Pi 2 Model B

Raspberry Pi Zero Wireless

Raspberry Pi Camera Module

Software apps and online services

TensorFlow

OpenCV

Story

In this project, we are going to train a deep convolutional neural network to transcribe digits. Then we are going to use the trained network to let the Pi Camera read and recognize digits. The AI pipeline is implemented with scikit-image and OpenCV 3.3 for image manipulation, and with Keras, using TensorFlow as its back end, for the deep learning part.


To keep things simple, there is no feature-localization stage. You have to hold the image in front of the camera lens so that the digit is the only feature the camera sees.

We use the MNIST dataset. It consists of 60,000 training examples and 10,000 test examples of the handwritten digits 0–9, formatted as 28x28-pixel monochrome images. Essentially, we transform every image acquired from the camera into an image that looks like this:

The network topology is shown in the image below:

The last layer is a fully connected layer which maps to 10 categories representing the 10 digits.
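As a rough illustration of what that final layer computes, here is a NumPy sketch of a fully connected layer followed by a softmax, which turns 10 raw scores into 10 probabilities that sum to 1. The feature size (128) and the random weights are hypothetical placeholders, not values from the trained network, and the softmax activation is an assumption (it is the usual choice for a 10-class classifier whose outputs are read as probabilities).

```python
import numpy as np

def dense_softmax(features, weights, bias):
    """Fully connected layer mapping a feature vector to 10 class probabilities."""
    logits = features @ weights + bias          # one raw score per digit, shape (10,)
    logits = logits - logits.max()              # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()                      # normalize so the probabilities sum to 1

rng = np.random.default_rng(0)
features = rng.normal(size=128)                 # hypothetical 128-value feature vector
weights = rng.normal(size=(128, 10)) * 0.1      # hypothetical layer weights
bias = np.zeros(10)                             # hypothetical layer bias

probs = dense_softmax(features, weights, bias)
print(probs.shape, round(float(probs.sum()), 6))
```

In the real model these weights are learned during training; only the shape of the computation carries over.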

We are going to do two things. First, we train a network to recognize digits. Then we use the weights of that trained network to recognize digits in the live feed from the Raspberry Pi camera.

I used a third hand (a helping-hands stand) to hold the Raspberry Pi Camera, since that was all I had. The mechanical setup is shown in the picture below:

Before we start, however, let's install everything we need. I used a Python virtual environment to set up the program, so assuming you have installed everything listed below, you can run:

source ~/.profile 
workon cv
python PiCameraApp.py --picamera 1

So let's get to the details. First, let's install a bunch of programs.

Install TensorFlow

pip install tensorflow

Install Keras

pip install keras

Install OpenCV 3.3

Installing OpenCV is a bit involved if you need all the optimizations: you have to compile it from source, because the package from the pip package manager is not built with all of them.

The best tutorial I found is this one:

https://www.pyimagesearch.com/2017/09/04/raspbian-stretch-install-opencv-3-python-on-your-raspberry-pi/

Finally, install picamera with its NumPy array support:

pip install "picamera[array]"

Now that the whole software stack is installed on the RPi, we have to do some training. The network should be trained on a laptop, preferably one with a GPU, unless you are a hero who is comfortable with glacially slow performance and decides to train on the RPi.

Training the Network

To train the network, run the Python script on a laptop by issuing:

python Train_MNIST.py

This assumes that you have CUDA (if using the GPU version), TensorFlow, Keras and matplotlib installed on your laptop.

The script uses Keras to define a deep neural network model and compile it, and after the training and validation phases are done it saves the network.

At the end, the program saves the trained network as an .h5 file (the recognition script expects it to be named mnist_trained_model.h5). This is the file with the network weights that we are going to load in the recognition script running on the RPi to recognize live digit images. Because the recognition script loads it with Keras's load_model, the file must contain the full model (architecture and weights), not just the weights.

Copy the .h5 file over to your RPi using either scp or WinSCP.

If you have an NVIDIA GPU, training takes a couple of minutes, depending on the compute capability of your card. To leverage the GPU, however, you'll have to install the GPU version of TensorFlow as well as the CUDA toolkit from the NVIDIA website. Training on the CPU alone takes longer.

Recognizing Live Images of Digits

I ended up testing both handwritten and printed digits. Prediction accuracy depends mostly on lighting, the camera angle, and how ambiguous (read: crappy) your handwriting really is. After you start the app, press t to read a digit and q to quit.

Recognizing the digit 4. I had to use a lot of ink to draw that 4.

Sometimes the network outputs tiny, nonzero probabilities for the other digits. Here, for example, there is a 0.0001% chance that it may be a seven.

Tools of the trade.

Program Explanation

The program takes a snapshot from the camera when the 't' key is pressed and applies a number of transformation steps to the image before forwarding it to the deep neural network (DNN).

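The first thing to keep in mind is that each captured color frame is a large array of pixel values with three color channels (stored in BGR order by OpenCV). First, the image is converted from color to grayscale, so we are effectively collapsing three channels into one. scikit-image's rgb2gray returns the result as floating-point values in the range [0, 1].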
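To make the channel reduction concrete, the snippet below computes a luminance-weighted grayscale image in NumPy. The weights 0.2125, 0.7154 and 0.0721 are the ones scikit-image documents for rgb2gray; the helper name to_gray and its bgr flag are hypothetical. Because OpenCV delivers frames in BGR order, the sketch reverses the channels before applying the RGB weights.

```python
import numpy as np

# Luminance weights documented for scikit-image's rgb2gray, in R, G, B order.
RGB_WEIGHTS = np.array([0.2125, 0.7154, 0.0721])

def to_gray(image, bgr=True):
    """Hypothetical helper: uint8 color image (H, W, 3) -> float gray image in [0, 1]."""
    pixels = image.astype(np.float64) / 255.0   # scale uint8 to [0, 1], as rgb2gray does
    if bgr:
        pixels = pixels[..., ::-1]              # OpenCV stores channels as B, G, R
    return pixels @ RGB_WEIGHTS                 # weighted sum over the channel axis

frame = np.zeros((2, 2, 3), dtype=np.uint8)
frame[0, 0] = (255, 255, 255)   # white pixel
frame[0, 1] = (0, 0, 255)       # pure red, written in BGR order
gray = to_gray(frame)
print(gray.shape)               # (2, 2)
```

A white pixel maps to 1.0 and the red pixel maps to 0.2125; without the channel reversal the red pixel would wrongly get the blue weight.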

The next step is to convert the floating-point image to 8-bit unsigned integers in the range 0-255.
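The conversion amounts to scaling by 255 and rounding. The sketch below is a minimal stand-in for skimage's img_as_ubyte on float input; the function name float_to_uint8 is hypothetical, and scikit-image's exact rounding rule may differ between versions.

```python
import numpy as np

def float_to_uint8(image):
    """Map a float image in [0, 1] to uint8 in [0, 255] by scaling and rounding."""
    # This sketch rejects out-of-range input instead of clipping it.
    if image.min() < 0.0 or image.max() > 1.0:
        raise ValueError("expected values in [0, 1]")
    return np.rint(image * 255.0).astype(np.uint8)

gray = np.array([[0.0, 0.25], [0.5, 1.0]])
u8 = float_to_uint8(gray)
print(u8)
```

np.rint rounds halves to the nearest even integer, so 0.5 * 255 = 127.5 becomes 128.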

Next, we use OpenCV to do the thresholding. Otsu's method picks the threshold automatically so that the strokes of the digit stand out. The image is then resized to 28x28 pixels, the same input size the MNIST DNN accepts.
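The app calls cv2.threshold with the THRESH_OTSU flag. To show what that flag computes, here is a from-scratch NumPy sketch of Otsu's method (not OpenCV's actual code): it tries every threshold t and keeps the one that maximizes the between-class variance of the histogram, (mu_T * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t))). The function name otsu_threshold and the test image are hypothetical, and OpenCV may break ties differently.

```python
import numpy as np

def otsu_threshold(gray_u8):
    """Return the threshold t that maximizes between-class variance (Otsu's method).

    Pixels <= t form one class and pixels > t the other, matching how
    cv2.THRESH_BINARY sets pixels above the threshold to the max value.
    """
    hist = np.bincount(gray_u8.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()                 # normalized histogram
    omega = np.cumsum(prob)                  # weight of the class <= t
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_total = mu[-1]                        # mean of the whole image
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                # skip thresholds that leave a class empty
    return int(np.argmax(sigma_b))

# Two gray levels: dark ink (40) and bright paper (200).
img = np.array([[40, 40, 200, 200],
                [40, 40, 200, 200]], dtype=np.uint8)
t = otsu_threshold(img)
binary = np.where(img > t, 255, 0).astype(np.uint8)
print(t)    # 40
print(binary)
```

Any threshold between the two gray levels separates them equally well; argmax returns the first one, so the ink becomes 0 and the paper 255.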

You can use scikit-image, OpenCV, or Keras to do the rescaling.

After the image is rescaled, the next step is to invert the colors, since MNIST digits are white strokes on a black background, whereas the camera sees black lines on a white background.
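Inversion is a single subtraction on the 8-bit image. In this minimal sketch, the function name invert and the synthetic "paper" image are hypothetical; the app does the same thing with 255 - img_resized.

```python
import numpy as np

def invert(binary_u8):
    """Swap black and white in a uint8 image: 0 -> 255 and 255 -> 0."""
    return 255 - binary_u8      # stays in [0, 255] because inputs are in [0, 255]

# Dark ink (0) on white paper (255), as the camera sees it.
paper = np.full((28, 28), 255, dtype=np.uint8)
paper[10:18, 13:15] = 0         # a crude vertical stroke, like a "1"

mnist_style = invert(paper)
print(mnist_style[14, 13], mnist_style[0, 0])
```

After inversion the stroke is white (255) and the background black (0), matching the MNIST convention.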

After preprocessing, the image is sent to the DNN, which predicts the observed digit.

Each entry of the output array is the probability that the observed image shows the corresponding digit. Keep in mind that the first position (index 0) is reserved for the digit 0, so a 1 in the second position means the network is 100% certain that the digit is a 1.
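Reading the answer off that array means taking the index of the largest value. The probability vector below is a made-up example, not real network output; it has the (1, 10) shape that model.predict returns for a batch of one image.

```python
import numpy as np

# Hypothetical network output for one image: index i holds P(digit == i).
ans = np.array([[0.000001, 0.97, 0.0001, 0.01, 0.005,
                 0.00001, 0.000001, 0.014886, 0.000001, 0.000001]])

digit = int(np.argmax(ans[0]))          # position of the largest probability
confidence = float(ans[0, digit])
print(f"predicted digit: {digit} ({confidence:.1%})")

# Suppress scientific notation when printing tiny probabilities.
np.set_printoptions(suppress=True, precision=4)
print(ans)
```

This prints "predicted digit: 1 (97.0%)": index 1, the second position, holds the highest probability.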

Algorithm Steps

1. Read the image

The first step, obviously, is to put an image in front of the camera. The image is scaled later, since the CNN (convolutional neural network) expects images of a fixed size.

2. Convert to gray scale

The acquired image is then converted to grayscale with scikit-image's rgb2gray function. Alternatively, you could do all the image manipulation with OpenCV alone, but then you have to remember all of its function names. Also note that there are some subtle differences between scikit-image and OpenCV for certain functions: for example, rgb2gray returns floats in [0, 1], while OpenCV's cvtColor keeps the uint8 type of its input.

3. Scale image range

Here the image is converted from floating point to uint8, with range [0, 255].

4. Thresholding

To obtain a clean black-and-white image, thresholding is done with Otsu's method. This is the secret-sauce step, since thresholding manually would mean trying threshold values one by one.

5. Resize image

The image is resized to a 28x28-pixel array. Before prediction, it is reshaped into a (1, 28, 28, 1) array, a batch of one single-channel image, which is the input shape the network takes.
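The app resizes with cv2.resize (bilinear interpolation by default). As a dependency-free stand-in, the sketch below shrinks a frame by averaging equal-sized blocks, an area-style resize; this swapped-in technique, the name block_resize, and the 300x400 test frame are all hypothetical. It then reshapes the result into the (1, 28, 28, 1) batch layout.

```python
import numpy as np

def block_resize(image, size=28):
    """Downscale a 2-D image to size x size by averaging equal blocks.

    Rows and columns that do not fit evenly into `size` blocks are
    cropped from the bottom/right edge.
    """
    h, w = image.shape
    bh, bw = h // size, w // size
    if bh == 0 or bw == 0:
        raise ValueError("image is smaller than the target size")
    cropped = image[: bh * size, : bw * size].astype(np.float64)
    blocks = cropped.reshape(size, bh, size, bw)   # (block row, y, block col, x)
    return blocks.mean(axis=(1, 3))                # average each bh x bw block

frame = np.zeros((300, 400), dtype=np.uint8)    # e.g. a 400-pixel-wide gray frame
frame[100:200, 180:220] = 255                   # a bright vertical bar
small = block_resize(frame).round().astype(np.uint8)

# Batch layout: (batch, height, width, channels) = (1, 28, 28, 1).
batch = small.reshape(1, 28, 28, 1).astype("float32")
print(small.shape, batch.shape)
```

Averaging keeps thin strokes visible when shrinking a large frame, which is why area-style resizing is a common choice for downscaling.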

6. Invert image

The MNIST DNN expects 28x28-pixel images drawn white on a black background, so we have to invert the image.

7. Feed into trained neural network

This is the last step. Here the preprocessed image is fed to the trained deep neural network, which the script loads once at startup. It takes 2-3 seconds to come up with a prediction.

8. Print answer

Finally, we end up with an output array of 10 classes, one for each digit from 0 to 9. The value at each position is the probability the network assigns to that digit. Translating this into human speak means picking the position with the highest probability.

The main setup looks like a medical device.

Fin!

That's all. This project showed how to implement a neural network that can recognize digits.

Code uploaded on GitHub as always.

Code

PiCameraDigit_AI_Recogizer.py

#!/usr/bin/env python

# Copyright dhq 2018 Aug 31
# Licensed under GPL V3

# Theory of operation

# 1. read image
# 2. convert to gray scale
# 3. convert to uint8 range
# 4. threshold via Otsu's method
# 5. resize image
# 6. invert image to black background
# 7. feed into trained neural network
# 8. print answer

import argparse
import datetime
import time

import cv2
import imutils
import numpy as np
from imutils.video import VideoStream
from keras.models import load_model
from skimage import img_as_ubyte		# convert float to uint8
from skimage.color import rgb2gray

# load the trained CNN (load_model restores both architecture and weights)
model = load_model('mnist_trained_model.h5')

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--picamera", type=int, default=-1,
	help="whether or not the Raspberry Pi camera should be used")
args = vars(ap.parse_args())

# initialize the video stream and allow the camera sensor to warm up
vs = VideoStream(usePiCamera=args["picamera"] > 0).start()
time.sleep(2.0)

def ImagePreProcess(im_orig):
	# OpenCV frames are BGR; rgb2gray expects RGB and returns floats in [0, 1]
	im_gray = rgb2gray(cv2.cvtColor(im_orig, cv2.COLOR_BGR2RGB))
	img_gray_u8 = img_as_ubyte(im_gray)		# convert gray image to uint8 [0, 255]
	# convert grayscale image to binary; with THRESH_OTSU the value 128 is
	# ignored and the threshold is computed automatically
	(thresh, im_bw) = cv2.threshold(img_gray_u8, 128, 255,
		cv2.THRESH_BINARY | cv2.THRESH_OTSU)
	# resize to the 28x28 MNIST input size with OpenCV
	# (skimage.transform.resize(im_bw, (28, 28)) is an alternative)
	img_resized = cv2.resize(im_bw, (28, 28))
	# invert: MNIST digits are white strokes on a black background
	im_gray_invert = 255 - img_resized
	# reshape to a batch of one single-channel image: (1, 28, 28, 1)
	im_final = im_gray_invert.reshape(1, 28, 28, 1).astype("float32")
	# the output is an array of probabilities, one per digit 0-9
	ans = model.predict(im_final)
	print(ans)
	# choose the digit with the greatest probability as the predicted digit
	digit = int(np.argmax(ans[0]))
	print('DNN predicted digit is: ', digit)


def main():
	try:
		# loop over the frames from the video stream
		while True:
			# grab the frame from the threaded video stream and resize it
			# to have a maximum width of 400 pixels
			frame = vs.read()
			frame = imutils.resize(frame, width=400)
			# keep a clean copy for recognition, before the timestamp is drawn
			snapshot = frame.copy()

			# draw the timestamp on the displayed frame
			timestamp = datetime.datetime.now()
			ts = timestamp.strftime("%A %d %B %Y %I:%M:%S%p")
			cv2.putText(frame, ts, (10, frame.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX,
				0.35, (0, 0, 255), 1)

			# show the frame
			cv2.imshow("Frame", frame)
			key = cv2.waitKey(1) & 0xFF

			# if the `q` key was pressed, break from the loop
			if key == ord("q"):
				break
			# if the `t` key was pressed, save the snapshot and recognize the digit
			elif key == ord("t"):
				cv2.imwrite("num.jpg", snapshot)
				ImagePreProcess(snapshot)
	except KeyboardInterrupt:
		pass
	finally:
		# do a bit of cleanup on every exit path
		cv2.destroyAllWindows()
		vs.stop()


if __name__=="__main__":
	main()

Credits

Dimiter Kendri


Robotics and AI

Thanks to Keras Team and Adrian Rosebrock.

Schematics

PiCamera AI Digit Recogizer
