Published January 2, 2018 © GPL3+

The Raspbinator

A Terminator-inspired, Raspberry Pi-driven facial recognition talking robot head.

Intermediate • Full instructions provided • 10 hours • 4,376

Things used in this project

Hardware components

Raspberry Pi 3 Model B

Adafruit PWM HAT

battery box

speaker

batteries

usb sound card

pan and tilt kit

Raspberry Pi Camera Module

camera cable

microphone

3.5mm jack splitter

Software apps and online services

wit.ai

Hand tools and fabrication machines

Soldering iron (generic)

Story

Key goals:

Have a robotic skull that is able to move its ‘eye’

Have it recognize / record faces and assign them to people

Be able to recognize speech and talk back

I need to know how Skynet gets built.

Most of you reading this have probably seen Terminator 1 + 2. If you haven’t, stop reading this right now and go watch them.

Welcome back.

I’ve loved these films for as long as I can remember; there was even a summer in 2001 when I think I watched Terminator 2 almost every day of the holidays.

Happy memories.

I always thought how cool it would be to make something like a Terminator myself, but never really had the skill or the tools to do it. Fortunately, there is much more available to the consumer today, and I’m slightly smarter than I was when I was 11. After looking over some of the available tech and working on other projects to build up my knowledge, I finally got to work on the Raspbinator in 2015. A few more projects and some procrastination later, it’s finally here, in its first incarnation.

Please see my prior post for details on the earlier phase of the project.

My CPU is a neural net processor.

Here are the items I’ve used for the project:

Raspberry Pi 3

PWM Servo HAT

Battery Box

Speaker

AA Batteries

Creative Sound Blaster Play! 2 USB Sound Card

Pan & Tilt Kit

Raspberry Pi Camera (1.3)

Camera cable

Microphone

3.5mm jack splitter

I insist.

I started work on this a while back, so I’m using Raspbian Jessie, but the steps below will probably work with Stretch too. If you want to try to replicate this, get on PiBakery and use it to start.

The main delay for this project was getting its talking / learning ability working. I’ve tried working on a chatbot before, but it was difficult and didn’t work… too well. Over the past couple of months, though, I’ve been working hard on getting a decent-ish chatbot working that can receive inputs from someone and say things back.

I finally got the logic of the system down, and it goes something like this loop (simplified):

1. Bot says initial “Hello”.

2. Human responds.

3. Bot stores the response to “Hello” and searches its database for anything it has said before that closely matches the human’s input, then brings up a result from a prior interaction.

4. Loop back to 2.

By storing human responses in the bot’s Mongo database and tying them to things the bot has previously said, then comparing new inputs against those stored items, you can get some reasonably decent responses from the bot.

As an example: if the bot says “whats the weather like” and I type in “its raining outside”, it will store that response and tie it to what the bot said. Now if someone else comes along and types “hows the weather”, it will search its database for close matches and find the thing it said before, “whats the weather like”, at which point it will search for responses to that and find my response, “its raining outside”. So while it’s not really ‘thinking’ about its responses, it does end up coming back with some reasonable replies.
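The lookup in this example can be sketched in a few lines. This is a minimal illustration and not the project's actual code: it scores similarity with difflib's SequenceMatcher, and the `memory` dictionary, the function names and the default threshold are stand-ins chosen for the example, with a plain dict in place of the Mongo database.

```python
import random
from difflib import SequenceMatcher

def similar(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a, b).ratio()

def best_match(text, candidates, threshold):
    """Return the candidate most similar to text, or None if none beats threshold."""
    scored = [(similar(text, c), c) for c in candidates]
    scored = [pair for pair in scored if pair[0] > threshold]
    if not scored:
        return None
    return max(scored)[1]

# Hypothetical in-memory stand-in for the database: each thing the bot
# has said maps to the human replies it received.
memory = {"whats the weather like": ["its raining outside"]}

def reply_to(text, threshold=0.45):
    """Find something the bot said that resembles text and reuse a reply to it."""
    said_before = best_match(text, list(memory.keys()), threshold)
    if said_before is None:
        return None
    return random.choice(memory[said_before])

print(reply_to("hows the weather"))
```

An input that resembles nothing the bot has said returns None here, which is the point where the fallbacks described next would take over.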

Now what happens if there are no prior responses for the input? Before the bot responds, it stores the input it has received in the database. It also splits every input up and stores all the individual words. When it can’t find a previous reply to your input, it searches the database with zero minimum accuracy, essentially picking a reply at random and sometimes repeating what you just said.

If it has 20 or more words stored in its database, however, it will instead generate a random sentence from those words and reply with that. You may be thinking that this causes the bot to talk a lot of nonsense, and you’d be right. At first it will just repeat what you are saying as you talk to it, but the more you type in and reply, the more it learns, and the random sentences it generates can sometimes make some degree of sense. When you reply to one of these, it then has a reference point for when it receives an input similar to what it just said.
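Here is a rough sketch of that last-resort behaviour, assuming plain Python lists in place of the database collections. The function names are made up for the illustration, and the small-vocabulary branch is simplified to a random pick from earlier replies rather than a zero-accuracy database search.

```python
import random

MIN_WORDS = 20  # below this many stored words the bot does not invent sentences

def learn_words(known_words, text):
    """Split an input into words and remember any that are new."""
    for word in text.split(" "):
        if word and word not in known_words:
            known_words.append(word)

def sentence_gen(known_words, rng=random):
    """Build a sentence of 1 to 10 words drawn at random from known_words."""
    length = rng.randint(1, 10)
    return " ".join(rng.choice(known_words) for _ in range(length))

def last_resort_reply(known_words, earlier_replies, rng=random):
    """Reuse an earlier reply while the vocabulary is small, else make a guess."""
    if len(known_words) < MIN_WORDS:
        return rng.choice(earlier_replies)
    return sentence_gen(known_words, rng)

words = []
learn_words(words, "I like cheese")
print(last_resort_reply(words, ["hello"]))
```

With only three words learned, the call above falls back to an earlier reply; once the list reaches 20 words it starts returning generated sentences.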

Here’s another example:

If I say to the bot “I like cheese”, and it has nothing in its database for this input but enough words to generate a random sentence, it could come back with what is essentially a guess: “Hello television like usually”. That of course doesn’t make sense, but if I then respond with “Yes I like television too”, it stores that reply. Now say someone else comes along and types in “I usually watch television”. The bot will run that through the database, find it’s similar to what it said before (“Hello television like usually”) and return my response (“Yes I like television too”), giving the illusion of a real response.

It’s essentially learning from the beginning. It knows nothing, so it will try its best to use prior experience and, as a last resort, guess, until it has learned enough that it doesn’t have to guess any more.

With this in place and giving somewhat decent and consistent results, I went ahead with hooking it up to an STT (Speech to Text) and a TTS (Text to Speech) system so that I could actually talk rather than type, and hear rather than read the bot’s responses.

First off I needed to get the head to be able to talk and listen. Previously I had toyed with using Jasper with PocketSphinx as the speech to text engine, but this turned out to be less accurate than I’d hoped for an offline solution, so I switched to wit.ai, which turned out to be far better in testing.

I then realised that using Jasper was probably overkill and that I could hook into the wit.ai API without using Jasper at all. At this point the bot was able to listen and talk.

Also, as a note, here are the prerequisites for running the chatbot:

MongoDB

PyMongo (you will need version 3.4.0)

espeak

sox

wit.ai

Of course, I’m a Terminator.

Next up is sorting out the ‘eye’ of the Raspbinator. For this I’m using the 16 Channel 12-bit PWM HAT from Adafruit, as listed in the components above. See Adafruit’s handy guide for setting up servos and the HAT itself.

Once I had the hang of getting the servos to move about, I wanted it to do a ‘search pattern’ whereby the pan and tilt module would move the camera in a down, left, up, right kind of pattern so it can basically search all around, like an eyeball looking everywhere it can.
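As a rough sketch of such a search pattern, the moves can be built as a list and applied one at a time. Which channel drives pan and which drives tilt, and which end of each range counts as down or left, are assumptions made for this illustration; the function names are hypothetical, and the pulse values are simply the servo limits from the code listing further down.

```python
def search_pattern(pan_min, pan_max, tilt_min, tilt_max,
                   pan_channel=0, tilt_channel=1):
    """Return the (channel, pulse) moves for one down, left, up, right sweep."""
    return [
        (tilt_channel, tilt_min),  # down (assumed direction)
        (pan_channel, pan_min),    # left (assumed direction)
        (tilt_channel, tilt_max),  # up
        (pan_channel, pan_max),    # right
    ]

def run_sweep(move, pattern, before_each=None):
    """Apply each move in turn, optionally calling before_each() first."""
    for channel, pulse in pattern:
        if before_each is not None:
            before_each()
        move(channel, pulse)

# Stand-in for the servo driver so the sketch runs without hardware.
moves = []
run_sweep(lambda channel, pulse: moves.append((channel, pulse)),
          search_pattern(355, 400, 390, 420))
print(moves)
```

On the Pi, `move` could wrap the HAT driver's `pwm.set_pwm(channel, 0, pulse)` call, and `before_each` is a natural place to run a face check before every movement.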

Thanks to the simple test given in Adafruit’s library for using the HAT, I was able to learn from this code and import/change it to suit my needs.

After some testing, it’s time to sort out facial recognition with OpenCV.

I see everything.

The first tutorial I used was this to install OpenCV and this to get some basic facial recognition. With these two I was able to get some basic recognition of faces; as seen in the tutorial, OpenCV can be used to identify the existence of a human face.

Beyond this I wanted to expand the functionality to recognise people from a set of faces. This is where I bring in this tutorial, which uses a trainer to find the faces in a set and assign them to IDs, so that my bot can see someone, identify their face and then compare it against a pre-existing set to know who they are.
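To illustrate how training images can be tied to IDs, here is a small sketch that reads the ID out of each file name, in the same way the training function in the listing below does. The example file names and the helper names are assumptions made for the illustration; the `subject` prefix and the skipped `.sad` extension mirror that listing.

```python
import os

def label_from_filename(image_path, prefix="subject"):
    """Extract the integer person ID from a name such as 'subject02.happy'."""
    filename = os.path.split(image_path)[1]
    return int(filename.split(".")[0].replace(prefix, ""))

def group_by_label(image_paths, skip_extension=".sad"):
    """Map each person ID to its training files, leaving out held-back ones."""
    groups = {}
    for path in image_paths:
        if path.endswith(skip_extension):
            continue
        groups.setdefault(label_from_filename(path), []).append(path)
    return groups

# Hypothetical file names, used only to show the naming scheme.
paths = ["testfaces/subject01.normal", "testfaces/subject01.sad",
         "testfaces/subject02.happy"]
print(group_by_label(paths))
```

Holding back one expression per person, as the `.sad` files are here, leaves some images the recognizer has never seen for checking its accuracy.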

Trust me.

At this point it was all just a case of putting the above 3 functions together and then chucking it all into a neat skull-like package, as seen in the photo above. I used a skull-shaped tealight holder; it’s not listed above as I got it from some random shop and cannot find it from a quick Google.

Here’s a video showing the skull and the project in action.

It works, for the most part. The video shows a number of times it actually responded well and recognised what it was meant to, but there were some issues with it, as seen in the outtakes near the end of the video.

It will pick up all 3 of the faces it has stored. I’ve tested with my own face and photos of Sarah and John from T2, and 7 times out of 10 it will identify the face correctly, as seen in the video. Faces it doesn’t know, however…

I found that the bot will often return a high confidence result even when shown a totally new face, so it identifies the new face as one it already knows rather than returning it as unknown. For instance, one time I showed it a photo of Chris Pine and it thought it was me.
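One thing that may be worth checking here, offered as a possibility and not a diagnosis: the value OpenCV's LBPH recognizer returns alongside the label behaves like a distance, so lower numbers mean a closer match. A minimal sketch of rejecting unknown faces on that basis follows; the `identify` helper and the cutoff of 60.0 are hypothetical and would need tuning against real captures.

```python
# Names for the three trained IDs, taken from the code listing below.
KNOWN_FACES = {1: 'John Connor', 2: 'Sarah Connor', 3: 'Mike'}

def identify(label, distance, max_distance=60.0, names=KNOWN_FACES):
    """Return a known name only when the match is close enough.

    Assumes distance is the second value returned by recognizer.predict(),
    where smaller means more similar. max_distance is a made-up cutoff.
    """
    if distance <= max_distance and label in names:
        return names[label]
    return 'Identify yourself'

print(identify(3, 42.0))
print(identify(3, 150.0))
```

A new face still gets the label of its nearest trained neighbour, so it is the distance check, not the label, that decides whether the bot treats someone as known.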

I wish.

The voice accuracy of wit.ai was also very impressive, getting the words I had said correct most of the time. This resulted in some very eerie moments when the bot would return something like a genuine response to my speech inputs, as you can see in the video above at 1:50.

Hasta la vista, baby.

So that is Phase I of the project. I’m happy with how it works, despite the chatbot implementation being a bit random at times, and I’m very happy with wit.ai: it’s very accurate and fast as well as being a piece of cake to use. The obvious caveat is that the skull needs to be online at all times.

Things to improve/add for Phase II:

Clean up the code a bit (any suggestions welcome)

Get a better fake skull that can fit more inside

Make the skull give less random responses when it has returned nothing from the database, by adding a smarter function to construct sentences somehow. I’ll be doing R&D on this asap

Add functionality for the skull to identify new faces and then ask them for their names and append a photo of them to the training folder, so they can be identified in the future

No fate but what we make.

I invite you to try the code yourself and improve upon it, build your own and see what you can get it to do. I’m certainly excited to expand upon this project and really get it doing some cool things.

Let me know what you create and what you think of my project. It’s taken a long while to finally get it working, even if it is rather basic and error-prone at this point.

And so the unknown future rolls toward us. I face it, for the first time, with a sense of hope.

Because if a machine, a Raspberry Pi, can learn the value of human life…

Maybe we can too.

Happy 2018

Code

Main Bot code

#some code/annotations from https://github.com/adafruit/Adafruit_Python_PCA9685/blob/master/examples/simpletest.py 
#and https://thecodacus.com/face-recognition-opencv-train-recognizer/

# import packages
from __future__ import division
import sys
import os
sys.path.append('/usr/local/lib/python2.7/site-packages')
from picamera.array import PiRGBArray
from picamera import PiCamera
import time
import cv2
import numpy as np
from PIL import Image
import bot_9_import as b9

# Import the PCA9685 module.
import Adafruit_PCA9685

# Initialise the PCA9685 using the default address (0x40).
pwm = Adafruit_PCA9685.PCA9685()

#set the max/min servo positions
servo_min_v = 420
servo_max_v = 390
servo_min_h = 355
servo_max_h = 400

#set the scaleup/scaledown dividers (used later)
scaledown = 0.2
scaleup = 1/scaledown

# Create the haar cascade
cascadePath = "haarcascade_frontalface_alt.xml"
faceCascade = cv2.CascadeClassifier(cascadePath);
recognizer = cv2.face.createLBPHFaceRecognizer()

# set font
font = cv2.FONT_HERSHEY_PLAIN

def get_images_and_labels(path):
	# Append all the absolute image paths in a list image_paths
	# Will not read .sad extensions - will only use to test accuracy
	image_paths = [os.path.join(path, f) for f in os.listdir(path) if not f.endswith('.sad')]
	# images will contain faces
	images = []
	# labels will contain the label assigned to images
	labels = []
	for image_path in image_paths:
		# Read image and convert to grayscale
		image_pil = Image.open(image_path).convert('L')
		
		# Convert the image into numpy array
		image = np.array(image_pil, 'uint8')
		
		# Get the label
		nbr = int(os.path.split(image_path)[1].split(".")[0].replace("subject", ""))
		# Detect the face in image
		faces = faceCascade.detectMultiScale(image)
		# If face detected, append to images and label to labels
		for (x, y, w, h) in faces:
			images.append(image[y: y + h, x: x + w])
			labels.append(nbr)
			#cv2.imshow("Adding faces to training set", image[y: y + h, x: x + w])
			#cv2.waitKey(1000)
			
	# return the images list and labels list
	return images, labels

# Helper function to make setting a servo pulse width simpler.
def set_servo_pulse(channel, pulse):
    pulse_length = 1000000    # 1,000,000 us per second
    pulse_length //= 60       # 60 Hz
    print('{0}us per period'.format(pulse_length))
    pulse_length //= 4096     # 12 bits of resolution
    print('{0}us per bit'.format(pulse_length))
    pulse *= 1000
    pulse //= pulse_length
    pwm.set_pwm(channel, 0, pulse)
    
def face_detect():

	for frame in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
		# grab the raw NumPy array representing the image
		image = frame.array
		
		#switch the image to grayscale for quicker processing	
		gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
		
		#scale the image down for further quicker processing (but less accurate)		
		small_img = cv2.resize(gray, (0,0), fx=scaledown, fy=scaledown, interpolation=cv2.INTER_LINEAR)
		
		#equalize the histogram of the grayscale image
		cv2.equalizeHist(small_img, small_img)
		
		# Run the cascade face detection
		faces = faceCascade.detectMultiScale(small_img)	
			
		if len(faces) > 0:		
			for (x, y, w, h) in faces:
				#if faces are detected from the face detection above run the prediction function
				faceid, conf = recognizer.predict(small_img[y: y + h, x: x + w])
				#print the faceid int as well as the confidence level
				print faceid, conf
				#if the confidence level is over 180 identify whose face it is from the trained images
				if conf > 180:
					if faceid == 1:
						humanid = 'John Connor'
					elif faceid == 2:
						humanid = 'Sarah Connor'
					elif faceid == 3:
						humanid = 'Mike'
					#fall back to the prompt if the label is not one of the three trained ids
					else:
						humanid = 'Identify yourself'
				#if the confidence level is 180 or below then return 'identify yourself' for the TTS
				else:
					humanid = 'Identify yourself'
		#if no faces detected delete the raw capture from the camera and break the loop to return to the eye movement loop
		else:
			rawCapture.truncate(0)
			break
		#print the humanid for debug purposes
		print humanid
		#call the conversation class from the chatbot and pass along the humanid
		b9.openConversation(humanid)
		#delete the raw capture from the camera
		rawCapture.truncate(0)
		#wait two seconds
		time.sleep(2)

# Set frequency to 60hz, good for servos.
pwm.set_pwm_freq(60)

# Path to the folder of training images
path = 'testfaces'
# The folder is the same path as the script
# Call get_images_and_labels function and get the images and the
# corresponding labels
images, labels = get_images_and_labels(path)

# Perform the training
recognizer.train(images, np.array(labels))

# initialize camera and grab a reference raw image
camera = PiCamera()
camera.resolution = (320, 240)
camera.framerate = 32
rawCapture = PiRGBArray(camera, size=(320, 240))

# allow camera to warmup
time.sleep(0.1)

#this loop moves the camera in a square motion to look around, running a face detection before each movement
while True:
    # Move the servos on channels 0 and 1 between their extremes.
    face_detect()
    pwm.set_pwm(0, 0, servo_min_v)
    time.sleep(2)
    face_detect()
    pwm.set_pwm(0, 0, servo_max_h)
    time.sleep(2)
    face_detect()
    pwm.set_pwm(1, 0, servo_max_v)
    time.sleep(2)
    face_detect()
    pwm.set_pwm(1, 0, servo_min_h)
    face_detect()
    time.sleep(2)
    
# Close windows
cv2.destroyAllWindows()

The Chatbot code

#!/usr/bin/

#imports
import sys
import subprocess
import time
import random
import pymongo
import datetime
import numpy
from colors import *
from pymongo import MongoClient
from pprint import pprint
from difflib import SequenceMatcher
from wit import Wit

#main class where all the workings happen
class talkLoop(object):
	
	#initialise the class with all variables required
	def __init__(self, client, db, responses, allwords, inputwords, globalReply, botAccuracy, botAccuracyLower):
		self.client = client
		self.db = db
		self.responses = responses
		self.allwords = allwords
		self.inputwords = inputwords
		self.globalReply = globalReply
		self.botAccuracy = botAccuracy
		self.botAccuracyLower = botAccuracyLower
	
	#function for comparing string similarity
	def similar(self, a, b):
		return SequenceMatcher(None, a, b).ratio()
	
	#function for grabbing a random document from the database
	def get_random_doc(self):
		count = self.allwords.count()
		return self.allwords.find()[random.randrange(count)]
	
	#this function generates a random sentence at any length between 1 and 10 words long
	def sentenceGen(self):
		#set a clear string and set a random integer 1-10
		result = ""
		length = random.randint(1, 10)
		
		#for the range in the integer above find a random word from the db and append to the string
		for i in range(length):
			doc = self.get_random_doc()
			#each document in the allwords collection stores its word under the "word" key
			result += doc["word"]
			result += ' '
		#return the constructed sentence
		return result
	
	#this function searches the database for the input string and returns all replies for that string, returning a random one
	def dbSearch(self, searchIn):
		#search the database for inputs the bot has said prior
		cursor = self.responses.find_one({"whatbotsaid": searchIn})
		#return list of human replies to this response and choose one at random
		for x, y in cursor.items():
			if x == 'humanReply':
				chosenReply = (random.choice(y))
		#erase the cursor and return the chosen string
		del cursor
		return chosenReply
	
	#the string comparison function
	def mongoFuzzyMatch(self, inputString, searchZone, termZone, setting):
		#create an empty dictionary
		compareList = {}
		#search the database passed in
		for cursor in searchZone.find():
			for x, y in cursor.items():
				#find the item in the cursor that matches the search term passed into the function, eg: 'whatbotsaid'				
				if x == termZone:
					#compare the input string to the current string in the cursor, which returns a similarity ratio between 0.0 and 1.0
					compareNo = self.similar(inputString, y)
					#if accuracy is off then append the string and its accuracy to the dictionary no matter the accuracy
					if setting == ('off'):
						compareList[y] = compareNo
					#if accuracy is medium then append the string and its accuracy to the dictionary only if its over the medium setting
					elif setting == ('med'):
						if compareNo > self.botAccuracyLower:
							compareList[y] = compareNo
					#if accuracy is on/high then append the string and its accuracy to the dictionary only if its over the on/high setting
					elif setting == ('on'):
						if compareNo > self.botAccuracy:
							compareList[y] = compareNo
		#if nothing found then return a non match
		if compareList == {}:
			compareChosen = 'none_match'
		#if there are matching strings identify the highest accuracy from the dictionary made above		
		else:
			compareChosen = max(compareList.iterkeys(), key=(lambda key: compareList[key]))
		#return the chosen matching string (no 'del cursor' here, as the name is undefined if the collection was empty)
		return compareChosen
	
	
	def replyTumbler(self):
		#find the search string using the high accuracy number - to find a decent match to what the bot has said prior
		#when the search function is called it requires four arguments: the human response, the database to search on, the response required from the database and the accuracy level
		searchSaid = self.mongoFuzzyMatch(self.wordsIn, self.responses, 'whatbotsaid', 'on')
		#if no matches then try with a lower accuracy to find a less similar sentence
		if searchSaid == ('none_match'):
			searchSaid = self.mongoFuzzyMatch(self.wordsIn, self.responses, 'whatbotsaid', 'med')
			#if still no match then move onto generating a totally random reply either from words in the database (if there are over twenty stored)
			#and if under twenty words stored run the search function with zero minimum accuracy to essentially return a random sentence the bot has said prior
			if searchSaid == ('none_match'):
				if int(self.allwords.count()) < 20:
					searchSaid = self.mongoFuzzyMatch(self.wordsIn, self.responses, 'whatbotsaid', 'off')
					#pass the response into the database to find prior human responses to the above sentence
					chosenReply = self.dbSearch(searchSaid)			
				else:	
					chosenReply = self.sentenceGen()
			else:
				#pass the response into the database to find prior human responses to the above sentence
				chosenReply = self.dbSearch(searchSaid)
		else:
			#pass the response into the database to find prior human responses to the above sentence		
			chosenReply = self.dbSearch(searchSaid)
		#clear the search variable
		del searchSaid
		return (chosenReply)

	#this function passes in the information from the loop, the input reply and the bots last reply and appends them to the database	
	def updateDB(self, wordsIn, bResponse):
		self.wordsIn = wordsIn
		self.bResponse = bResponse
		
		#search the database for prior responses the bot has said
		cursor = self.responses.find_one({"whatbotsaid": self.bResponse})
		#if none then store a new bot response with the humans reply
		if cursor is None:
			postR = {"whatbotsaid": self.bResponse, "humanReply": [self.wordsIn]}
			self.responses.insert_one(postR).inserted_id
			del cursor
		#if already existing then update the database with a new reply
		else:
			self.responses.update({"whatbotsaid": self.bResponse}, {'$addToSet':{"humanReply": self.wordsIn}}, upsert=True)
			#clear the cursor
			del cursor
			
		#split the input sentence into individual words and store each in the database
		wordsInDB = self.wordsIn.split(' ')
		for word in wordsInDB:
			#search the database for the word
			cursor = self.allwords.find_one({"word": word})
			#if its not already in the database then insert into the database
			if cursor is None:
				postW = {"word": word}
				self.allwords.insert_one(postW).inserted_id
			#if the word is already in the database pass and clear the cursor
			else:
				pass
			del cursor

#the function called from the main code that will run the main class with all the necessary parameters and start the loop
def openConversation(personName):
	
	#the wit.ai API key (this is a fake one; you will need to sign up for your own at wit.ai)
	client_wit = Wit('YOURKEYHERE')

	#setting up variables for mongodb
	client = MongoClient('localhost', 27017)
	db = client.words_database
	responses = db.responses
	allwords = db.allwords

	#variables for first input and the 2 levels of search accuracy
	inputWords = ("hello")
	globalReply = ("hello")
	botAccuracy = 0.725
	botAccuracyLower = 0.45
	
	#initialise the main class and get a basic first response from the bot
	talkClass = talkLoop(client, db, responses, allwords, inputWords, globalReply, botAccuracy, botAccuracyLower)
	#pass the starting inputs to the database for storage
	talkClass.updateDB(inputWords, globalReply)
	#the below three lines push the input words into the reply tumbler in order to find another greeting other than just human responses to 'hello'
	#for instance: 'hello' can return 'greeting' which will return human responses to that such as 'good day' instead of just returning 'greeting'
	inputWords = (talkClass.replyTumbler())
	talkClass.updateDB(inputWords, globalReply)
	globalReply = (talkClass.replyTumbler())
	#combine the greeting with the human's name from the face identification code
	globalReply = str(globalReply + " " + personName)
	#use subprocess again to initialise espeak (the TTS) and say the bots response
	subprocess.call(['espeak', globalReply])
	#print the output words to the screen (debug/testing purposes)
	sys.stdout.write(BLUE)
	print (globalReply)
	sys.stdout.write(RESET)

	#the main loop wrapped in a try to capture any errors and hopefully exit cleanly
	try:
		while True:
			#using subprocess to call the sox recording software with a configuration to trim silence from the recording and stop recording when the speaker has finished
			subprocess.call(['rec test.wav rate 32k silence 1 0.1 5% 1 1.0 5%'], shell=True)
			resp = None
			#use the wit.ai class to interface with the API and send off the wav file from above for STT functions
			with open('test.wav', 'rb') as f:
			  resp = client_wit.speech(f, None, {'Content-Type': 'audio/wav'})
			#parse the response given to get the text sent back which will then become the words the bot uses
			inputWords = str(resp['_text'])
			#if the word(s) goodbye/good bye are said then break the loop which will return to the main code and resume the skull to look around for another human face
			if inputWords == "goodbye":
				break
			if inputWords == "good bye":
				break
			#print the input words to the screen (debug/testing purposes)
			sys.stdout.write(RED)
			print inputWords
			sys.stdout.write(RESET)
			#update the database with the humans response and the bots last response
			talkClass.updateDB(inputWords, globalReply)
			#call the reply tumbler function for the bots reply
			globalReply = (talkClass.replyTumbler())
			#use subprocess again to initialise espeak (the TTS) and say the bots response
			subprocess.call(['espeak', globalReply])
			#print the output words to the screen (debug/testing purposes)
			sys.stdout.write(BLUE)
			print(globalReply)
			sys.stdout.write(RESET)
	except: 
	  pass

Credits

Michael Darby - 314Reactor

55 projects • 143 followers

I like to keep fit, explore and of course make projects.
