David Packman
Published © CC BY

BMO-AI: The companion robot with artificial intelligence

An artificial-intelligence-powered portable robot companion based on the character from Cartoon Network's Adventure Time.

Advanced · Full instructions provided · Over 1 day · 5,958

Things used in this project

Hardware components

Raspberry Pi 3 Model B
While these instructions use a Raspberry Pi 3B specifically, it is possible to build this with other single-board computers, though they may require additional steps, hardware, and thermal management considerations.
×1
Raspberry Pi Camera Module 3
×1
Adafruit Raspberry Pi CRICKIT Hat
This can be swapped out for a servo driver if you're using a different SBC, but code changes will be required. The CRICKIT is very handy for managing both the servomotors and the tactile button inputs.
×1
Adafruit Mono 2.5w Audio Amp
×1
Adafruit mini oval 8ohm 1w speaker
×2
Adafruit mini 2-wire voltmeter
×1
Adafruit Bakelite perfboard plates
You can use regular perforated through-hole protoboards, but you'll need a Dremel to cut them into shape; the Bakelite plates can be cut with scissors.
×1
Adafruit 6mm tactile buttons
×1
Adafruit Mini USB microphone
×1
Jauch protected 18650 3350mAh Li-ion battery
×2
CWI332-ND SPST slide switch
×1
F17HA-05MC 5V 17mm Fan
×1
FeeTech FS90 Micro Servomotor
×4
D36V50F5 5v 5.5a Voltage Regulator
×1
Wireless USB Keyboard
×1
Raspberry Pi cable kit
You'll at least want a longer DSI ribbon cable than what usually comes with 5 inch DSI displays. CSI and DSI MIPI cables are fairly interchangeable.
×1
DC Power Right Angle Male Barrel Jack
×1
9 inch mini USB cable with right- and left-angle USB connectors
×1
18650 Battery Holder with leads
×1
3.5mm mono male audio jack with screw terminals
×1
Waveshare 5 inch DSI LCD
Other brands of DSI LCD monitors can be used, but modifications may need to be made to the mounting brackets to accommodate differences.
×1
Various M2, M2.5, and M3 screws, nuts, heatset inserts, and standoffs
×1

Software apps and online services

Raspberry Pi Raspbian
Other Debian-based distros could also work; however, Ubuntu doesn't work with some of the API libraries in this build.
Microsoft Azure Cognitive Speech Services API
Microsoft Azure Computer Vision
OpenAI ChatGPT 3.5 Turbo
OpenAI DALL-E 2

Hand tools and fabrication machines

3D Printer (generic)
I recommend using PETG or another plastic filament with a higher resistance to thermal deformation than PLA. Also, teal is highly recommended.
3D Printing Pen (Generic)
Note: You can also use a soldering iron set to 230°C to weld plastic pieces together wherever a 3D pen is mentioned.
Dupont and JST crimping kit
Hot glue gun (generic)
Soldering iron (generic)
Drill, Screwdriver
Acrylic paint and paintbrush (generic)

Story


Schematics

Wake Word Table File

Download and rename to "final_midfa.table", then move it to the same directory as the Python file to use the "Hey BMO" wake word.

Face Images

These are the facial expressions for BMO; extract them to the Pictures directory under your home directory.
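
If you'd rather script that step, here's a minimal sketch (the archive name faces.zip is hypothetical; substitute whatever the download is actually called):

import zipfile
from pathlib import Path

pictures = Path.home() / "Pictures"              #resolves to /home/bmo/Pictures on the robot
pictures.mkdir(exist_ok=True)
with zipfile.ZipFile("faces.zip") as archive:    #rename to match the actual download
    archive.extractall(pictures)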

Code

bmo_sample.py

Python
You will need to edit the following lines:
45-60: Update for your face image directory paths if you don't save them to /home/bmo/Pictures/
135-165: Update for your servomotor ranges
194-195: Only change if you create a different wake word.
223-225: Update with your API services details (region and endpoint)
230: Only update if you choose to use a different voice.
295: Only if you want to use a directory other than /home/bmo/Photos/
319: Replace with the email account to send photos from
320: Replace with recipient email address.
325: Replace with sender's smtp server address
331: Replace if your SMTP server uses a different port
334: Replace with sender's email password or app password
357: Only if you want to use a directory other than /home/bmo/Photos/
394: Only if you want to use a directory other than /home/bmo/Photos/
489: Only if you want to use a directory other than /home/bmo/Photos/
538: Only if you want to use a directory other than /home/bmo/Photos/
553: Only if you want to use a directory other than /home/bmo/Photos/
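
The script reads its credentials from environment variables rather than hardcoded keys: speech_key and vision_key directly, and the OpenAI client falls back to OPENAI_API_KEY when constructed with no arguments. Here's a minimal preflight sketch (separate from bmo_sample.py) you can run to confirm they're set:

import os
import sys

#Variable names taken from bmo_sample.py; OPENAI_API_KEY is the default
#the OpenAI v1 client reads when OpenAI() is called with no arguments.
required = ["speech_key", "vision_key", "OPENAI_API_KEY"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    sys.exit("Missing environment variables: " + ", ".join(missing))
print("All API credentials found.")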
import os
import sys
import io
import base64
import pygame
import time
import smtplib
import imghdr
import tiktoken
import warnings
import requests
import json
import azure.cognitiveservices.speech as speechsdk
from openai import OpenAI
from email.message import EmailMessage
from libcamera import Transform
from picamera2 import Picamera2
from adafruit_crickit import crickit
from urllib.request import urlopen
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials
from array import array
from PIL import Image
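
#Rough pip package list for the third-party imports above (package names are
#my best mapping, verify against each library's docs): pygame, tiktoken,
#requests, openai, pillow, picamera2, azure-cognitiveservices-speech,
#azure-cognitiveservices-vision-computervision, msrest,
#adafruit-circuitpython-crickit. picamera2 and libcamera usually come from
#Raspberry Pi OS apt packages rather than pip.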

#PYGAME STUFF
global display_width
display_width = 800
global display_height
display_height= 480
global gameDisplay
gameDisplay = pygame.display.set_mode((display_width,display_height))
global black
black = (0,0,0)
global teal
teal = (128,230,209)
global clock
clock = pygame.time.Clock()
global crashed
crashed = False

#Assign face images, may vary depending on your choices and the paths to your images.
#Update these so that the paths are pointing to the directory where you put your face images.
global BMO1
BMO1 = pygame.image.load('/home/bmo/Pictures/bmo1.jpg')
global BMO2
BMO2 = pygame.image.load('/home/bmo/Pictures/bmo11.jpg')
global BMO3
BMO3 = pygame.image.load('/home/bmo/Pictures/bmo7.jpg')
global BMO4
BMO4 = pygame.image.load('/home/bmo/Pictures/bmo2.jpg')
global BMO5
BMO5 = pygame.image.load('/home/bmo/Pictures/bmo3.jpg')
global BMO6
BMO6 = pygame.image.load('/home/bmo/Pictures/bmo8.jpg')
global BMO7
BMO7 = pygame.image.load('/home/bmo/Pictures/bmo16.jpg')
global BMO8
BMO8 = pygame.image.load('/home/bmo/Pictures/bmo15.jpg').convert() #the convert at the end of this one lets you display text if you choose to.

#Define backgrounds for display blit
def bmo_rest(x,y):
    gameDisplay.blit(BMO1, (x,y))

def bmo_sip(x,y):
    gameDisplay.blit(BMO2, (x,y))

def bmo_slant(x,y):
    gameDisplay.blit(BMO3, (x,y))

def bmo_talk(x,y):
    gameDisplay.blit(BMO4, (x,y))

def bmo_wtalk(x,y):
    gameDisplay.blit(BMO5, (x,y))

def bmo_smile(x,y):
    gameDisplay.blit(BMO6, (x,y))

def bmo_squint(x,y):
    gameDisplay.blit(BMO7, (x,y))

def bmo_side(x,y):
    gameDisplay.blit(BMO8, (x,y))

global x
x = (display_width * 0)
global y
y = (display_height * 0)

#BUTTON STUFF
global ss
ss = crickit.seesaw

#Assign signal pins to buttons, these may vary depending on how they are connected in your build.
#If you don't want to use them, leave these commented out.
#If you want to assign buttons to actions, you'll need to do that in the code.

#button_up = crickit.SIGNAL1
#button_rt = crickit.SIGNAL2
#button_dn = crickit.SIGNAL3
#button_lt = crickit.SIGNAL4
#button_bu = crickit.SIGNAL5
#button_bk = crickit.SIGNAL6
#button_gn = crickit.SIGNAL7
#button_rd = crickit.SIGNAL8

#Set signal pins for button pullup input. 
#Note, you should test your inputs to make sure they are assigned correctly.
#ss.pin_mode(button_up, ss.INPUT_PULLUP)
#ss.pin_mode(button_rt, ss.INPUT_PULLUP)
#ss.pin_mode(button_dn, ss.INPUT_PULLUP)
#ss.pin_mode(button_lt, ss.INPUT_PULLUP)
#ss.pin_mode(button_bu, ss.INPUT_PULLUP)
#ss.pin_mode(button_bk, ss.INPUT_PULLUP)
#ss.pin_mode(button_gn, ss.INPUT_PULLUP)
#ss.pin_mode(button_rd, ss.INPUT_PULLUP)
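
#Optional wiring test sketch (an assumption, not required by the build): with
#INPUT_PULLUP, ss.digital_read() returns True while a button is released and
#False while pressed. Uncomment the pins above, then run a loop like this to
#confirm each button maps to the signal pin you expect.
#while True:
#    for name, pin in [("up", button_up), ("right", button_rt),
#                      ("down", button_dn), ("left", button_lt)]:
#        if not ss.digital_read(pin):
#            print(name + " button pressed")
#    time.sleep(0.1)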

#SERVO STUFF
#servo pin assignments, these may vary if you connected servos in a different order, make sure to test.
global LA
LA = crickit.servo_1
global RA
RA = crickit.servo_2
global LL
LL = crickit.servo_3
global RL
RL = crickit.servo_4

#Note, the ranges will vary depending on your servos and how you preset their positions before installing.
#Make sure to position the servos before installing so the appendages have the right range of motion.
#You'll likely need to adjust these for your servos.

#Right Arm ranges
global RAU      #right arm up
RAU = 180       
global RAUP     #right arm up partially
RAUP = 130
global RADP     #right arm down partially
RADP = 60
global RAD      #right arm down
RAD = 10

#Left Arm ranges
global LAU
LAU = 10
global LAUP
LAUP = 50
global LADP
LADP = 120
global LAD
LAD = 160

#Right Leg ranges
global RLU
RLU = 130
global RLD
RLD = 65

#Left Leg ranges
global LLU
LLU = 55
global LLD
LLD = 120
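
#Optional calibration sketch (an assumption, run separately before final
#assembly): nudge one servo at a time to find its safe endpoints, then copy
#the angles you settle on into the range constants above.
#while True:
#    RA.angle = int(input("Right arm angle 0-180: "))    #swap RA for LA, LL, or RL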

#defining "smove", a smooth servo movement routine
def smove(sname,start,end,delta):	#move from start to end incrementally using delta in seconds
    incMove=(end-start)/10.0
    incTime=delta/10.0
    for x in range(10):
        sname.angle = (int(start+x*incMove))
        time.sleep(incTime)
# smove sample, to move Right Arm from Down to Up: smove(RA,RAD,RAU,0.20)

#Initialize the camera once at startup; Picamera2 should only be instantiated one time.
picam2 = Picamera2()

#Define the keyphrase detection function
def get_keyword():
    #First, do some stuff to signal ready for keyword
    gameDisplay.fill(teal)
    bmo_sip(x,y)
    pygame.display.update()
    smove(RA,RAD,RAUP,0.25)
    time.sleep(.8)
    bmo_rest(x,y)
    pygame.display.update()
    smove(RA,RAUP,RAD,0.25)

    #Create instance of kw recog model.
    #Update this to point at location of your local kw table file if not in same dir
    #Also, update the file name if you created a different keyphrase and model in Azure Sound Studio
    model = speechsdk.KeywordRecognitionModel("final_midfa.table")
    keyword = "Hey be Moe"  #If you created a different keyphrase to activate the listening function, you'll want to change this.
    keyword_recognizer = speechsdk.KeywordRecognizer()
    done = False

    def recognized_cb(evt):
        #This function checks for recognized keyword and kicks off the response to the keyword.
        result = evt.result
        if result.reason == speechsdk.ResultReason.RecognizedKeyword:
            bmo_smile(x,y)
            pygame.display.update()
            print("Recognized Keyphrase: {}".format(result.text))
        nonlocal done
        done = True

    #Once a keyphrase is detected...
    keyword_recognizer.recognized.connect(recognized_cb)
    result_future = keyword_recognizer.recognize_once_async(model)
    print('Ready, listening for "{}"'.format(keyword))
    result = result_future.get()

    if result.reason == speechsdk.ResultReason.RecognizedKeyword:
        Respond_To_KW() #move on to the respond function

#Define actions after KW recognized
def Respond_To_KW():
    #Set all your keys and cognitive services settings here
    client = OpenAI()
    speech_key = os.environ.get("speech_key")  
    speech_region = "xxxxxxx"       #Copy your Azure Speech Region here
    vision_key = os.environ.get("vision_key")
    vision_endpoint = "https://xxxxxx.cognitiveservices.azure.com/"     #Copy your Azure Vision endpoint url here

    #setup for the speech and vision services
    computervision_client = ComputerVisionClient(vision_endpoint, CognitiveServicesCredentials(vision_key))
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
    speech_config.speech_synthesis_voice_name = "en-US-JaneNeural"      #If you decide to use a different voice, update this.
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

    #Defining a few movement patterns and setting threads
    def waving():
        smove(LA,LAD,LAU,0.20)
        time.sleep(.4)
        smove(LA,LAU,LAUP,0.20)
        time.sleep(.3)
        smove(LA,LAUP,LAU,0.20)
        time.sleep(.4)
        smove(LA,LAU,LAD,0.20)

    def kicking():
        smove(LL,LLD,LLU,0.17)
        time.sleep(.3)
        smove(LL,LLU,LLD,0.17)
        smove(RL,RLD,RLU,0.17)
        time.sleep(.3)
        smove(RL,RLU,RLD,0.17)
        smove(LL,LLD,LLU,0.17)
        time.sleep(.3)
        smove(LL,LLU,LLD,0.17)
        time.sleep(.2)
        
    #function for encoding images for resize
    def encode_image(aimage_name):
      with open(aimage_name, "rb") as exp_image_file:
        return base64.b64encode(exp_image_file.read()).decode('utf-8')

    #Response after keyword
    text_resp = "What's up?"    #This is text for TTS
    bmo_talk(x,y)               #Face to display and where to start
    pygame.display.update()     #Update screen to show change
    result = speech_synthesizer.speak_text_async(text_resp).get() #Run the TTS on text_resp
    waving()                    #wave motion

    #After done talking
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:      #this makes it wait for TTS to finish
        bmo_smile(x,y)
        pygame.display.update()
        speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)      #config STT
        speech_recognition_result = speech_recognizer.recognize_once_async().get()      #listen for voice input
        
        #Convert to text once speech is recognized
        if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:     #wait for voice input to complete
            print("Recognized: {}".format(speech_recognition_result.text))      #Print was was heard in the console (for testing)
            bmo_slant(x,y)
            pygame.display.update()

            # if and elif checks for special commands
            #First, check for taking photo intent
            if "photo" in speech_recognition_result.text:         #If the word "picture" is in what the user said, we'll take a picture.
                text_resp = "Ok, hold on a second and I'll take a picture."     #Update the text_resp variable to say this next.
                bmo_talk(x,y)
                pygame.display.update()
                result = speech_synthesizer.speak_text_async(text_resp).get()       #say it
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:      #wait until done saying it
                    bmo_squint(x,y)
                    pygame.display.update()
                    smove(LA,LAD,LAUP,0.20)
                    #Configure the camera here
                    capture_config = picam2.create_still_configuration(main={"size": (1920, 1080)}) #config for still image cap at 1920x1080
                    capture_config["transform"] = Transform(hflip=1,vflip=1)    #since camera is upside down, lets rotate 180 deg
                    image_name = "/home/bmo/Photos/img_" + str(time.time()) + ".jpg"   #Change the path if needed
                    picam2.start()      #now we can start the camera
                    time.sleep(2)       #wait two seconds for the camera to acclimate or you won't get a good photo
                    picam2.switch_mode_and_capture_file(capture_config, image_name)     #take the photo
                    time.sleep(1)       #wait a sec
                    smove(LA,LAUP,LAD,0.15)
                    BMOPIC = pygame.image.load(image_name)      #Configure pygame to show the photo
                    bmo_side(x,y)
                    pygame.display.update()     #shift to the small face
                    gameDisplay.blit(BMOPIC, (280, y))      #Get ready to blit the photo with a 280 offset on x axis
                    pygame.display.update()     #update screen to show photo overlay on the small face background
                    text_resp = "Would you like me to share this picture with you?"
                    picam2.stop_preview()       #probably a good idea to turn the camera off
                    picam2.stop()               #yep, probably a good idea
                    result = speech_synthesizer.speak_text_async(text_resp).get()
                    time.sleep(.1)
                    speech_recognition_result = speech_recognizer.recognize_once_async().get()
                    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                        if "yes" in speech_recognition_result.text.lower():    #if the user's answer includes the word yes, send the photo
                            #Email attachment routine
                            bmo_smile(x,y)
                            pygame.display.update()
                            message = EmailMessage()
                            email_subject = "Image from BMO"
                            sender_email_address = "send@email.com"      #Insert your sending email address here
                            receiver_email_address = "receive@email.com"    #Insert the receiving email address here
                            # configure email headers
                            message['Subject'] = email_subject
                            message['From'] = sender_email_address
                            message['To'] = receiver_email_address
                            email_smtp = "smtp.email.com"                      #Insert the sending email smtp server address here

                            with open(image_name, 'rb') as file:
                                image_data = file.read()
                            message.set_content("Email from BMO with image attachment")
                            message.add_attachment(image_data, maintype='image', subtype=imghdr.what(None, image_data))
                            server = smtplib.SMTP(email_smtp, 587)           #check your smtp server docs to make sure this is the right port
                            server.ehlo()
                            server.starttls()
                            server.login(sender_email_address, 'EmailPasswordHere')  #insert the sender's email password or app password here
                            server.send_message(message)
                            server.quit()
                            text_resp = "Ok, I sent it to you."
                            bmo_talk(x,y)
                            pygame.display.update()
                            result = speech_synthesizer.speak_text_async(text_resp).get()
                            if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                                bmo_rest(x,y)
                                pygame.display.update()
                get_keyword()

            # Check for OpenAI Vision Description intent
            elif "you looking at" in speech_recognition_result.text: #this starts of the image description function
                text_resp = "Oh, let me think about how to describe it."
                bmo_talk(x,y)
                pygame.display.update()
                result = speech_synthesizer.speak_text_async(text_resp).get() #talk
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:  #once done talking
                    bmo_squint(x,y)
                    pygame.display.update()
                    acapture_config = picam2.create_still_configuration(main={"size": (1920, 1080)})    #set cam resolution
                    acapture_config["transform"] = Transform(hflip=1,vflip=1)   #flip camera since it is upside down
                    aimage_name = "/home/bmo/Photos/aimg_" +str(time.time()) +".jpg"   #Change path if needed
                    picam2.start()  #start camera
                    time.sleep(2)   #wait to let the camera adjust
                    picam2.switch_mode_and_capture_file(acapture_config, aimage_name)   #switch mode to image capture and snap
                    text_resp = "Hmm."
                    smove(LA,LAD,LAUP,0.18)
                    result=speech_synthesizer.speak_text_async(text_resp).get() #say Hmm to break up the wait
                    base64_image = encode_image(aimage_name)
                    response = client.chat.completions.create(
                        model="gpt-4-vision-preview",
                        messages=[
                            {
                                "role": "user",
                                "content": [
                                    {
                                        "type": "text", 
                                        "text": "Describe this image that you saw."
                                        },
                                    {
                                        "type": "image_url",
                                        "image_url": {
                                            "url": f"data:image/jpeg;base64,{base64_image}"
                                        }
                                    },
                                ],
                            }
                        ],
                        max_tokens=300,
                    )
                    response_text = response.choices[0].message.content
                    print(response_text)
                    picam2.stop_preview()   #shut off camera
                    picam2.stop()
                    resized_img = "/home/bmo/Photos/desc_rs" +str(time.time()) +".png"
                    with Image.open (aimage_name) as im:
                        resized = im.resize((512,512))
                        resized.save(resized_img)
                    #text_resp = "That looks like " + (response_text)
                    image = pygame.image.load(resized_img)
                    bmo_side(x,y)
                    pygame.display.update()
                    gameDisplay.blit(image, (280,y))
                    pygame.display.update()
                    smove(LA,LAD,LAUP,0.18)
                    result=speech_synthesizer.speak_text_async(response_text).get()
                    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                        smove(LA,LAUP,LAD,0.20)
                        text_resp = "Is there anything else you want to know about what we were looking at?"
                        result=speech_synthesizer.speak_text_async(text_resp).get()
                        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                            speech_recognition_result = speech_recognizer.recognize_once_async().get()
                            if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                                if "No" in speech_recognition_result.text:
                                    text_resp = "Ok. Let me know if you need anything else."
                                    bmo_talk(x,y)
                                    pygame.display.update()
                                    result=speech_synthesizer.speak_text_async(text_resp).get()
                                    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                                        get_keyword()
                                else:
                                    text_resp = "Ok. Let me think..."
                                    result=speech_synthesizer.speak_text_async(text_resp).get()
                                    response = client.chat.completions.create(
                                        model="gpt-4-vision-preview",
                                        messages=[
                                            {
                                                "role": "user",
                                                "content": [
                                                    {
                                                        "type": "text",
                                                        "text": speech_recognition_result.text
                                                    },
                                                    {
                                                        "type": "image_url",
                                                        "image_url": {
                                                            "url": f"data:image/jpeg;base64,{base64_image}"
                                                        }
                                                    },
                                                ],
                                            }
                                        ],
                                        max_tokens=300,
                                    )
                                    response_text = response.choices[0].message.content
                                    smove(LA,LAD,LAUP,0.18)
                                    result=speech_synthesizer.speak_text_async(response_text).get()
                                    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                                        smove(LA,LAUP,LAD,0.20)
                                        text_resp = "Let me know if there's anything else I can help with."
                                        bmo_talk(x,y)
                                        pygame.display.update()
                                        result=speech_synthesizer.speak_text_async(text_resp).get()
                                        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                                            get_keyword()

            # Check for Stable Diffusion painting intent
            elif "Draw Something" in speech_recognition_result.text:
                text_resp = "Sure, what would you like me to draw?"
                bmo_talk(x,y)
                pygame.display.update()
                smove(LA,LAD,LAUP,0.18)
                result = speech_synthesizer.speak_text_async(text_resp).get()
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                    bmo_smile(x,y)
                    pygame.display.update()
                    speech_recognition_result = speech_recognizer.recognize_once_async().get()
                    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                        print("Recognized: {}".format(speech_recognition_result.text))
                        obj_desc = speech_recognition_result.text    #assign description for image gen
                        text_resp = "Ok, give me a few seconds to draw this."
                        bmo_talk(x,y)
                        pygame.display.update()
                        smove(LA,LAUP,LAU,0.15)
                        smove(LA,LAU,LAUP,0.15)
                        result = speech_synthesizer.speak_text_async(text_resp).get()
                        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                            bmo_smile(x,y)
                            pygame.display.update()
                            response = client.images.generate(    #call to OpenAI to create image
                                model="dall-e-3",   #use dall-e-2 if you don't want to use DALL-e 3
                                prompt=obj_desc,
                                size="1024x1024",
                                quality="standard",
                                n=1,
                                )
                            image_url = response.data[0].url    #Get image from url
                            image_str = urlopen(image_url).read()    #Open image
                            image_file = io.BytesIO(image_str)     #convert
                            resized_img = "/home/bmo/Photos/draw_rs_" +str(time.time()) +".png"    #assign path for resized image
                            with Image.open (image_file) as im:    #Here we resize to fit screen
                                resized = im.resize((512,512))
                                resized.save(resized_img)   #and save it
                            image = pygame.image.load(resized_img)  #prep resized image for display
                            bmo_side(x,y)
                            pygame.display.update()
                            gameDisplay.blit(image, (280,y))
                            pygame.display.update()
                            text_resp = "I hope you like it."
                            result = speech_synthesizer.speak_text_async(text_resp).get()
                            time.sleep(5)
                            smove(LA,LAUP,LAD,.15)
                get_keyword()

            # Check for DALL-E Image Variation intent
            elif "you thinking" in speech_recognition_result.text:  #setup for the daydream mode
                text_resp = "Oh, I'm just daydreaming. Do you want me to show you what I was imagining?"
                bmo_talk(x,y)
                pygame.display.update()
                result = speech_synthesizer.speak_text_async(text_resp).get()
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:  #once done talking
                    bmo_smile(x,y)
                    pygame.display.update()
                    speech_recognition_result = speech_recognizer.recognize_once_async().get()  #listen
                    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech: #when done listening
                        if "no" in speech_recognition_result.text:  #if response negative, then
                            text_resp = "Ok, let me know if you need anything else."
                            bmo_talk(x,y)
                            pygame.display.update()
                            smove(RA,RAD,RADP,0.12)
                            smove(LA,LAD,LADP,0.12)
                            result = speech_synthesizer.speak_text_async(text_resp).get()
                            if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                                bmo_rest(x,y)
                                pygame.display.update()
                                smove(RA,RADP,RAD,0.12)
                                smove(LA,LADP,LAD,0.12)
                            get_keyword()
                        else:   #if result not negative, then
                            text_resp = "Ok, give me a few seconds to draw you a picture."  
                            bmo_talk(x,y)
                            pygame.display.update()
                            result = speech_synthesizer.speak_text_async(text_resp).get()
                            if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                                bmo_slant(x,y)
                                pygame.display.update()
                                capture_config = picam2.create_still_configuration(main={"size": (512,512)})    #set up supported image resolution
                                capture_config["transform"] = Transform(hflip=1,vflip=1)    #flip image
                                image_name = "/home/bmo/Photos/img_" + str(time.time()) + ".png"    #change path if needed
                                picam2.start()  #start cam
                                kicking()  #start kicking 
                                time.sleep(1)
                                picam2.switch_mode_and_capture_file(capture_config, image_name)     #capture image
                                time.sleep(1)
                                #Send image file to OpenAI DALL-e2 variation
                                response = client.images.create_variation(    #Call OpenAI to create image variation
                                    image=open(image_name, "rb"),   #send captured image
                                    n=1,
                                    size="1024x1024"
                                )
                                image_url = response.data[0].url  #Grab image variation url from the response object
                                image_str = urlopen(image_url).read()   #open
                                image_file = io.BytesIO(image_str)      #transform
                                resized_img = "/home/bmo/Photos/rs_img_" + str(time.time()) + ".png"    #change path if needed
                                with Image.open (image_file) as im:     #Resize image to fit screen
                                    resized = im.resize((512,512))
                                    resized.save(resized_img)
                                image = pygame.image.load(resized_img)  #Prep to show image
                                bmo_side(x,y)
                                pygame.display.update()     #show the small face as background
                                gameDisplay.blit(image, (280,y))    
                                pygame.display.update()     #blit the image on top of the background
                                text_resp = "Here's what I was daydreaming about."
                                result = speech_synthesizer.speak_text_async(text_resp).get()
                                time.sleep(5)
                                text_resp = "Let me know if you need anything else."
                                bmo_talk(x,y)
                                pygame.display.update()
                                picam2.stop_preview()
                                picam2.stop()
                                result = speech_synthesizer.speak_text_async(text_resp).get()
                                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                                    bmo_rest(x,y)
                                    pygame.display.update()
                get_keyword()

            # Check for multi-turn chat intent
            elif speech_recognition_result.text == "Let's chat.":   #this sets up multi-turn chat mode
                text_resp = "Ok. What do you want to talk about?"
                bmo_talk(x,y)
                pygame.display.update()
                result = speech_synthesizer.speak_text_async(text_resp).get()
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:  #once done responding
                    bmo_smile(x,y)
                    pygame.display.update()
                    system_message = {"role": "system", "content": "You are a playful but helpful robot named BMO"} #set behavior
                    max_response_tokens = 250   #set max tokens
                    token_limit= 4090   #establish token limit
                    conversation=[]     #init conv
                    conversation.append(system_message)     #set what is in conv

                    def num_tokens_from_messages(messages, model="gpt-3.5-turbo"):  #Token counter function
                        encoding = tiktoken.encoding_for_model(model)
                        num_tokens = 0
                        for message in messages:
                            num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
                            for key, value in message.items():
                                num_tokens += len(encoding.encode(value))
                                if key == "name":  # if there's a name, the role is omitted
                                    num_tokens += -1  # role is always required and always 1 token
                        num_tokens += 2  # every reply is primed with <im_start>assistant
                        return num_tokens

                    while True:
                        #Start Listening
                        speech_recognition_result = speech_recognizer.recognize_once_async().get()
                        if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                            bmo_slant(x,y)
                            pygame.display.update()
                            print("Recognized: {}".format(speech_recognition_result.text))
                            user_input = speech_recognition_result.text     
                            conversation.append({"role": "user", "content": user_input})
                            conv_history_tokens = num_tokens_from_messages(conversation)

                            while (conv_history_tokens+max_response_tokens >= token_limit):
                                del conversation[1] 
                                conv_history_tokens = num_tokens_from_messages(conversation)
        
                            response = client.chat.completions.create(
                                model="gpt-3.5-turbo", # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
                                messages = conversation,
                                max_tokens=max_response_tokens,
                            )

                            conversation.append({"role": "assistant", "content": response.choices[0].message.content})
                            response_text = response.choices[0].message.content + "\n"
                            print(response_text)
                            bmo_talk(x,y)
                            pygame.display.update()
                            speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
                            result = speech_synthesizer.speak_text_async(response_text).get()
                            if "I'm done" in speech_recognition_result.text:
                                get_keyword()
                            if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                                bmo_smile(x,y)
                                pygame.display.update()
                get_keyword()

            # If no other intent, then use regular completion
            else:
                response = client.chat.completions.create(
                    model="gpt-4",
                    messages = [
                        {"role": "system", "content": "You are a playful but helpful companion robot named BMO"}, 
                        {"role": "user", "content": (speech_recognition_result.text)},
                        ],
                )
                #Get response
                response_text = response.choices[0].message.content
                print(response_text)
                bmo_talk(x,y)
                pygame.display.update()
                speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
                result = speech_synthesizer.speak_text_async(response_text).get()
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                    kicking()
                    bmo_rest(x,y)
                    pygame.display.update()
                    get_keyword()

Credits

David Packman

I make robot friends