David Packman
Published © CC BY

BMO-AI: The companion robot with artificial intelligence

An artificial-intelligence-powered portable robot companion based on the character from Cartoon Network's Adventure Time.

Advanced · Full instructions provided · Over 1 day · 4,484 views

Things used in this project

Hardware components

Raspberry Pi 3 Model B
While these instructions use a Raspberry Pi 3B specifically, it is possible to build this with other single board computers that may require additional steps, hardware, and thermal management considerations.
×1
Raspberry Pi Camera Module 3
×1
Adafruit Raspberry Pi CRICKIT Hat
This can be swapped out for a servo driver board if you're using a different SBC (see the sketch after this list), but code changes will be required. The CRICKIT is very handy here because it manages both the servomotors and the tactile button inputs.
×1
Adafruit Mono 2.5w Audio Amp
×1
Adafruit mini oval 8ohm 1w speaker
×2
Adafruit mini 2-wire voltmeter
×1
Adafruit Bakelite perfboard plates
You can use regular perforated through-hole protoboards, but you'll need a Dremel to cut them to shape; the Bakelite boards can be cut with scissors.
×1
Adafruit 6mm tactile buttons
×1
Adafruit Mini USB microphone
×1
Jauch Protected 18650 3350mAh Li-ion battery
×2
CWI332-ND SPST slide switch
×1
F17HA-05MC 5V 17mm Fan
×1
FeeTech FS90 Micro Servomotor
×4
D36V50F5 5v 5.5a Voltage Regulator
×1
Wireless USB Keyboard
×1
Raspberry Pi cable kit
You'll at least want a longer DSI ribbon cable than what usually comes with 5 inch DSI displays. CSI and DSI MIPI cables are fairly interchangeable.
×1
DC Power Right Angle Male Barrel Jack
×1
9-inch mini-USB cable with right- and left-angle USB connectors
×1
18650 Battery Holder with leads
×1
3.5mm mono male audio jack with screw terminals
×1
Waveshare 5 inch DSI LCD
Other brands of DSI LCD monitors can be used, but modifications may need to be made to the mounting brackets to accommodate differences.
×1
Various M2, M2.5, and M3 screws, nuts, heat-set inserts, and standoffs
×1
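
If you do swap the CRICKIT for a standalone servo driver, the four servo objects assigned in the code below (crickit.servo_1 through crickit.servo_4) must be recreated against that driver. A minimal sketch, assuming a PCA9685-based driver and Adafruit's servokit library (neither is part of this build as written):

#Hypothetical CRICKIT replacement: PCA9685 servo driver via adafruit-circuitpython-servokit
from adafruit_servokit import ServoKit

kit = ServoKit(channels=16)   #16-channel PCA9685 board

#Map the same four appendages onto driver channels 0-3
LA = kit.servo[0]   #left arm
RA = kit.servo[1]   #right arm
LL = kit.servo[2]   #left leg
RL = kit.servo[3]   #right leg

LA.angle = 90   #same .angle interface as the CRICKIT servos, so smove() works unchanged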

Software apps and online services

Raspberry Pi Raspbian (now Raspberry Pi OS)
Other Debian-based distros could also work; however, Ubuntu doesn't work with some of the API libraries in this build.
Microsoft Azure Cognitive Services Speech API
Microsoft Azure Computer Vision
OpenAI ChatGPT 3.5 Turbo
OpenAI DALL-E 2
Stability AI DreamStudio (Stable Diffusion)

Hand tools and fabrication machines

3D Printer (generic)
I recommend using PETG or another type of plastic filament that has a higher resistance to thermal deformation than PLA. Also, Teal is highly recommended.
3D Printing Pen (Generic)
Note: Where a 3D pen is mentioned, you can also use a soldering iron set to 230°C to weld plastic pieces together.
Dupont and JST crimping kit
Hot glue gun (generic)
Soldering iron (generic)
Drill, screwdriver
Acrylic paint and paintbrush (generic)

Story


Schematics

Wake Word Table File

Download and rename to "final_midfa.table", then move it to the same directory as the Python file to use the "Hey BMO" wake word.
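
The script opens the table with a relative path, so launch it from the directory that contains the file. A quick pre-flight check you could add near the top of the script (my addition, not in the original code):

import os
#The keyword model is opened as "final_midfa.table" relative to the working
#directory, so fail early with a clear message if it isn't there.
assert os.path.exists("final_midfa.table"), "final_midfa.table not found; run this from the directory containing it"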

Face Images

These are the facial expressions for BMO; extract them to the Pictures directory under your home directory.
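
The script loads eight face images at startup and will crash if any is missing. A quick way to verify them, assuming the default /home/bmo/Pictures paths used in the code below:

import os
#Filenames exactly as referenced in bmo_sample.py
faces = ["bmo1.jpg", "bmo11.jpg", "bmo7.jpg", "bmo2.jpg",
         "bmo3.jpg", "bmo8.jpg", "bmo16.jpg", "bmo15.jpg"]
missing = [f for f in faces if not os.path.exists(os.path.join("/home/bmo/Pictures", f))]
print("Missing face images:", missing if missing else "none")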

Code

bmo_sample.py

Python
You will need to edit the following lines:
45-59: Update for your face image directory paths
135-162: Update for your servomotor ranges
193-194: Only if you create a different wakeword
219-226: Update with your API Keys and services details
231: Only update if you choose to use a different voice.
291: Only if you want to use a directory other than /home/bmo/Photos/
315: Replace with the email account to send photos from
316: Replace with recipient email address.
321: Replace with sender's smtp server address
327: Replace if your SMTP server uses a different port
330: Replace with sender's email password or app password
353: Only if you want to use a directory other than /home/bmo/Photos/
438: Only if you want to use a directory other than /home/bmo/Photos/
486: Only if you want to use a directory other than /home/bmo/Photos/
import os
import sys
import io
import openai
import azure.cognitiveservices.speech as speechsdk
import pygame
import time
import smtplib
import imghdr
import tiktoken
import warnings
import stability_sdk.interfaces.gooseai.generation.generation_pb2 as generation
from email.message import EmailMessage
from libcamera import Transform
from picamera2 import Picamera2
from adafruit_crickit import crickit
from urllib.request import urlopen
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials
from stability_sdk import client
from array import array
from PIL import Image

#PYGAME STUFF
pygame.init()   #initialize pygame before creating the display
global display_width
display_width = 800
global display_height
display_height = 480
global gameDisplay
gameDisplay = pygame.display.set_mode((display_width,display_height))
global black
black = (0,0,0)
global teal
teal = (128,230,209)
global clock
clock = pygame.time.Clock()
global crashed
crashed = False

#Assign face images, may vary depending on your choices and the paths to your images.
#Update these so that the paths are pointing to the directory where you put your face images.
global BMO1
BMO1 = pygame.image.load('/home/bmo/Pictures/bmo1.jpg')
global BMO2
BMO2 = pygame.image.load('/home/bmo/Pictures/bmo11.jpg')
global BMO3
BMO3 = pygame.image.load('/home/bmo/Pictures/bmo7.jpg')
global BMO4
BMO4 = pygame.image.load('/home/bmo/Pictures/bmo2.jpg')
global BMO5
BMO5 = pygame.image.load('/home/bmo/Pictures/bmo3.jpg')
global BMO6
BMO6 = pygame.image.load('/home/bmo/Pictures/bmo8.jpg')
global BMO7
BMO7 = pygame.image.load('/home/bmo/Pictures/bmo16.jpg')
global BMO8
BMO8 = pygame.image.load('/home/bmo/Pictures/bmo15.jpg').convert() #the convert at the end of this one lets you display text if you choose to.

#Define backgrounds for display blit
def bmo_rest(x,y):
    gameDisplay.blit(BMO1, (x,y))

def bmo_sip(x,y):
    gameDisplay.blit(BMO2, (x,y))

def bmo_slant(x,y):
    gameDisplay.blit(BMO3, (x,y))

def bmo_talk(x,y):
    gameDisplay.blit(BMO4, (x,y))

def bmo_wtalk(x,y):
    gameDisplay.blit(BMO5, (x,y))

def bmo_smile(x,y):
    gameDisplay.blit(BMO6, (x,y))

def bmo_squint(x,y):
    gameDisplay.blit(BMO7, (x,y))

def bmo_side(x,y):
    gameDisplay.blit(BMO8, (x,y))

global x
x = (display_width * 0)
global y
y = (display_height * 0)

#BUTTON STUFF
global ss
ss = crickit.seesaw

#Assign signal pins to buttons, these may vary depending on how they are connected in your build.
#If you don't want to use them, leave these commented out.
#If you want to assign buttons to actions, you'll need to do that in the code.

#button_up = crickit.SIGNAL1
#button_rt = crickit.SIGNAL2
#button_dn = crickit.SIGNAL3
#button_lt = crickit.SIGNAL4
#button_bu = crickit.SIGNAL5
#button_bk = crickit.SIGNAL6
#button_gn = crickit.SIGNAL7
#button_rd = crickit.SIGNAL8

#Set signal pins for button pullup input. 
#Note, you should test your inputs to make sure they are assigned correctly.
#ss.pin_mode(button_up, ss.INPUT_PULLUP)
#ss.pin_mode(button_rt, ss.INPUT_PULLUP)
#ss.pin_mode(button_dn, ss.INPUT_PULLUP)
#ss.pin_mode(button_lt, ss.INPUT_PULLUP)
#ss.pin_mode(button_bu, ss.INPUT_PULLUP)
#ss.pin_mode(button_bk, ss.INPUT_PULLUP)
#ss.pin_mode(button_gn, ss.INPUT_PULLUP)
#ss.pin_mode(button_rd, ss.INPUT_PULLUP)
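
#Example (not part of the original build): once a signal pin is configured above,
#a pressed INPUT_PULLUP button reads False because the pin is pulled to ground.
#You could poll one and bind it to an action, for instance:
#if not ss.digital_read(button_up):
#    Respond_To_KW()   #hypothetical binding; substitute whatever action you like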

#SERVO STUFF
#servo pin assignments, these may vary if you connected servos in a different order, make sure to test.
global LA
LA = crickit.servo_1
global RA
RA = crickit.servo_2
global LL
LL = crickit.servo_3
global RL
RL = crickit.servo_4

#Note, the ranges will vary depending on your servos and how you preset their positions before installing.
#Make sure to position the servos before installing so the appendages have the right range of motion.
#You'll likely need to adjust these for your servos.

#Right Arm ranges
global RAU      #right arm up
RAU = 180       
global RAUP     #right arm up partially
RAUP = 130
global RADP     #right arm down partially
RADP = 60
global RAD      #right arm down
RAD = 10

#Left Arm ranges
global LAU
LAU = 10
global LAUP
LAUP = 50
global LADP
LADP = 120
global LAD
LAD = 160

#Right Leg ranges
global RLU
RLU = 130
global RLD
RLD = 65

#Left Leg ranges
global LLU
LLU = 55
global LLD
LLD = 120

#defining "smove", a smooth servo movement routine
def smove(sname,start,end,delta):	#move from start to end incrementally using delta in seconds
    incMove=(end-start)/10.0
    incTime=delta/10.0
    for x in range(10):
        sname.angle = (int(start+x*incMove))
        time.sleep(incTime)
    sname.angle = int(end)  #land exactly on the end position
# smove sample, to move Right Arm from Down to Up: smove(RA,RAD,RAU,0.20)

#Initialize the camera now since you can only do it one time, might as well do it now.
picam2 = Picamera2()

#Define the keyphrase detection function
def get_keyword():
    #First, do some stuff to signal ready for keyword
    gameDisplay.fill(teal)
    bmo_sip(x,y)
    pygame.display.update()
    smove(RA,RAD,RAUP,0.25)
    time.sleep(.8)
    bmo_rest(x,y)
    pygame.display.update()
    smove(RA,RAUP,RAD,0.25)

    #Create instance of kw recog model.
    #Update this to point at location of your local kw table file if not in same dir
    #Also, update the file name if you created a different keyphrase and model in Azure Speech Studio
    model = speechsdk.KeywordRecognitionModel("final_midfa.table")
    keyword = "Hey be Moe"  #If you created a different keyphrase to activate the listening function, you'll want to change this.
    keyword_recognizer = speechsdk.KeywordRecognizer()
    done = False

    def recognized_cb(evt):
        #This function checks for recognized keyword and kicks off the response to the keyword.
        result = evt.result
        if result.reason == speechsdk.ResultReason.RecognizedKeyword:
            bmo_smile(x,y)
            pygame.display.update()
            print("Recognized Keyphrase: {}".format(result.text))
        nonlocal done
        done = True

    #Once a keyphrase is detected...
    keyword_recognizer.recognized.connect(recognized_cb)
    result_future = keyword_recognizer.recognize_once_async(model)
    print('Ready, listening for keyword: {}'.format(keyword))
    result = result_future.get()

    if result.reason == speechsdk.ResultReason.RecognizedKeyword:
        Respond_To_KW() #move on to the respond function

#Define actions after KW recognized
def Respond_To_KW():
    #Set all your keys and cognitive services settings here
    openai.api_key = "xxxxxxxxxxxxxxxxxxx"     #Copy your OpenAI key here
    speech_key = "xxxxxxxxxxxxxxxxxx"     #Copy your Azure Speech key here
    speech_region = "xxxxxxx"           #Copy your Azure Speech region here
    vision_key = "xxxxxxxxxxxxxxxxxx"   #Copy your Azure Vision key here
    stab_key = "xxxxxxxxxxxxxxxxx"     #Copy your Stability AI DreamStudio key here
    stab_engine = "stable-diffusion-xl-beta-v2-2-2"     #Copy your preferred stable-diffusion engine name here
    vision_endpoint = "https://xxxxxx.cognitiveservices.azure.com/"     #Copy your Azure Vision endpoint url here

    #setup for the speech and vision services
    computervision_client = ComputerVisionClient(vision_endpoint, CognitiveServicesCredentials(vision_key))
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
    speech_config.speech_synthesis_voice_name = "en-US-JaneNeural"      #If you decide to use a different voice, update this.
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

    #Defining a few movement patterns
    def waving():
        smove(LA,LAD,LAU,0.20)
        time.sleep(.4)
        smove(LA,LAU,LAUP,0.20)
        time.sleep(.3)
        smove(LA,LAUP,LAU,0.20)
        time.sleep(.4)
        smove(LA,LAU,LAD,0.20)

    def kicking():
        smove(LL,LLD,LLU,0.17)
        time.sleep(.3)
        smove(LL,LLU,LLD,0.17)
        smove(RL,RLD,RLU,0.17)
        time.sleep(.3)
        smove(RL,RLU,RLD,0.17)
        smove(LL,LLD,LLU,0.17)
        time.sleep(.3)
        smove(LL,LLU,LLD,0.17)
        time.sleep(.2)

    #Response after keyword
    text_resp = "What's up?"    #This is text for TTS
    bmo_talk(x,y)               #Face to display and where to start
    pygame.display.update()     #Update screen to show change
    result = speech_synthesizer.speak_text_async(text_resp).get() #Run the TTS on text_resp
    waving()                    #wave motion

    #After done talking
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:      #this makes it wait for TTS to finish
        bmo_smile(x,y)
        pygame.display.update()
        speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)      #config STT
        speech_recognition_result = speech_recognizer.recognize_once_async().get()      #listen for voice input
        
        #Convert to text once speech is recognized
        if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:     #wait for voice input to complete
            print("Recognized: {}".format(speech_recognition_result.text))      #Print was was heard in the console (for testing)
            bmo_slant(x,y)
            pygame.display.update()

            # if and elif checks for special commands
            #First, check for taking photo intent
            if "photo" in speech_recognition_result.text:         #If the word "picture" is in what the user said, we'll take a picture.
                text_resp = "Ok, hold on a second and I'll take a picture."     #Update the text_resp variable to say this next.
                bmo_talk(x,y)
                pygame.display.update()
                result = speech_synthesizer.speak_text_async(text_resp).get()       #say it
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:      #wait until done saying it
                    bmo_squint(x,y)
                    pygame.display.update()
                    smove(LA,LAD,LAUP,0.20)
                    #Configure the camera here
                    capture_config = picam2.create_still_configuration(main={"size": (1920, 1080)}) #config for still image cap at 1920x1080
                    capture_config["transform"] = Transform(hflip=1,vflip=1)    #since camera is upside down, lets rotate 180 deg
                    image_name = "/home/bmo/Photos/img_" + str(time.time()) + ".jpg"   #Change the path if needed
                    picam2.start()      #now we can start the camera
                    time.sleep(2)       #wait two seconds for the camera to acclimate or you won't get a good photo
                    picam2.switch_mode_and_capture_file(capture_config, image_name)     #take the photo
                    time.sleep(1)       #wait a sec
                    smove(LA,LAUP,LAD,0.15)
                    BMOPIC = pygame.image.load(image_name)      #Configure pygame to show the photo
                    bmo_side(x,y)
                    pygame.display.update()     #shift to the small face
                    gameDisplay.blit(BMOPIC, (280, y))      #Get ready to blit the photo with a 280 offset on x axis
                    pygame.display.update()     #update screen to show the photo overlay on the small face background
                    text_resp = "Would you like me to share this picture with you?"
                    picam2.stop_preview()       #probably a good idea to turn the camera off
                    picam2.stop()               #yep, probably a good idea
                    result = speech_synthesizer.speak_text_async(text_resp).get()
                    time.sleep(.1)
                    speech_recognition_result = speech_recognizer.recognize_once_async().get()
                    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                        if "yes" in speech_recognition_result.text.lower():    #if user says anything that includes the word yes, send the photo
                            #Email attachment routine
                            bmo_smile(x,y)
                            pygame.display.update()
                            message = EmailMessage()
                            email_subject = "Image from BMO"
                            sender_email_address = "send@email.com"      #Insert your sending email address here
                            receiver_email_address = "receive@email.com"    #Insert the receiving email address here
                            # configure email headers
                            message['Subject'] = email_subject
                            message['From'] = sender_email_address
                            message['To'] = receiver_email_address
                            email_smtp = "smtp.email.com"                      #Insert the sending email smtp server address here

                            with open(image_name, 'rb') as file:
                                image_data = file.read()
                            message.set_content("Email from BMO with image attachment")
                            message.add_attachment(image_data, maintype='image', subtype=imghdr.what(None, image_data))
                            server = smtplib.SMTP(email_smtp, 587)           #check your smtp server docs to make sure this is the right port
                            server.ehlo()
                            server.starttls()
                            server.login(sender_email_address, 'EmailPasswordHere')  #insert the sender's email password or app password here
                            server.send_message(message)
                            server.quit()
                            text_resp = "Ok, I sent it to you."
                            bmo_talk(x,y)
                            pygame.display.update()
                            result = speech_synthesizer.speak_text_async(text_resp).get()
                            if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                                bmo_rest(x,y)
                                pygame.display.update()
                get_keyword()

            # Check for Azure Computer Vision Description intent
            elif "you looking at" in speech_recognition_result.text: #this starts of the image description function
                text_resp = "Oh, let me think about how to describe it."
                bmo_talk(x,y)
                pygame.display.update()
                result = speech_synthesizer.speak_text_async(text_resp).get() #talk
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:  #once done talking
                    bmo_squint(x,y)
                    pygame.display.update()
                    acapture_config = picam2.create_still_configuration(main={"size": (1920, 1080)})    #set cam resolution
                    acapture_config["transform"] = Transform(hflip=1,vflip=1)   #flip camera since it is upside down
                    aimage_name = "/home/bmo/Photos/aimg_" +str(time.time()) +".jpg"   #Change path if needed
                    picam2.start()  #start camera
                    time.sleep(2)   #wait to let the camera adjust
                    picam2.switch_mode_and_capture_file(acapture_config, aimage_name)   #switch mode to image capture and snap
                    text_resp = "Hmm."
                    result=speech_synthesizer.speak_text_async(text_resp).get() #say Hmm to break up the wait
                    # Open local image file
                    local_image = open(aimage_name, "rb")   #open the saved image
                    # Call API
                    description_result = computervision_client.describe_image_in_stream(local_image)    #prep to stream image
                    # Get the captions (descriptions) from the response, with confidence level
                    print("Description of local image: ")
                    picam2.stop_preview()   #shut off camera
                    picam2.stop()
                    if (len(description_result.captions) == 0): #If no image description, say this stuff and shrug
                        text_resp = "Sorry, I really don't know what that is."
                        bmo_talk(x,y)
                        pygame.display.update()
                        result=speech_synthesizer.speak_text_async(text_resp).get()
                        smove(LA,LAD,LADP,0.12)
                        smove(RA,RAD,RADP,0.12)
                        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                                bmo_rest(x,y)
                                pygame.display.update()
                                smove(LA,LADP,LAD,0.12)
                                smove(RA,RADP,RAD,0.12)

                    else:
                        for caption in description_result.captions: #If image description created
                            text_resp = "That looks like " + (caption.text) #set response
                            bmo_talk(x,y)
                            pygame.display.update()
                            smove(LA,LAD,LAUP,0.18)
                            result=speech_synthesizer.speak_text_async(text_resp).get() #give response
                            if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:  #once done talking, do this
                                bmo_rest(x,y)
                                smove(LA,LAUP,LAD,0.20)
                                time.sleep(.3)
                                pygame.display.update()
                get_keyword()

            # Check for Stable Diffusion painting intent
            elif "paint" in speech_recognition_result.text:
                text_resp = "Sure, what would you like me to paint?"
                bmo_talk(x,y)
                pygame.display.update()
                smove(LA,LAD,LAUP,0.18)
                result = speech_synthesizer.speak_text_async(text_resp).get()
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                    print("recognized: {}".format(speech_recognition_result.text))
                    bmo_smile(x,y)
                    pygame.display.update()
                    speech_recognition_result = speech_recognizer.recognize_once_async().get()
                    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                        text_resp = "Ok, give me a few seconds to paint this."
                        bmo_talk(x,y)
                        pygame.display.update()
                        smove(LA,LAUP,LAU,0.15)
                        smove(LA,LAU,LAUP,0.15)
                        result = speech_synthesizer.speak_text_async(text_resp).get()
                        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                            bmo_smile(x,y)
                            pygame.display.update()
                            stability_api = client.StabilityInference(
                                key=stab_key,
                                verbose=True, # Print debug messages.
                                engine="stable-diffusion-xl-beta-v2-2-2",
                            )
                            answers = stability_api.generate(
                                prompt=speech_recognition_result.text,
                                #seed=992446758, # If a seed is provided, the resulting generated image will be deterministic.
                                steps=30, # Amount of inference steps performed on image generation. Defaults to 30.
                                cfg_scale=8.0, # Influences how strongly your generation is guided to match your prompt. Defaults to 7.0 if not specified.
                                width=512, # Generation width, defaults to 512 if not included.
                                height=512, # Generation height, defaults to 512 if not included.
                                samples=1, # Number of images to generate, defaults to 1 if not included.
                                sampler=generation.SAMPLER_K_DPMPP_2M # Choose which sampler we want to denoise our generation with.
                            )
                            for resp in answers:
                                for artifact in resp.artifacts:
                                    if artifact.finish_reason == generation.FILTER:
                                        warnings.warn(
                                            "Your request activated the API's safety filters and could not be processed."
                                            "Please modify the prompt and try again.")
                                    if artifact.type == generation.ARTIFACT_IMAGE:
                                        paint_name = "/home/bmo/Photos/paint_" +str(time.time()) +".png"
                                        img = Image.open(io.BytesIO(artifact.binary))
                                        img.save(paint_name) # Save the generated image with a timestamp in the filename.
                                        pgpaint = pygame.image.load(paint_name)
                                        bmo_side(x,y)
                                        pygame.display.update()
                                        gameDisplay.blit(pgpaint, (280,y))
                                        pygame.display.update()
                                        text_resp = "I hope you like it."
                                        result = speech_synthesizer.speak_text_async(text_resp).get()
                                        time.sleep(5)
                                        smove(LA,LAUP,LAD,.15)
                get_keyword()

            # Check for DALL-E Image Variation intent
            elif "you thinking" in speech_recognition_result.text:  #setup for the daydream mode
                text_resp = "Oh, I'm just daydreaming. Do you want me to show you what I was imagining?"
                bmo_talk(x,y)
                pygame.display.update()
                result = speech_synthesizer.speak_text_async(text_resp).get()
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:  #once done talking
                    bmo_smile(x,y)
                    pygame.display.update()
                    speech_recognition_result = speech_recognizer.recognize_once_async().get()  #listen
                    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech: #when done listening
                        if "no" in speech_recognition_result.text:  #if response negative, then
                            text_resp = "Ok, let me know if you need anything else."
                            bmo_talk(x,y)
                            pygame.display.update()
                            smove(RA,RAD,RADP,0.12)
                            smove(LA,LAD,LADP,0.12)
                            result = speech_synthesizer.speak_text_async(text_resp).get()
                            if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                                bmo_rest(x,y)
                                pygame.display.update()
                                smove(RA,RADP,RAD,0.12)
                                smove(LA,LADP,LAD,0.12)
                            get_keyword()
                        else:   #if result not negative, then
                            text_resp = "Ok, give me a few seconds to draw you a picture."  
                            bmo_talk(x,y)
                            pygame.display.update()
                            result = speech_synthesizer.speak_text_async(text_resp).get()
                            if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                                bmo_slant(x,y)
                                pygame.display.update()
                                capture_config = picam2.create_still_configuration(main={"size": (512,512)})    #set up supported image resolution
                                capture_config["transform"] = Transform(hflip=1,vflip=1)    #flip image
                                image_name = "/home/bmo/Photos/img_" + str(time.time()) + ".png"    #change path if needed
                                picam2.start()  #start cam
                                kicking()  #start kicking 
                                time.sleep(1)
                                picam2.switch_mode_and_capture_file(capture_config, image_name)     #capture image
                                time.sleep(1)
                                #Send image file to OpenAI DALL-e2 variation
                                response = openai.Image.create_variation(
                                    image=open(image_name, "rb"),
                                    n=1,
                                    size="512x512"
                                )
                                image_url = response['data'][0]['url']  #Grab image from url
                                image_str = urlopen(image_url).read()   #open
                                image_file = io.BytesIO(image_str)      #transform
                                image = pygame.image.load(image_file)   #Prep to show on screen
                                bmo_side(x,y)
                                pygame.display.update()     #show the small face as background
                                gameDisplay.blit(image, (280,y))    
                                pygame.display.update()     #blit the image on top of the background
                                text_resp = "Here's what I was daydreaming about."
                                result = speech_synthesizer.speak_text_async(text_resp).get()
                                time.sleep(5)
                                text_resp = "Let me know if you need anything else."
                                bmo_talk(x,y)
                                pygame.display.update()
                                picam2.stop_preview()
                                picam2.stop()
                                result = speech_synthesizer.speak_text_async(text_resp).get()
                                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                                    bmo_rest(x,y)
                                    pygame.display.update()
                get_keyword()

            # check for DALL-E image from ChatGPT description intent
            elif "describe" in speech_recognition_result.text:
                completion_request = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    messages = [
                        {"role": "system", "content": "You are a snarky but helpful companion robot named BMO"}, 
                        {"role": "user", "content": (speech_recognition_result.text)},
                        ],
                    max_tokens=150,
                    temperature=0.7,
                )

                #Get response
                response_text = completion_request.choices[0].message.content
                print(response_text)
                bmo_talk(x,y)
                pygame.display.update()
                result = speech_synthesizer.speak_text_async(response_text).get()
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                    text_resp2 = "Let me show you what I think this looks like."
                    result = speech_synthesizer.speak_text_async(text_resp2).get()
                    waving()
                    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                        bmo_slant(x,y)
                        pygame.display.update()
                    img_response = openai.Image.create(
                        prompt=response_text,
                        n=1,
                        size="512x512"
                        )
                    image_url = img_response['data'][0]['url']
                    image_str = urlopen(image_url).read()
                    image_file = io.BytesIO(image_str)
                    image = pygame.image.load(image_file)
                    bmo_side(x,y)
                    pygame.display.update()
                    gameDisplay.blit(image, (280,y))
                    pygame.display.update()
                    text_resp3 = "Hope this looks right."
                    result = speech_synthesizer.speak_text_async(text_resp3).get()
                    time.sleep(5)
                get_keyword()

            # Check for multi-turn chat intent
            elif speech_recognition_result.text == "Let's chat.":   #this sets up multi-turn chat mode
                text_resp = "Ok. What do you want to talk about?"
                bmo_talk(x,y)
                pygame.display.update()
                result = speech_synthesizer.speak_text_async(text_resp).get()
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:  #once done responding
                    bmo_smile(x,y)
                    pygame.display.update()
                    system_message = {"role": "system", "content": "You are a playful but helpful robot named BMO"} #set behavior
                    max_response_tokens = 250   #set max tokens
                    token_limit= 4096   #establish token limit
                    conversation=[]     #init conv
                    conversation.append(system_message)     #set what is in conv

                    def num_tokens_from_messages(messages, model="gpt-3.5-turbo"):  #Token counter function
                        encoding = tiktoken.encoding_for_model(model)
                        num_tokens = 0
                        for message in messages:
                            num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
                            for key, value in message.items():
                                num_tokens += len(encoding.encode(value))
                                if key == "name":  # if there's a name, the role is omitted
                                    num_tokens += -1  # role is always required and always 1 token
                        num_tokens += 2  # every reply is primed with <im_start>assistant
                        return num_tokens

                    while True:
                        #Start Listening
                        speech_recognition_result = speech_recognizer.recognize_once_async().get()
                        if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                            bmo_slant(x,y)
                            pygame.display.update()
                            print("Recognized: {}".format(speech_recognition_result.text))
                            user_input = speech_recognition_result.text     
                            conversation.append({"role": "user", "content": user_input})
                            conv_history_tokens = num_tokens_from_messages(conversation)

                            while (conv_history_tokens+max_response_tokens >= token_limit):
                                del conversation[1] 
                                conv_history_tokens = num_tokens_from_messages(conversation)
        
                            response = openai.ChatCompletion.create(
                                model="gpt-3.5-turbo", # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
                                messages = conversation,
                                temperature=.6, #Temp dictates probability threshold. Lower is more strict, higher gives more random responses
                                max_tokens=max_response_tokens,
                            )

                            conversation.append({"role": "assistant", "content": response['choices'][0]['message']['content']})
                            print("\n" + response['choices'][0]['message']['content'] + "\n")

                            response_text = response['choices'][0]['message']['content'] + "\n"
                            print(response_text)
                            bmo_talk(x,y)
                            pygame.display.update()
                            speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
                            result = speech_synthesizer.speak_text_async(response_text).get()
                            if "I'm done" in speech_recognition_result.text:
                                get_keyword()
                            if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                                bmo_smile(x,y)
                                pygame.display.update()
                get_keyword()

            # If no other intent, then use regular completion
            else:
                completion_request = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    messages = [
                        {"role": "system", "content": "You are a playful but helpful companion robot named BMO"}, 
                        {"role": "user", "content": (speech_recognition_result.text)},
                        ],
                    max_tokens=250,     #Sets size of response. GPT3.5 limit is about 4000 tokens.
                    temperature=0.6,    #Lower response temp is more factual, higher temp is more random
                )

                #Get response
                response_text = completion_request.choices[0].message.content
                print(response_text)
                bmo_talk(x,y)
                pygame.display.update()
                speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
                result = speech_synthesizer.speak_text_async(response_text).get()
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                    kicking()
                    bmo_rest(x,y)
                    pygame.display.update()
                    get_keyword()
get_keyword()

Credits

David Packman

I make robot friends