When we saw what Amazon did with Alexa and what Google did with Google Home, we were sure that the age of computerized personal assistants had arrived. But we wondered how to make them more human – Aya is a proof of concept for robotic personal assistants that look and feel human.
Aya is an interactive robot that stores your picture so she can recognize you later and tell you what object you are holding.
Aya is an embedded systems nightmare! A servo system interfaces with audio codecs, with each run on a separate thread so speech and motion happen in parallel.
We 3D-printed the physical parts (courtesy of InMoov®, an open source robot) in a process that spanned over 26 hours. It was built and assembled at Hack the North 2017.
Aya uses AWS Rekognition to compare face geometries, recognizing people she has already met and registering new ones. We then choose the highest-confidence object label that is not a human and run its name through Amazon Polly, which generates an .ogg file. While all this is happening, another thread with custom servo controls runs in parallel to move the jaw in time with the speech.
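The label-selection step can be sketched as a small filter over a Rekognition-style label list. The response shape below mirrors what Amazon Rekognition's `detect_labels` call returns; the sample labels and confidence values are invented for illustration.

```python
# Pick the highest-confidence label that is not a person,
# given a Rekognition-style list of {"Name", "Confidence"} dicts.

def best_object_label(labels, excluded=("Human", "Person", "Face")):
    """Return the name of the best non-human label, or None if none exist."""
    candidates = [l for l in labels if l["Name"] not in excluded]
    if not candidates:
        return None
    return max(candidates, key=lambda l: l["Confidence"])["Name"]

# Sample data, invented for illustration
sample = [
    {"Name": "Person", "Confidence": 99.1},
    {"Name": "Coffee Cup", "Confidence": 87.4},
    {"Name": "Mug", "Confidence": 80.2},
]
print(best_object_label(sample))  # Coffee Cup
```

The chosen label's name is what gets handed to Amazon Polly for synthesis.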
Hardware hacks are tough! We ran into a thousand problems, including but not limited to: melting wires, burning breadboards, overheating servos, and of course part tolerances!
Another difficult aspect of the project was the servo control – it required spawning a second thread and driving the servos in tandem with speech playback.
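A minimal sketch of that threading pattern, assuming a `set_angle` callable that stands in for the real servo driver (the angles, timings, and names here are placeholders, not the actual implementation):

```python
import threading
import time

speaking = threading.Event()

def jaw_loop(set_angle, open_deg=35, closed_deg=5, period=0.15):
    """Oscillate the jaw servo for as long as the `speaking` event is set."""
    while speaking.is_set():
        set_angle(open_deg)
        time.sleep(period / 2)
        set_angle(closed_deg)
        time.sleep(period / 2)

angles = []                 # record angles in place of a real servo
speaking.set()
t = threading.Thread(target=jaw_loop, args=(angles.append,))
t.start()
time.sleep(0.3)             # stand-in for playing the generated .ogg file
speaking.clear()
t.join()
print(len(angles) > 0)      # True: the jaw moved while "speech" played
```

Using an `Event` rather than a shared boolean keeps the stop signal thread-safe without explicit locks.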
Software-wise, since Amazon Rekognition only outputs an array of labels and their confidence levels, it is often hard to choose the most relevant label for an image. We attempted to solve this by using k-means clustering to generate groups of similar words and then generate sentences for each group. However, since hardware took up so much of our time, we unfortunately did not have the time to fully implement this.
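The clustering idea can be illustrated with a plain k-means over word vectors. A real system would embed the Rekognition labels with a word-embedding model; the 2-D vectors and fixed initial centroids below are invented so the toy run is deterministic.

```python
# Toy k-means: group label vectors so similar words end up together.

def kmeans(points, centroids, iters=10):
    """Assign each point to its nearest centroid, then recenter; repeat."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

# Placeholder 2-D "embeddings" for four labels
vectors = {"cup": (0.9, 0.1), "mug": (0.85, 0.15),
           "dog": (0.1, 0.9), "puppy": (0.15, 0.85)}
groups = kmeans(list(vectors.values()), centroids=[(1.0, 0.0), (0.0, 1.0)])
named = [[w for w, v in vectors.items() if v in g] for g in groups]
print(named)  # [['cup', 'mug'], ['dog', 'puppy']]
```

Each resulting group would then be summarized into one sentence, rather than reading out every raw label.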
Finally, doing all this under a 36-hour budget (not including sleep!) was the greatest challenge of all, but it forced us to conform to a schedule and keep our project agile.
We are proud of being able to make the servo system and software mesh together. Most of all, we are proud to have worked together incredibly well as a team, with little friction and an awesome output.
We want to use TensorFlow to cluster the output of AWS Rekognition, and pyAudioAnalysis to apply waveform analysis techniques to the generated audio file so the sound syncs better with the servo actuation.
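One simple form that waveform analysis could take is a per-frame loudness envelope: louder frames open the jaw wider. This is a stdlib-only sketch of the idea, not the planned pyAudioAnalysis pipeline; the frame length, angle range, and synthetic signal are all assumptions.

```python
import math

def rms_envelope(samples, frame_len=160):
    """Per-frame RMS loudness of an audio signal."""
    env = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        env.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return env

def jaw_angles(env, max_deg=35):
    """Map each frame's loudness to a servo angle in [0, max_deg]."""
    peak = max(env) or 1.0
    return [max_deg * e / peak for e in env]

# Synthetic signal: a quiet stretch followed by a loud one
signal = ([0.1 * math.sin(0.2 * n) for n in range(160)] +
          [0.9 * math.sin(0.2 * n) for n in range(160)])
angles = jaw_angles(rms_envelope(signal))
print(angles[0] < angles[1])  # True: the louder frame opens the jaw wider
```

Driving the jaw from the envelope of the actual Polly output, instead of a fixed oscillation, is what would make the lip sync convincing.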