According to the World Economic Forum, almost a third of Japan's population, an estimated 36.23 million people, is over 65. As the elderly population grows, many older people live alone and need help. Japan even has a term for lonely deaths, kodokushi, which describes elderly people who die alone and remain undiscovered for a long time.
We need to address this by providing companionship. Human companions can be unreliable and are sometimes costly, so we need a cost-effective technology solution to address loneliness among the elderly.
To solve this problem, we are planning to build a companion robot that helps elderly people by understanding their emotions, responding based on those emotions, helping them call someone in an emergency, and monitoring basic activity to check that the user is OK.
Hardware Architecture
The robot's brain is a Raspberry Pi 4. It has a camera to take pictures and record video, a microphone to record audio, and a speaker to play audio files. A motor driver shield drives the motors (DC and servo) and also powers the Raspberry Pi. The camera is mounted on a servo motor so that it can pan across a range of at least 120 degrees. The whole robot is powered by two rechargeable Li-ion batteries.
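As a rough sketch of how the camera pan could be driven, the snippet below generates a back-and-forth sweep covering 120 degrees and converts each angle into a servo pulse width. The 500 to 2500 microsecond pulse range, the 180-degree servo travel, and both function names are assumptions for illustration; the real values depend on the servo and motor driver shield used.

```python
def angle_to_pulse_us(angle_deg, min_us=500, max_us=2500, travel_deg=180):
    """Map a servo angle to a pulse width in microseconds (assumes a linear servo)."""
    if not 0 <= angle_deg <= travel_deg:
        raise ValueError("angle outside servo travel")
    return min_us + (max_us - min_us) * angle_deg / travel_deg


def sweep_angles(center_deg=90, span_deg=120, step_deg=30):
    """Return one left-to-right-and-back sweep centred on center_deg."""
    start = center_deg - span_deg // 2
    end = center_deg + span_deg // 2
    forward = list(range(start, end + 1, step_deg))
    # Append the return path without repeating the end position.
    return forward + forward[-2::-1]
```

Each angle from `sweep_angles()` would be passed through `angle_to_pulse_us()` and then sent to whatever servo interface the shield exposes.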
AI Models
Emotion recognition model
To recognize a person's emotions in real time, we used the DeepFace library and its built-in AI models. These models also provide demographic information, such as an age estimate, which can be used to tailor the help the robot offers.
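A minimal sketch of reading emotion and age from DeepFace might look like the following. `DeepFace.analyze` is the library's own call; the helper names `analyze_face` and `summarize`, and the choice to disable `enforce_detection`, are assumptions made for this example rather than the project's actual code.

```python
def analyze_face(image_path):
    """Run DeepFace on one image and return its raw result."""
    from deepface import DeepFace  # imported lazily; requires `pip install deepface`
    return DeepFace.analyze(
        img_path=image_path,
        actions=["emotion", "age"],
        enforce_detection=False,  # do not raise when no face is found
    )


def summarize(result):
    """Pick the dominant emotion and estimated age from a DeepFace result."""
    # Recent DeepFace versions return a list with one dict per face;
    # older versions return a single dict. Handle both.
    faces = result if isinstance(result, list) else [result]
    if not faces:
        return None
    first = faces[0]
    return {"emotion": first.get("dominant_emotion"), "age": first.get("age")}
```

On the robot, `summarize(analyze_face("frame.jpg"))` would give one small dictionary per captured frame, where `frame.jpg` is a placeholder file name.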
Action trigger detection and Emergency detection model
To identify the type of help the user needs, we built a custom model that recognizes situations such as a fall, a need for medical help, a need to call someone, and a wish to ask a question.
Combining the results from the two models above, the robot decides on an activity, either to engage the user or to carry out the required action.
Firmware Flowchart
In the code, we first initialise the robot by turning on the motor driver and camera and checking the state of the rest of the robot. The robot then connects to a WiFi network, which is used for calling various APIs. Once connected, the AI models start processing the frames captured by the camera. There are separate models for detecting a person, gestures, and emotions. First the person in view is recognised, then the models look for gestures or emotions. If no gesture is found, the robot acts based on the detected emotion. As the flowchart shows, the robot performs different actions for different emotions. If a gesture is identified, however, the robot calls the corresponding service.
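The per-frame part of this flow can be sketched as a single function. The three detector arguments stand in for the person, gesture, and emotion models; their names and the returned tuple format are hypothetical, chosen only to make the order of checks explicit.

```python
def process_frame(frame, detect_person, detect_gesture, detect_emotion):
    """Return an (action_kind, detail) tuple for one camera frame."""
    if not detect_person(frame):
        return ("idle", None)            # nobody in view: keep looking
    gesture = detect_gesture(frame)
    if gesture is not None:
        return ("service", gesture)      # a gesture takes priority over emotion
    return ("emotion", detect_emotion(frame))
```

Passing the detectors in as arguments keeps the flow testable on a laptop with simple stand-in functions before the real models are wired in on the Raspberry Pi.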
Implementation steps
1. Testing individual components
We wrote Python scripts to test the robot's individual components, such as the speaker, microphone, camera, and motor driver.
2. Collecting data for AI model
To recognise a person's gestures, we needed a custom model, which requires a custom dataset. We used Python and Google Teachable Machine to collect a dataset of different gestures.
We also collected the dataset required for the person detection model.
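One possible shape for the Python side of data collection is shown below: frames are captured with OpenCV and saved into one folder per label. The folder layout, file naming, and function names are assumptions for this sketch, not the project's actual script.

```python
import os


def sample_path(root, label, index):
    """Build the file path for one sample, e.g. root/label/label_0003.jpg."""
    return os.path.join(root, label, f"{label}_{index:04d}.jpg")


def collect_samples(root, label, count, camera_index=0):
    """Capture `count` frames from the camera into the folder for `label`."""
    import cv2  # imported lazily; requires opencv-python and a connected camera
    os.makedirs(os.path.join(root, label), exist_ok=True)
    cap = cv2.VideoCapture(camera_index)
    saved = 0
    try:
        while saved < count:
            ok, frame = cap.read()
            if not ok:
                break  # camera unavailable or stream ended
            cv2.imwrite(sample_path(root, label, saved), frame)
            saved += 1
    finally:
        cap.release()
    return saved
```

A folder-per-label layout like this is convenient because most image training tools can load it directly.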
3. Training person identification model
Using the person dataset, we built a classifier that can identify a person in my house. We used transfer learning on pre-trained CNN models to train the custom model.
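For readers unfamiliar with transfer learning, here is a minimal Keras sketch of the idea: a pre-trained network is frozen and only a small new output layer is trained. MobileNetV2, the input size, and the function name are assumptions for illustration; the write-up does not say which pre-trained CNN was used.

```python
def build_transfer_model(num_classes, input_shape=(224, 224, 3)):
    """Build a classifier on top of a frozen, ImageNet-pretrained MobileNetV2."""
    import tensorflow as tf  # imported lazily; requires TensorFlow

    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights="imagenet"
    )
    base.trainable = False  # keep the pretrained features fixed

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model
```

Because only the final dense layer is trained, a small home-collected dataset can be enough. Input images would need the same preprocessing that MobileNetV2 expects (pixel values scaled to the range -1 to 1).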
4. Training gesture recognition model
Using the gesture dataset, I built a model that can identify three different gestures: medical emergency, want to call someone, and want to ask something.
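A classifier like this typically outputs one probability per gesture. The sketch below turns those probabilities into a gesture name and ignores low-confidence predictions, so the robot does not trigger a service by accident. The label names, their order, and the 0.8 threshold are assumptions for this example.

```python
GESTURES = ["medical_emergency", "call_someone", "ask_question"]  # assumed label order


def pick_gesture(probabilities, threshold=0.8):
    """Return the most likely gesture, or None if the model is not confident."""
    if len(probabilities) != len(GESTURES):
        raise ValueError("expected one probability per gesture")
    best = max(range(len(GESTURES)), key=lambda i: probabilities[i])
    if probabilities[best] < threshold:
        return None
    return GESTURES[best]
```

Returning `None` for uncertain frames lets the robot fall back to emotion-based behaviour when no gesture is clearly shown.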
5. Testing facial expression recognition model and implementing it on the robot
I installed the DeepFace library and tested its facial expression recognition model on different types of emotions.
6. Testing Gemini API and implementation
To answer user queries, I used the Gemini API with a text model.
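A hedged sketch of a text query using the `google-generativeai` Python package follows. The model name, the `GEMINI_API_KEY` environment variable, the prompt wording, and the helper names are all assumptions for illustration; check the current Gemini documentation for available model names.

```python
import os


def build_prompt(question):
    """Wrap the user's question with a short instruction for the model."""
    return (
        "You are a friendly companion robot for an elderly person. "
        "Answer briefly and clearly.\n\nQuestion: " + question.strip()
    )


def ask_gemini(question, model_name="gemini-1.5-flash"):
    """Send the question to Gemini and return the reply text."""
    import google.generativeai as genai  # requires `pip install google-generativeai`

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])  # assumed variable name
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(build_prompt(question))
    return response.text
```

Keeping the prompt construction in its own function makes it easy to adjust the robot's tone without touching the API call.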
7. Integrating the models with microphone, camera, and speaker
To capture user input and act on it, I integrated the models with the microphone, camera, and speaker. The microphone takes the user's voice input and sends it to a transcription model, which converts it into text. The camera takes pictures, which are used to identify emotions, people, and actions.
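The voice side of this integration can be sketched as a small pipeline: record, transcribe, answer, speak. The four callables are hypothetical placeholders for the microphone recorder, the transcription model, the query model, and the speaker output; none of these names come from the project.

```python
def handle_voice_query(record_audio, transcribe, answer, speak):
    """Run one voice interaction and return the spoken reply (or None)."""
    audio = record_audio()
    text = transcribe(audio).strip()
    if not text:
        return None        # nothing intelligible was heard
    reply = answer(text)
    speak(reply)
    return reply
```

In this arrangement each hardware or model component can be swapped or tested separately, which matches the component-by-component testing described in step 1.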
8. Defining various movements to be done with emotion recognition
The robot has two motors that can be driven in different ways to imitate different activities. Each activity is linked to a different emotion.
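One simple way to link emotions to movements is a lookup table of motor steps. The emotions, speeds, durations, and the `set_motors` and `wait` callables below are illustrative assumptions; the actual movements used on the robot are not described in this write-up.

```python
# Each step is (left_speed, right_speed, seconds); speeds range from -1.0 to 1.0.
MOVES = {
    "happy": [(0.5, -0.5, 0.5), (-0.5, 0.5, 0.5)],  # wiggle left, then right
    "sad": [(0.3, 0.3, 1.0)],                       # approach slowly
    "angry": [(-0.3, -0.3, 1.0)],                   # back away
}


def plan_movement(emotion):
    """Return the motor steps for an emotion; unknown emotions mean no movement."""
    return MOVES.get(emotion, [])


def run_movement(emotion, set_motors, wait):
    """Execute each step via the supplied motor and delay callables, then stop."""
    for left, right, seconds in plan_movement(emotion):
        set_motors(left, right)
        wait(seconds)
    set_motors(0.0, 0.0)  # always finish with the motors stopped
```

On the robot, `set_motors` would wrap the motor driver shield and `wait` could simply be `time.sleep`.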
Testing
After implementing the whole robot, I tested it in my living room.
Conclusion
Demo video





