Overview
We aim to improve the way we interact with lamps by creating an interactive reading and tracking lamp. The idea is inspired by Pixar's animated lamp and integrates image-to-text conversion, text-to-speech, object detection, mood detection, and stepper motor control using an Arduino.
Our current setup is an integrated system combining computer vision, Arduino programming, and mechanical design. The heart of the system is a web camera that captures images for both object tracking and document reading. The computer vision pipeline, powered by YOLOv4-tiny, processes these images to detect objects and text. Upon detection, the system calculates the object's position and orientation and translates this data into commands for the stepper motor, which adjusts the lamp's position to follow the object or focus on a document for reading.
The text extracted from documents is converted to speech, providing an audible reading to the user. This feature is particularly useful for multitasking, allowing users to listen to documents while engaged in other activities. The system is controlled via a microcontroller, which manages the inputs from the camera and outputs to the motor and speaker.
A cooler upgrade we added is the ability to detect and recognize a user's mood. This feature allows the lamp to interact in a personalized manner, such as changing the LED color based on the detected mood or signaling a recognized mood with a unique sound pattern.
Related work
- The design of face recognition and tracking for human-robot interaction
- Detection and Recognition of Face Using Deep Learning
- Real-time emotion detection using Python
- https://makezine.com/projects/laser-cut-pixar-luxo-lamp/
Our project was inspired by studies related to affective robots, face recognition, and motion tracking techniques. The design concept is founded on Pixar's animated lamp character, as we explore the interaction between humans and robots in various contexts and across different types of human emotion.
Milestone 1 & 2 Summary
Our initial idea featured a movable robot that read text from documents as it moved from left to right. However, after several conversations we decided to improve on this, since reading characters line by line was slower and needed more processing resources than capturing the full document and processing everything at once. We also wanted to include an IR remote control to give users more playback control, although we did not carry this into the next stage of our design.

By the second milestone of our project, we had successfully integrated image processing capabilities, building on our initial idea of employing a remote control. This advancement was pivotal, allowing for the immediate conversion of visual data into a readable format. Complementing this, we incorporated a text-to-speech (TTS) system, which audibly articulated the extracted text, adding a significant layer of interactivity to our project. Furthermore, the implementation of a single stepper motor, initially constrained to horizontal movements, was a critical step towards achieving responsive tracking, essential for aligning the camera's focus as directed by the user through the remote control.

We used a power supply, as shown in the image below, ensuring precise control over the electronic components' voltage and current. Although bulkier than desired, this choice was essential for maintaining system stability and reliability. The incorporation of the stepper motor was a foray into mechanical control, vital for the project's tracking aspect. At this stage, our focus was on mastering horizontal movement control, setting a foundation for more complex directional control in the future.
Milestone 3
Hardware
Initially, our lamp's form consisted of a cylindrical 3D structure, shown below, that connected the base of the lamp to the stepper motor. This connection, however, only allowed movement in the horizontal direction (left to right and right to left), which we wanted to improve upon in subsequent milestones.
After multiple iterations and different designs, including the one shown below where a screw held the base of the lamp to a tilt platform, we realized we needed to account for the weight of the lamp's head and the attached camera to achieve accurate movements.
Finally, we decided on a setup that encompasses a laser-cut enclosure in which the lamp's structure sits. The lamp structure rests on a 3D-printed platform that rotates with the help of a stepper motor, allowing the lamp to move horizontally from left to right. On top of this platform is a 3D-printed hinge that houses another stepper motor and controls the tilt movements. The lamp and the connected web camera are set up on this platform, held up by the hinge to allow tilt motion. These components are connected to the Arduino and motor drivers, which are placed inside the laser-cut box to hide floating wires and other loose components while leaving an opening for the RGB LED.
Design Workflow
The diagram above captures how data flows through our project. The input is the web camera, which collects a stream of images. A Python script processes this stream to determine either what a document contains or how far a detected person is from the camera's center, in both the vertical and horizontal directions. The script sends this computed information to the Arduino via serial communication. For the object tracker and mood detector, the Arduino parses the transmission to extract the mood, the angles (vertical and horizontal deviation), and the directions. The angles are used to calculate how many steps each stepper motor should move, and, given some predefined colors, the red, green, and blue values of the RGB LED are set to indicate the detected mood. A sketch of this hand-off is shown below.
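As a concrete illustration of that hand-off, the sketch below packs the mood code, deviation angles, and directions into a single serial message. The message layout, port name, and baud rate are assumptions made for illustration, not the project's verbatim protocol.

```python
# Illustrative Python -> Arduino hand-off. The message layout
# "<mood>,<h_dir>,<h_angle>,<v_dir>,<v_angle>\n", the port name, and the
# baud rate are assumptions; the real protocol may differ.
import serial

arduino = serial.Serial("/dev/ttyACM0", 9600, timeout=1)  # assumed port and baud rate

def send_update(mood_code, h_dir, h_angle, v_dir, v_angle):
    """Send one combined mood/tracking update as a single newline-terminated line."""
    message = f"{mood_code},{h_dir},{h_angle:.1f},{v_dir},{v_angle:.1f}\n"
    arduino.write(message.encode("utf-8"))

# Example: a happy person detected 12.5 degrees left of and 4.0 degrees above center.
send_update("h", "L", 12.5, "U", 4.0)
```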
Software
The processing part of our code, which cleans the image stream, measures the deviation of a person from the web camera's midpoint, and reads the text in a document, is written in Python. We use OCR with OpenCV to extract the characters from the image taken with the web camera. After extracting these characters, we use the Google Text-to-Speech library (gTTS) to create an audio version of the extracted text, and the Pygame library to read this speech out immediately, instead of opening an external audio player on the computer as in our earlier milestone. Together, these libraries give us the flexibility to implement the document-reading portion of our task.

The mood detection and tracking functionalities are achieved through multiple libraries and models, namely DeepFace, YOLOv4, and OpenCV. The DeepFace library provides functions that analyze facial expressions captured by the webcam to identify the dominant emotion; it uses deep learning models to provide accurate emotion analysis that can be used in interactive applications like ours. We detect expressions such as happiness, sadness, anger, and neutrality. This mood information is then sent to the Arduino as a single-character code ('h' for happy, 's' for sad, and so on) through serial communication.
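The document-reading path described above can be sketched as follows. pytesseract is assumed here as the OCR engine operating on the OpenCV frame (the write-up does not name the exact OCR call), and the output file name is illustrative.

```python
# Document-reading sketch: OpenCV frame -> OCR -> gTTS -> Pygame playback.
# pytesseract is an assumed OCR engine; "document.mp3" is an illustrative file name.
import cv2
import pytesseract
import pygame
from gtts import gTTS

def read_document_aloud(camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()                                # capture one image of the document
    cap.release()
    if not ok:
        return

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)        # simple pre-processing for OCR
    text = pytesseract.image_to_string(gray).strip()      # extract the characters
    if not text:
        return

    gTTS(text=text, lang="en").save("document.mp3")       # synthesize speech from the text

    pygame.mixer.init()                                   # play immediately, no external player
    pygame.mixer.music.load("document.mp3")
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.wait(100)
```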
For object detection and tracking, we use YOLOv4 together with OpenCV. YOLOv4 provides a fast, real-time object detection system that identifies and locates people in the video stream. Once a person is detected, we draw a bounding box around them, and OpenCV helps track the movement and calculate the person's deviation from the center of the webcam's field of view. The script then calculates the horizontal and vertical deviations of the person from a reference point, typically the center of the camera's field of view, and converts these deviations into angles. For the horizontal angle, the script determines whether the person is to the left or right of center and calculates the angle of deviation accordingly; similarly, for the vertical angle, it assesses whether the person is above or below the central horizontal axis.
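The detection and deviation-to-angle step can be sketched with OpenCV's DNN module as below. The weight and config file names, the thresholds, and the camera field-of-view values are assumptions; the project's actual values may differ.

```python
# Person detection with YOLOv4(-tiny) via OpenCV's DNN module, followed by the
# pixel-to-angle conversion. File names, thresholds, and field-of-view values
# are assumptions for illustration.
import cv2
import numpy as np

net = cv2.dnn.readNet("yolov4-tiny.weights", "yolov4-tiny.cfg")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

H_FOV_DEG, V_FOV_DEG = 60.0, 40.0   # assumed horizontal/vertical field of view of the webcam

def person_deviation(frame):
    """Return signed (h_angle, v_angle) of the first detected person, or None."""
    classes, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)
    for class_id, box in zip(np.array(classes).flatten(), boxes):
        if int(class_id) != 0:                        # COCO class 0 is "person"
            continue
        x, y, w, h = box
        cx, cy = x + w / 2, y + h / 2                 # center of the bounding box
        frame_h, frame_w = frame.shape[:2]
        dx = cx - frame_w / 2                         # pixel deviation from the frame center
        dy = cy - frame_h / 2
        h_angle = dx / frame_w * H_FOV_DEG            # positive: person is to the right
        v_angle = -dy / frame_h * V_FOV_DEG           # positive: person is above center
        return h_angle, v_angle
    return None
```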
To bring physical interaction into our project, we integrate the Arduino using serial communication. Based on the mood and position data processed in Python, we can control hardware components such as the stepper motors and the RGB LED. This integration bridges the gap between the digital and physical realms, allowing for a wide range of creative and practical applications.
Once the Arduino receives this mood data, it triggers a corresponding response through the RGB LED. Each mood is associated with a specific color, allowing the system to visually communicate the detected emotional state. For instance, happiness is represented by a bright yellow, while sadness is represented by red. This visual representation of mood not only adds an element of interactivity but also helps make the technology more intuitive and user-friendly. The mood detection system is not limited to visual feedback but extends to auditory signals as well: the Python script, using libraries like Pygame, plays specific audio files or sounds corresponding to the detected mood. For example, a cheerful sound or melody might be played when happiness is detected, or a softer, more subdued tone for sadness.
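Putting the two feedback channels together, the sketch below runs DeepFace on a frame, sends a one-character mood code to the Arduino (which maps it to an LED color), and plays a matching sound from Python. The full code table beyond the 'h'/'s' examples above and the sound file names are assumptions.

```python
# Mood feedback sketch: DeepFace emotion -> single-character code over serial
# (the Arduino maps it to an RGB color) plus a matching sound played from Python.
# The full code table and the sound file names are illustrative assumptions.
import pygame
from deepface import DeepFace

MOOD_CODES = {"happy": "h", "sad": "s", "angry": "a", "neutral": "n"}
MOOD_SOUNDS = {"h": "cheerful.wav", "s": "subdued.wav"}   # assumed audio files

def report_mood(frame, arduino):
    """Analyze one frame, notify the Arduino, and play an audio cue."""
    result = DeepFace.analyze(frame, actions=["emotion"], enforce_detection=False)
    if isinstance(result, list):          # newer DeepFace versions return a list of results
        result = result[0]
    code = MOOD_CODES.get(result["dominant_emotion"])
    if code is None:
        return
    arduino.write(code.encode())          # Arduino picks the LED color for this mood
    sound = MOOD_SOUNDS.get(code)
    if sound:
        pygame.mixer.init()
        pygame.mixer.Sound(sound).play()
```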
This audio feedback, in conjunction with the Arduino-controlled visual cues, creates a rich, multi-sensory user experience. The combination of sound and light based on emotional analysis allows for a more engaging and empathetic interaction with the user.
Once these angles are calculated, they are sent to the Arduino via serial communication in a structured format, along with the direction of movement required (left/right for horizontal, up/down for vertical). The Arduino, upon receiving this data, translates the angles into a specific number of steps for the stepper motors to execute. The motors are connected to mechanisms that adjust the orientation of the lamp head and camera, aligning them with the subject's position. The Arduino controls each motor's direction of rotation and the number of steps to take, enabling precise movements. The horizontal angle information controls one motor, while the vertical angle information guides the other. This dual-axis control yields a two-dimensional tracking system capable of following the subject's movements across the camera's plane.
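The per-axis angle-to-steps arithmetic the Arduino applies can be sketched in Python as below; the 1.8 degrees-per-step motor resolution and full-step (no microstepping) drive are assumptions about the hardware.

```python
# Sketch of the per-axis angle-to-steps conversion performed on the Arduino side.
# The 1.8 deg/step resolution and full-step (no microstepping) drive are assumptions.
DEGREES_PER_STEP = 1.8
MICROSTEPS = 1

def angle_to_steps(angle_deg):
    """Convert a deviation angle into a whole number of stepper steps."""
    return round(abs(angle_deg) * MICROSTEPS / DEGREES_PER_STEP)

def step_commands(h_angle, v_angle):
    """Return (steps, direction) for the pan motor and the tilt motor."""
    pan = (angle_to_steps(h_angle), "right" if h_angle > 0 else "left")
    tilt = (angle_to_steps(v_angle), "up" if v_angle > 0 else "down")
    return pan, tilt

# Example: 12.5 degrees right and 4.0 degrees up -> roughly 7 and 2 steps respectively.
print(step_commands(12.5, 4.0))
```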
Comments