The Robot That Learned to Talk by Staring in the Mirror
This robot watched itself in a mirror to learn how to speak without creeping people out.
Robots are supposed to be taking over all of the mundane household chores that none of us want to do, like cleaning, folding laundry, and cooking. I don’t know about you, but I haven’t seen Rosey the Robot around my home lately. Despite the promises that have been made for years, robots are still far behind where we would like them to be.
There are many reasons for this, but think about it for a minute — even if robots could do everything for us, would we really want them in our homes? Their human-like, yet not exactly human, facial expressions are enough to send anyone on an unpleasant trip through the uncanny valley. I’d rather do the laundry myself than live in a house of horrors, thank you very much.
Fortunately, engineers at Columbia University are working to solve this problem. Recognizing that lip motion draws an outsized share of our attention when we interact with other people, the team developed a system that teaches robots to move their lips the way humans do when they speak.
According to the researchers, nearly half of our visual attention during face-to-face conversation is focused on the mouth. Yet most humanoid robots barely move their lips at all, or rely on stiff, preprogrammed animations that feel unnatural. The result is a robot that can walk, wave, or even talk, but still looks oddly lifeless — or worse, unsettling.
The team tackled the problem by giving a robotic face much more expressive hardware and letting it learn on its own rather than hard-coding rules. The robot’s face features soft, flexible lips driven by 26 tiny motors, allowing a far richer range of motion than is typical in humanoid robots. But hardware alone wasn’t enough. The real breakthrough came from how the robot learned to use its face.
First, the robot watched itself in a mirror, making thousands of random facial expressions. Over time, it learned how activating different motors changed its appearance, building an internal “vision-to-action” model of its own face. Once it understood itself, the robot moved on to observing humans, watching hours of videos of people speaking and singing online. From these examples, it learned how lip movements correspond to different sounds — without being told what any of the words meant.
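To make the idea concrete, here is a rough sketch of what that mirror stage might look like in code. To be clear, this is my own back-of-the-napkin PyTorch illustration, not the team’s implementation: the robot and camera interfaces, the landmark extractor, and the network sizes are all assumptions on my part, and only the 26-motor count comes from the research.

```python
# A minimal sketch of the mirror "motor babbling" stage, assuming a
# hypothetical robot API (robot.set_motors, camera.capture) and a
# facial landmark extractor. None of these names come from the paper.
import torch
import torch.nn as nn

NUM_MOTORS = 26          # actuator count reported by the researchers
NUM_LANDMARKS = 2 * 68   # assumed: 68 (x, y) facial landmarks

class InverseFaceModel(nn.Module):
    """Maps an observed face shape back to the motor commands that
    would produce it: a "vision-to-action" self-model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_LANDMARKS, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, NUM_MOTORS), nn.Sigmoid(),  # motors in [0, 1]
        )

    def forward(self, landmarks):
        return self.net(landmarks)

def babble_and_learn(robot, camera, extract_landmarks, steps=10_000):
    model = InverseFaceModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        # 1. Make a random facial expression.
        cmd = torch.rand(NUM_MOTORS)
        robot.set_motors(cmd)
        # 2. Watch the result in the mirror.
        frame = camera.capture()
        landmarks = extract_landmarks(frame)  # shape: (NUM_LANDMARKS,)
        # 3. Learn: "to look like this, send that command."
        pred = model(landmarks.unsqueeze(0))
        loss = nn.functional.mse_loss(pred, cmd.unsqueeze(0))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

The appealing part of this setup is that the robot supervises itself: every random command, paired with the expression it produces in the mirror, is a free training example, so no human labeling is required.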
Using a self-supervised AI system based on variational autoencoders and transformer models, the robot can now translate audio directly into coordinated lip movements. In tests, it was able to articulate speech and even sing, syncing its lips more naturally than previous rule-based approaches. Impressively, the system generalized across multiple languages, successfully producing lip movements for 10 languages it had never encountered during training.
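Here, too, a hedged sketch may help, though the layer shapes, mel-spectrogram input, and latent size below are my assumptions rather than the team’s published architecture. The idea is that a transformer maps per-frame audio features to lip-shape latents, a VAE decoder turns those latents into a face shape, and the self-model learned in the mirror stage converts that shape into motor commands.

```python
# A minimal sketch of the audio-to-lip stage, reusing the
# InverseFaceModel from the previous sketch. The mel input size,
# latent size, and use of a plain TransformerEncoder are assumptions.
import torch
import torch.nn as nn

NUM_MEL = 80   # assumed mel-spectrogram bins per audio frame
LATENT = 32    # assumed VAE latent size for one lip shape

class AudioToLip(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(NUM_MEL, 256)
        layer = nn.TransformerEncoderLayer(
            d_model=256, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.to_latent = nn.Linear(256, LATENT)

    def forward(self, mel):               # mel: (batch, frames, NUM_MEL)
        h = self.encoder(self.embed(mel))  # one hidden state per frame
        return self.to_latent(h)           # (batch, frames, LATENT)

def speak(mel, audio_to_lip, vae_decoder, inverse_model, robot):
    """Audio in, motor commands out: audio -> lip-shape latents ->
    landmarks (VAE decoder) -> motor commands (self-model)."""
    with torch.no_grad():
        latents = audio_to_lip(mel)        # per-frame lip codes
        landmarks = vae_decoder(latents)   # decode to a face shape
        commands = inverse_model(landmarks)  # vision-to-action
    for cmd in commands.squeeze(0):        # play back frame by frame
        robot.set_motors(cmd)
```

Because the mapping runs from sound to mouth shape rather than from words to mouth shape, nothing in a pipeline like this depends on understanding the language being spoken, which is consistent with the system generalizing to languages it never saw in training.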
Ultimately, the team believes facial expression is the missing link in human-robot interaction. As humanoid robots move into entertainment, education, healthcare, and elder care, lifelike faces may matter just as much as capable hands or legs. If robots are ever going to feel truly welcome in our homes, crossing the uncanny valley may start with getting the lips right.