Teaching Robots Through VR
Open-TeleVision is a novel teleoperation system that uses an Apple Vision Pro headset to teach robots to do chores like folding laundry.
Before a robot can learn to do something useful, like cooking or folding the laundry, it first needs someone to show it the ropes. Imitation learning, in which robots learn from human demonstrations of a task, has emerged in recent years as an excellent tool for teaching new skills. One of the best ways to capture data from those demonstrations is teleoperation, in which a person's movements are translated directly into movements of the robot. The operator performs a task while the robot mimics them, giving the robot the information it needs to ultimately do the same job without human intervention.
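To make the idea concrete, here is a minimal sketch of behavior cloning, the simplest form of imitation learning: a policy network is fit to the (observation, action) pairs recorded during teleoperated demonstrations. This is an illustration only, not the researchers' training code; the network architecture, the observation and action dimensions, and the random stand-in data are all assumptions.

```python
# Minimal behavior-cloning sketch: fit a policy to demonstration data.
# All dimensions and data here are hypothetical placeholders.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 64, 14  # assumed state and dual-arm action sizes

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Stand-in for (observation, action) pairs recorded via teleoperation.
demo_obs = torch.randn(1024, OBS_DIM)
demo_act = torch.randn(1024, ACT_DIM)

for epoch in range(10):
    pred_act = policy(demo_obs)
    loss = nn.functional.mse_loss(pred_act, demo_act)  # imitate the demonstrator
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After training on enough real demonstrations, the policy can be run on the robot's own observations to reproduce the demonstrated behavior without a human in the loop.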
There are, of course, many teleoperation solutions available at present, and each comes with its own limitations. Some require the operator and the robot to be in the same physical location, some cannot control multi-finger dexterous hands, and others suffer from visual obstructions that make it difficult for the operator to work and that can prevent the robot from collecting the high-quality data it needs.
Researchers at the University of California San Diego and MIT are collaborating to develop an intuitive, easy-to-use teleoperation system that can overcome the persistent issues plaguing present solutions. Their system, called Open-TeleVision, uses virtual reality (VR) goggles to give operators an immersive, stereoscopic view from the robot's perspective. It also has a precise tracking system that enables the robot to closely follow the movements of the operator, down to fine motions of the fingers.
Open-TeleVision has two primary components: perception and actuation. For perception, the operator can wear nearly any VR headset; in this study, an Apple Vision Pro was used. Stereo video is streamed from the robot to the headset to give the operator a first-person view from the robot. The robot's camera can pan and tilt to change the view as the operator moves their head. By changing the viewing angle, the operator can see around obstructions that would otherwise block the view and result in low-quality data being collected.
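The head-tracking behavior can be sketched in a few lines. The snippet below is a simplified illustration rather than the system's actual code: it maps the operator's head yaw and pitch onto an actuated camera mount, clamping to joint limits that are assumed values.

```python
# Simplified head-to-camera mapping; joint limits are assumptions.
import numpy as np

NECK_YAW_LIMIT = np.radians(60)    # assumed pan range of the camera mount
NECK_PITCH_LIMIT = np.radians(45)  # assumed tilt range

def head_to_neck_command(head_yaw: float, head_pitch: float) -> tuple[float, float]:
    """Clamp the operator's head angles (radians) to the camera's range of motion."""
    yaw = float(np.clip(head_yaw, -NECK_YAW_LIMIT, NECK_YAW_LIMIT))
    pitch = float(np.clip(head_pitch, -NECK_PITCH_LIMIT, NECK_PITCH_LIMIT))
    return yaw, pitch

# Example: the operator looks 30 degrees left and 10 degrees down.
print(head_to_neck_command(np.radians(30), np.radians(-10)))
```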
The VR headset's arm and hand tracking capabilities are leveraged to determine the positions of the operator's arms and hands in three-dimensional space. These poses then drive the robot's control system, actuating its arms and hands to mimic the movements of the operator.
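One common way to retarget tracked hands to a robot, sketched below, is to normalize each tracked fingertip-to-palm distance into a curl value and map it linearly onto the robot hand's joint limits; the tracked wrist pose would similarly serve as the target for an arm inverse-kinematics solver. This is an illustrative approach, not necessarily the exact pipeline used in the paper, and the joint limits and distance ranges shown are hypothetical.

```python
# Simplified finger retargeting; limits and distances are assumptions.
import numpy as np

JOINT_MIN = np.zeros(5)                           # assumed open position (radians)
JOINT_MAX = np.radians([90, 100, 100, 100, 100])  # assumed closed position per finger

DIST_OPEN, DIST_CLOSED = 0.18, 0.05  # assumed fingertip-to-palm range (meters)

def retarget_fingers(tip_to_palm_dist: np.ndarray) -> np.ndarray:
    """Map tracked fingertip-to-palm distances to robot finger joint angles."""
    curl = (DIST_OPEN - tip_to_palm_dist) / (DIST_OPEN - DIST_CLOSED)
    curl = np.clip(curl, 0.0, 1.0)  # 0 = fully open, 1 = fully closed
    return JOINT_MIN + curl * (JOINT_MAX - JOINT_MIN)

# Example: thumb nearly open, the other fingers partially curled.
print(retarget_fingers(np.array([0.16, 0.10, 0.10, 0.11, 0.12])))
```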
The researchers tested Open-TeleVision on two humanoid robots: the Unitree H1 and the Fourier GR-1. The H1 was equipped with dexterous hands featuring five articulated fingers each, while the GR-1 was fitted with simple grippers, demonstrating the versatility of the system. These setups were used to show that Open-TeleVision is well suited to collecting data for training imitation learning algorithms, with operators effectively performing even complex tasks like folding towels, sorting cans, and passing items from one hand to the other.
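During such sessions, each control step can be logged as a synchronized observation-action pair, which is exactly the format imitation learning consumes. The sketch below shows one plausible logging scheme; the record format and dimensions are assumptions, not the project's actual data format.

```python
# Hypothetical demonstration logger: pair each camera frame with the
# joint command issued at the same control step.
import time
import numpy as np

episode = []

def log_step(stereo_image: np.ndarray, joint_command: np.ndarray) -> None:
    """Append one synchronized observation-action pair to the episode."""
    episode.append({
        "timestamp": time.time(),
        "observation": stereo_image.copy(),
        "action": joint_command.copy(),
    })

# Example: one step with a dummy stereo frame and a 14-DOF command.
log_step(np.zeros((240, 640, 3), dtype=np.uint8), np.zeros(14))

np.savez(
    "demo_episode.npz",
    obs=np.stack([s["observation"] for s in episode]),
    act=np.stack([s["action"] for s in episode]),
)
```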
While the system provides an effective teleoperation platform, the researchers noted that the lack of tactile feedback can make some very fine manipulations difficult. By addressing shortcomings such as this, Open-TeleVision might eventually prove to be the universal teleoperation solution that helps move the field of robotics forward.