Researchers Grant Robots "Planning in Imagination" with Dreamer, Enable Simulation-Free Learning
Designed to do away with imprecise simulation in favor of real-world physical learning, DayDreamer shows real promise.
A team of researchers from the University of California, Berkeley has experimented with "planning in imagination" to enable robots to learn tasks ranging from walking to pick-and-place, without prior training or a simulation stage, using the Dreamer machine learning algorithm.
While the Dreamer algorithm had already been proven suitable for teaching robots how to move, it had previously only been used with a simulation stage — something the researchers sought to remove. "The problem is your simulator will never be as accurate as the real world," co-author Danijar Hafner explains in an interview with MIT Technology Review on the subject. "There'll always be aspects of the world you’re missing."
In this latest work, DreamerV2 is applied directly to physical robots that begin with no prior experience of their own capabilities or of the world around them, and without the benefit, limited though its accuracy may be, of a simulation step.
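To make the "planning in imagination" idea concrete, the sketch below shows the general shape of the loop: collect a little real experience, fit a learned model of the world, then improve the policy purely on rollouts imagined by that model. This is a deliberately simplified, hypothetical illustration, not the DayDreamer code; the real Dreamer learns a recurrent latent world model from camera images and trains an actor-critic on imagined latent trajectories, whereas the toy environment, linear model, and gain-based policy here are all assumptions made for brevity.

```python
# Hypothetical sketch of world-model learning "in imagination" (not the
# DayDreamer implementation): alternate between real experience, model
# fitting, and policy improvement on imagined rollouts only.
import numpy as np

rng = np.random.default_rng(0)

def real_step(state, action):
    """Toy real 'world': a 1-D point the agent should drive to zero."""
    next_state = state + 0.1 * action + rng.normal(0, 0.01)
    return next_state, -abs(next_state)

class WorldModel:
    """Linear fit of next_state on (state, action), standing in for
    Dreamer's learned recurrent latent dynamics model."""
    def __init__(self):
        self.w = np.zeros(2)

    def fit(self, states, actions, next_states):
        X = np.stack([states, actions], axis=1)
        self.w, *_ = np.linalg.lstsq(X, next_states, rcond=None)

    def imagine_step(self, state, action):
        next_state = self.w @ np.array([state, action])
        return next_state, -abs(next_state)

def imagined_return(model, gain, state, horizon=10):
    """Evaluate a candidate policy entirely inside the learned model."""
    total = 0.0
    for _ in range(horizon):
        action = np.clip(gain * state, -1, 1)
        state, reward = model.imagine_step(state, action)
        total += reward
    return total

model, gain, state = WorldModel(), 0.0, 1.0
buffer = {"s": [], "a": [], "s2": []}
for step in range(500):
    # Act in the real world with exploration noise, logging experience.
    action = np.clip(gain * state + rng.normal(0, 0.3), -1, 1)
    next_state, _ = real_step(state, action)
    buffer["s"].append(state); buffer["a"].append(action); buffer["s2"].append(next_state)
    state = next_state
    if step > 20 and step % 20 == 0:
        model.fit(*(np.array(buffer[k]) for k in ("s", "a", "s2")))
        # Policy search happens "in imagination", not on the robot.
        candidates = gain + rng.normal(0, 0.5, size=32)
        scores = [imagined_return(model, g, state) for g in candidates]
        gain = candidates[int(np.argmax(scores))]
print(f"learned gain: {gain:.2f}, final |state|: {abs(state):.3f}")
```

The key property the sketch preserves is data efficiency: every real interaction is reused many times through the model, which is what lets the physical robots in this work learn in hours rather than weeks.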
Perhaps the most impressive experiment ran DreamerV2 atop a Unitree A1 quadruped robot, which begins on its back with its feet in the air. Without the benefit of simulation or any resets, the robot learns to roll over, stand up, and eventually walk in just one hour of experimentation; a traditional reinforcement learning approach, by contrast, got no further than rolling over in the same period, despite the researchers intervening to free it from a deadlock.
In another experiment, a visual pick-and-place robot built around a Universal Robots UR5 arm hits a rate of 2.5 objects per minute after eight hours of training, greatly exceeding rival algorithms, which in the same time had learned only to pick objects up and immediately drop them again, and approaching the performance of a human operator. Switching to the lower-specification XArm and extending the training period to 10 hours boosted the pick rate to 3.1 objects per minute, comparable to a human operator.
A final experiment saw the team running DreamerV2 on a Sphero Ollie free-roaming robot, giving the spherical machine a goal to reach based on visual data alone. This time the performance was roughly equivalent to the best rival algorithm, DrQ-v2, a model-free method designed specifically for continuous control from pixel inputs.
Although the team's approach seems promising, the researchers do warn of some limitations, most notably that while real-world learning avoids the inaccuracies of a simulator, it puts wear and tear on potentially expensive hardware, which could be spared by performing the bulk of training in simulation.
The team's work is available as a preprint on Cornell's arXiv server, with more information available in an interview with MIT Technology Review. Hafner has pledged to release the source code for the experiments, too, but at the time of writing the official project website had not been updated with a link to the repository.