Learning From Others
The Open X-Embodiment dataset was collected from 22 types of robots, and may help pave the path to developing general-purpose robots.
In most cases, machine learning models are trained on relatively small datasets that are highly targeted at achieving a narrow set of goals. That is not because this approach gives rise to the most effective algorithms. On the contrary, as has been highlighted in recent years through the rise of large language models and open-vocabulary image classifiers, training models with massive datasets leads to better performance. It has been shown that models trained on large, diverse datasets frequently outperform narrowly trained models, even in their own areas of expertise.
Nevertheless, smaller datasets are still more frequently leveraged because of the cost and effort associated with collecting and annotating large datasets. The process of acquiring, cleaning, and labeling massive amounts of data is not only resource-intensive but also time-consuming. Moreover, maintaining data quality and ensuring its relevance to the specific problem at hand becomes increasingly challenging as datasets grow larger.
These problems are especially pronounced in robotics, where each type of robot is typically trained only on the tasks it is expected to perform, in the environment in which it will operate. In large part because of these constraints, we find ourselves with decidedly unintelligent and underwhelming robots that fall far short of the general-purpose robotic assistants imagined in science fiction. But collecting large quantities of data from a wide range of robot types, with each performing a large set of tasks, is more than any one group can reasonably achieve.
Fortunately, researchers at Google DeepMind teamed up with partners from 33 academic labs around the world. Together, they assembled the Open X-Embodiment dataset, which consists of data from 22 different robot types. The robots in the dataset performed more than 500 skills and 150,000 tasks in the course of one million episodes. As it stands, the Open X-Embodiment dataset is the largest of its kind, and it is a big step toward a generalized machine learning model that can understand and follow a wide range of directions, and work across many styles of robots.
The team at Google DeepMind put this new data to work in training a pair of new models. RT-1-X is a transformer model that was designed for robotic control tasks, while RT-2-X is a vision-language-action model that also incorporates data from the web into its training. These models build on the previously released RT-1 and RT-2 models, respectively, that were trained on more narrow datasets.
Despite having the same architecture as the previous models, the new versions, trained on the Open X-Embodiment dataset, were found to perform much better. RT-1-X even beat purpose-built models at their own game: researchers at the partner institutions observed a 50% higher average success rate when testing common tasks, like opening a door.
What’s more, the robots could learn to do things that they had never been trained to do. These emergent skills arose from the knowledge encoded in the vast range of experiences captured from other types of robots. In their experiments, the Google DeepMind team found this to be especially true for tasks that require better spatial understanding.
The researchers have open-sourced both the dataset and the trained models in the hope that others will continue to build on their work. They believe that collaboration of this kind will be a key factor in ultimately achieving the goal of building general-purpose robots.