A team of researchers from OpenAI has released a paper describing how robots can train each other to get better at a range of manipulation tasks by playing a game against each other, demonstrating the approach with a pair of simulated robot arms, Alice and Bob.
"We train a single, goal-conditioned policy that can solve many robotic manipulation tasks, including tasks with previously unseen goals and objects. To do so, we rely on asymmetric self-play for goal discovery," the team explains, "where two agents, Alice and Bob, play a game. Alice is asked to propose challenging goals and Bob aims to solve them. We show that this method is able to discover highly diverse and complex goals without any human priors."
"To the best of our knowledge, this is the first work that presents zero-shot generalization to many previously unseen tasks by training purely with asymmetric self-play."
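The game described in the quotes above can be sketched in a few lines. The following is a minimal, hypothetical illustration (not code from the paper), using a toy one-dimensional world: Alice acts first and wherever she ends up becomes the goal, then Bob is rewarded for reaching it, while Alice is rewarded when he fails.

```python
import random

def asymmetric_self_play_round(alice_policy, bob_policy, start=0, horizon=5):
    """One round of the game: Alice acts to propose a goal, then Bob tries to reach it."""
    # Alice phase: her final state becomes the goal for Bob.
    state = start
    for _ in range(horizon):
        state += alice_policy(state)
    goal = state

    # Bob phase: same start state and horizon, but goal-conditioned actions.
    state = start
    for _ in range(horizon):
        state += bob_policy(state, goal)

    bob_succeeded = (state == goal)
    # The game is competitive: Alice is rewarded exactly when Bob fails,
    # which pushes her toward goals at the edge of Bob's current ability.
    alice_reward = 0.0 if bob_succeeded else 1.0
    bob_reward = 1.0 if bob_succeeded else 0.0
    return goal, bob_succeeded, alice_reward, bob_reward

# A random goal-proposing Alice and a greedy goal-seeking Bob.
alice = lambda s: random.choice([-1, 1])
bob = lambda s, g: (g > s) - (g < s)  # step one unit toward the goal
goal, ok, r_alice, r_bob = asymmetric_self_play_round(alice, bob)
```

In the full method both players are learned policies; the interesting dynamics come from Alice being incentivized to propose goals that are just beyond what Bob can currently solve.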
Brought to our attention by MIT Technology Review, the paper describes a scalable method that results in a single policy able to zero-shot generalize to a range of tasks, including setting a table, stacking blocks, and solving puzzles, even when those tasks involve unseen objects and complicated goals.
"Alice discovers many goals that are not covered by our manually designed holdout tasks on blocks. Although it is a tricky strategy for Bob to learn on its own, with Alice Behavioral Cloning (ABC), Bob eventually acquires the skills for solving such complex tasks proposed by Alice," the team explains.
"Complex manipulation skills can emerge from asymmetric self-play. The policy learns to exploit the environment dynamics (e.g. friction) to change object state and use complex arm movement to effectively grasp and rotate objects."
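The Alice Behavioral Cloning (ABC) idea mentioned above can be illustrated with a short, hypothetical sketch (the names and data layout here are assumptions, not the paper's code): when Bob fails a goal, Alice's own trajectory for that goal is relabeled as a goal-conditioned demonstration for Bob, since Alice demonstrably reached the state she proposed.

```python
def relabel_for_abc(bob_dataset, alice_trajectory, goal):
    """Alice Behavioral Cloning relabeling step: Alice's (state, action)
    pairs become goal-conditioned supervised targets for Bob's policy
    whenever Bob fails to solve the goal Alice proposed."""
    for state, action in alice_trajectory:
        bob_dataset.append({"state": state, "goal": goal, "action": action})
    return bob_dataset

# Example: Bob failed to reach goal 3; Alice's demonstration is cloned.
dataset = []
alice_demo = [(0, 1), (1, 1), (2, 1)]  # (state, action) pairs ending at state 3
relabel_for_abc(dataset, alice_demo, goal=3)
```

This relabeling is what lets Bob eventually acquire skills for goals he could never have discovered through his own exploration alone.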