Sim-Trained Zero-Shot Reinforcement Learning Gets This Humanoid Robot Walking Smoothly

Trained using reinforcement learning in simulation, this bipedal robot can adapt to any terrain with no real-world training necessary.

Researchers from the University of California, Berkeley, have developed a reinforcement learning approach that aims to help bipedal humanoid robots get around the real world, teaching natural walking behaviors that adapt to varied environments.

"Humanoid robots that can autonomously operate in diverse environments have the potential to help address labor shortages in factories, assist elderly at home, and colonize new planets," the research team writes in its paper. "Although classical controllers for humanoid robots have shown impressive results in a number of settings, they are challenging to generalize and adapt to new environments. Here, we present a fully learning-based approach for real-world humanoid locomotion."

Researchers have shown off a new controller for humanoid robots that can be deployed zero-shot with no real-world training. (📹: Radosavovic et al)

The team's approach is based on reinforcement learning: it trains a causal transformer that predicts the best next action from an ongoing history of the robot's prior movements and observations, using a data set spanning "thousands of randomized environments." Key to the approach's success is that all training took place in simulation; when the controller was deployed in the field, on an Agility Robotics Digit humanoid robot, it was able to operate in a zero-shot fashion in the real world.
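The core idea of a causal transformer policy can be illustrated in miniature: attention over the history of observation/action tokens is masked so each timestep sees only the past, and the next action is read out from the most recent timestep. The sketch below is purely illustrative; all weights, dimensions, and the single-head layout are assumptions, not the authors' implementation.

```python
import numpy as np

def causal_policy_step(history, W_q, W_k, W_v, W_out):
    """Sketch of one causal-transformer policy step: attend over the
    robot's past observation/action tokens and predict the next action.
    All weights and dimensions here are illustrative toy values."""
    T, d = history.shape
    Q, K, V = history @ W_q, history @ W_k, history @ W_v
    scores = Q @ K.T / np.sqrt(d)
    # Causal mask: each timestep may attend only to itself and the past.
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf
    # Row-wise softmax over the unmasked scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    attended = weights @ V
    # The next action is decoded from the latest timestep only.
    return attended[-1] @ W_out

rng = np.random.default_rng(0)
d, T, act_dim = 8, 5, 4                  # toy sizes, not the paper's
history = rng.normal(size=(T, d))        # interleaved obs/action tokens
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
W_out = rng.normal(size=(d, act_dim))
action = causal_policy_step(history, W_q, W_k, W_v, W_out)
print(action.shape)  # (4,)
```

In the real system this prediction loop runs continuously, with each new observation and executed action appended to the history the policy conditions on.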

"The terrains varied considerably in terms of material properties, like concrete, rubber, and grass, as well as conditions, like dry under the afternoon sun and damp in the early morning," the researchers write of the real-world testing. "The terrain properties found in the outdoor environments were not encountered during training. We found that our controller was able to walk over all of the tested terrains reliably, and we were comfortable deploying it without a safety gantry. Over the course of 1 week of full-day testing in outdoor environments, we did not observe any falls."

The researchers discovered the controller making "emergent" changes to the robot's gait, terrain-dependent. (📹: Radosavovic et al)

One interesting result of testing was the discovery of emergent gait changes, in which the controller would adjust the robot's walking behavior based on the terrain it encountered. "Specifically," the researchers claim, "it started by normal walking on flat ground, transitioned to using small steps without lifting its legs to the normal height on downward slope, and then returned to normal walking on flat ground again. These behavior changes were emergent and not prespecified."

The team's work has been published in the journal Science Robotics under open-access terms.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: