Google DeepMind Scientists Teach Robots New Tricks with a "Language-to-Reward System"

By having a carefully prompted large language model (LLM) spit out reward code, it's possible to turn natural-language commands into low-level robot control.

Researchers from Google's DeepMind machine learning arm have come up with a new approach to teaching a robot new skills using natural language instructions — augmenting large language models (LLMs) with reward functions tailored to low-level actions.

"Empowering end-users to interactively teach robots to perform novel tasks is a crucial capability for their successful integration into real-world applications," research scientists Wenhao Yu and Fei Xia say, in a joint blog post on the work. "For example, a user may want to teach a robot dog to perform a new trick, or teach a manipulator robot how to organize a lunch box based on user preferences."

Google DeepMind researchers have come up with a way to work around LLMs' lack of training on low-level robotic control systems: a reward system. (📹: Yu et al)

"The recent advancements in large language models (LLMs) pre-trained on extensive internet data have shown a promising path towards achieving this goal," the researchers continue. "Indeed, researchers have explored diverse ways of leveraging LLMs for robotics, from step-by-step planning and goal-oriented dialogue to robot-code-writing agents."

There's a "but" coming, of course, and it's a big but: large language models, like ChatGPT, build responses by calculating the most likely following word as it constructs a sentence. It works surprisingly well for responses for which it has a large sample in its training data, but not so well otherwise — and low-level robot control wasn't a focus in LLM training, which prioritized human languages over robotic ones. Add into that the fact that low-level control systems vary from vendor to vendor and robot to robot, and you've got a problem.

It's a problem the DeepMind team believes it has solved with what it calls a "language-to-reward system" designed to sit alongside the LLM. "We posit that reward functions provide an ideal interface for such tasks given their richness in semantics, modularity, and interpretability," Yu and Xia explain.

"They also provide a direct connection to low-level policies through black-box optimization or reinforcement learning (RL). We developed a language-to-reward system that leverages LLMs to translate natural language user instructions into reward-specifying code and then applies MuJoCo MPC to find optimal low-level robot actions that maximize the generated reward function," the pair continue, referring to an existing open-source framework for real-time predictive robot control.

Using a reward translator and a motion controller, the resulting language-to-reward system can take a natural-language instruction like "stand up on two feet," turn it into reward code via the reward translator, and then hand that code to the motion controller so the robot responds as expected. In testing, the team found the approach succeeded on 90 percent of experimental tasks.
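Wired together, the two-stage pipeline might look roughly like the sketch below. Here query_llm and motion_controller are placeholders for the LLM call and the MuJoCo MPC optimizer described in the paper, and the prompt text is paraphrased rather than taken from the real system:

```python
# Hypothetical wiring of the two-stage language-to-reward pipeline: a reward
# translator (LLM) turns an instruction into reward code, and a motion
# controller (a stub standing in for MuJoCo MPC) optimizes actions against it.
from typing import Callable


def reward_translator(instruction: str, query_llm: Callable[[str], str]) -> str:
    # Ask the LLM to express the instruction as reward-specifying code.
    prompt = (
        "Translate the following robot instruction into reward-specifying "
        f"code using the documented reward API.\nInstruction: {instruction}"
    )
    return query_llm(prompt)


def motion_controller(reward_code: str) -> None:
    # Stand-in for MuJoCo MPC: the real controller compiles the reward terms
    # and searches in real time for low-level actions that maximize them.
    namespace: dict = {}
    exec(reward_code, namespace)  # evaluate the generated reward definition
    print("Optimizing actions against:",
          [k for k in namespace if not k.startswith("__")])


if __name__ == "__main__":
    # A trivial fake LLM so the sketch runs end to end without any API key.
    fake_llm = lambda prompt: "reward_terms = {'torso_height': 0.55, 'torso_pitch': 1.3}"
    code = reward_translator("stand up on two feet", fake_llm)
    motion_controller(code)
```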

A preprint of the team's work is available on Cornell's arXiv server under open-access terms; more information is available on the project website.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.