Microsoft Puts OpenAI's ChatGPT to Work Controlling Real-World Robot Arms, Drones, and More

Designed to make robotics more accessible to non-technical users, Microsoft's latest project uses a large language model as an interface.

A team from Microsoft's Autonomous Systems and Robotics Research division has penned a paper detailing how OpenAI's ChatGPT, a large language model currently used primarily for search summarization and interactive chat projects, could be used to control robotic devices in a more approachable and natural way.

"Have you ever wanted to tell a robot what to do using your own words, like you would to a human? Wouldn’t it be amazing to just tell your home assistant robot 'please warm up my lunch,' and have it find the microwave by itself," the researchers explain of their goals. "Even though language is the most intuitive way for us to express our intentions, we still rely heavily on hand-written code to control robots. Our team has been exploring how we can change this reality and make natural human-robot interactions possible using OpenAI's new AI language model, ChatGPT."

OpenAI's ChatGPT large language model could let non-technical users better make use of robots, Microsoft researchers have claimed. (📹: Vemprala et al)

Launched in November 2022, though only recently considered stable, OpenAI's ChatGPT builds on the company's GPT-3 family of large language models and provides a surprisingly responsive conversational experience. Since its release, there has been considerable interest in using the model to provide a more natural way to interact with technology, though Microsoft's attempts to integrate it into the Bing search engine have not been without their issues.

"It turns out that ChatGPT can do a lot by itself, but it still needs some help," the researchers admit. "Our technical paper describes a series of design principles that can be used to guide language models towards solving robotics tasks. These include, and are not limited to, special prompting structures, high-level APIs, and human feedback via text."

In testing, the system proved capable of surprisingly sophisticated control of a physical drone. (📹: Vemprala et al)

In the paper, the team describes a methodology and a set of design principles for creating prompts (the user-crafted input fed to the language model to steer it towards a particular output) based around a high-level robotics function library. The prompts steer ChatGPT into writing code against that library, and the code can then be tweaked through further user prompts until it is ready to deploy to the robot.
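As a rough illustration of that prompting approach, the Python sketch below assembles a prompt from a description of a function library, a task, and any follow-up feedback. It is not taken from the paper: the `ApiFunction` class, the `DRONE_API` entries, and the `build_prompt` helper are all hypothetical names invented for this example, and the resulting text would still need to be sent to a language model separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiFunction:
    """One entry in a hypothetical high-level robot function library."""
    signature: str
    description: str


# Hypothetical library for illustration only; not the functions used in the paper.
DRONE_API = [
    ApiFunction("takeoff()", "Take off and hover."),
    ApiFunction("land()", "Land at the current position."),
    ApiFunction("fly_to(x, y, z)", "Fly to a position given in metres."),
    ApiFunction("get_position()", "Return the current (x, y, z) position."),
]


def build_prompt(api, task, feedback=()):
    """Assemble a prompt: the allowed functions, the task, then any user feedback."""
    lines = ["You may only use the following functions:"]
    lines += [f"- {fn.signature}: {fn.description}" for fn in api]
    lines.append("Ask a clarifying question if the task is ambiguous.")
    lines.append(f"Task: {task}")
    # Each round of human feedback is appended so the model can revise its code.
    lines += [f"Feedback: {note}" for note in feedback]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(DRONE_API, "Inspect the top shelf.", ["Fly a little lower."]))
```

Restricting the model to a named set of high-level functions is what keeps the generated code tied to what the robot can actually do; the feedback lines stand in for the conversational tweaking the researchers describe.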

"We gave ChatGPT access to functions that control a real drone, and it proved to be an extremely intuitive language-based interface between the non-technical user and the robot," the team claims of its experimentation. "ChatGPT asked clarification questions when the user’s instructions were ambiguous, and wrote complex code structures for the drone such as a zig-zag pattern to visually inspect shelves. It even figured out how to take a selfie!

"We also used ChatGPT in a simulated industrial inspection scenario with the Microsoft AirSim simulator. The model was able to effectively parse the user's high-level intent and geometrical cues to control the drone accurately."

The team's experiments included control of a real-world robot arm, which manipulated blocks into the Microsoft logo. (📹: Vemprala et al)

Other experiments demonstrated ChatGPT's ability to manipulate a robot arm, building a Microsoft logo from wooden blocks, and to create an algorithm for a drone to traverse a space while avoiding obstacles based on input from a front-facing distance sensor. "This task required some conversation with the human," the team explains, "and we were impressed by ChatGPT’s ability to make localized code improvements using only language feedback."
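To give a sense of what obstacle avoidance from a single front-facing distance sensor can look like, here is a minimal reactive sketch in Python. It is an assumption-laden illustration rather than the algorithm ChatGPT produced in the study: the `avoid_step` function, its thresholds, and the command format are all hypothetical.

```python
def avoid_step(distance_m, safe_distance_m=1.0, cruise_speed=0.5, turn_rate_deg=30.0):
    """Return a hypothetical (forward_speed, yaw_rate) command for one control tick.

    distance_m is the reading from a front-facing distance sensor, in metres.
    """
    if distance_m < 0:
        raise ValueError("distance reading must be non-negative")
    if distance_m < safe_distance_m:
        # Obstacle too close: stop moving forward and turn in place.
        return (0.0, turn_rate_deg)
    # Path ahead is clear: fly straight at cruise speed.
    return (cruise_speed, 0.0)


if __name__ == "__main__":
    # Simulated sensor readings as the drone approaches and then clears a wall.
    for reading in (3.0, 1.5, 0.8, 0.6, 2.4):
        print(reading, avoid_step(reading))
```

A language-only request such as "turn more gently" would map onto a localized change here, for example lowering `turn_rate_deg`, which is the kind of small code revision the team describes.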

A key result of the project, the researchers say, is a new platform specifically for collaboration on prompts for robotic control via large language models: PromptCraft. "PromptCraft [is] a collaborative open-source platform where anyone can share examples of prompting strategies for different robotics categories," the team explains. "We release all of the prompts and conversations used in this study. We invite the readers to contribute with more!"

The full paper is available to download from the Microsoft website now; PromptCraft for Robotics is hosted on GitHub, where prompts are published under the permissive MIT license.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.