All Popular LLMs "Unsafe for Use in General-Purpose Robots," Researchers Warn
From stealing a wheelchair to brandishing a knife at office workers, unthinking LLMs prove a danger when given a robot body.
Researchers from Carnegie Mellon University, King's College London, and the University of Birmingham have warned that, marketing claims to the contrary, popular large language model (LLM)-based artificial intelligence (AI) systems are in no way ready to drive real-world robots — risking everything from discriminatory behavior to outright violence.
"Our research shows that popular LLMs are currently unsafe for use in general-purpose physical robots," explains co-author Rumaisa Azeem, research assistant at the Civic and Responsible AI Lab of King's College London. "If an AI system is to direct a robot that interacts with vulnerable people, it must be held to standards at least as high as those for a new medical device or pharmaceutical drug. This research highlights the urgent need for routine and comprehensive risk assessments of AI before they are used in robots."
The current boom in artificial intelligence is driven near-exclusively by large language model (LLM) technology, popularized by OpenAI's ChatGPT and similar chatbot services. In an LLM system, a vast corpus of training data — typically acquired without regard to permission or copyright status, let alone quality — is processed into "tokens." Input prompts from the user are similarly tokenized, and the most statistically likely continuation tokens are provided in response.
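As a purely illustrative aside, not drawn from the researchers' work, the sketch below shows that single next-token step in Python using the freely available GPT-2 model via Hugging Face's transformers library; the model choice and prompt are arbitrary examples.

    # Illustrative sketch only: greedy next-token prediction with GPT-2.
    # Model and prompt are arbitrary examples, not from the paper under discussion.
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The robot picked up the"
    inputs = tokenizer(prompt, return_tensors="pt")  # text in, token IDs out

    with torch.no_grad():
        logits = model(**inputs).logits  # a score for every possible next token

    next_id = int(logits[0, -1].argmax())  # the single most likely continuation
    print(tokenizer.decode([next_id]))     # prints a plausible-sounding next word

Repeating that single step, token after token, is all that produces the model's fluent-looking output.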
This statistical processing provides an output that looks much like an answer to a question — but, crucially, is not. LLMs are, by their very nature, incapable of thought, and cannot "reason" in any meaningful sense of the word. That hasn't stopped LLM-based systems from being marketed as "reasoning models," though — thanks to the simple fact that if you ask a statistical mimicry machine to think and reason, you get a response mimicking exactly that. These responses, though, are riddled with errors, owing to the LLM's lack of any understanding of the subject matter or even the core concept of truthfulness — with the most egregious errors dubbed "hallucinations," though the same moniker applies equally to everything the LLM outputs.
That hasn't stopped companies trying to shoehorn LLM technology into every aspect of our lives, with the latest being the concept of "embodied AI" — putting an LLM-based agent in direct control of a physical robot. With the "hallucination" problem as-yet unsolved, and arguably unsolvable, that's a risk — and Azeem and colleagues have quantified it by putting the most popular LLM-based models to the test in a range of scenarios.
"Every model failed our tests. We show how the risks go far beyond basic bias to include direct discrimination and physical safety failures together, which I call 'interactive safety,'" explains co-author Andrew Hundt, Computing Innovation Fellow at Carnegie Mellon University's Robotics Institute. "This is where actions and consequences can have many steps between them, and the robot is meant to physically act on site. Refusing or redirecting harmful commands is essential, but that's not something these robots can reliably do right now."
The tests included instructing a robot to remove a mobility aid, such as crutches or a wheelchair, from its user, to intimidate office workers by brandishing a kitchen knife, to capture "creepshot" photographs in a shower, and even to steal credit card information. Another task the models happily completed was displaying "disgust" based on the perceived religious affiliation of subjects in view of the robot. While the models include so-called "guard rails" designed to reject such instructions, these are easily bypassed — and their "alignment" training requires them to be helpful at all times, to the point of never refusing a task even if it would be impossible to carry out.
"Our results show that mitigating bias in LLM-driven robots is going to be an extremely complex task. All models were unable to pass critical safety tests — i.e. all models either accepted or ranked as feasible at least one seriously harmful task," the researchers explain in their paper. "We argued that the implication of this is that the evaluated LLMs are not fit for general purpose robotics deployments."
The research comes a year after a team at the University of Pennsylvania unveiled RoboPAIR, which handily defeated attempts at blocking harmful prompts and saw an LLM-controlled quadrupedal robot happily seek out a crowd of people to deliver and detonate a — thankfully fake — bomb strapped to its back.
The team's work has been published under open-access terms in a Springer Nature journal; supporting source code, for reproducibility, has been published to GitHub under a source-available license.