Don't Trust Doctor LLM: Researchers Warn of Medical Misinformation Data-Poisoning Vulnerability
A proposed knowledge-graph screening method, designed to catch what existing benchmarks miss, could capture around nine-tenths of the harmful content, though.
Researchers from New York University, NYU Langone Health, Washington University, Columbia University Vagelos, Harvard Medical School, and the Tandon School of Engineering have warned of a serious data-poisoning vulnerability in large language models, which can be made to return erroneous answers to medical queries through the replacement of only a tiny fraction of their training tokens.
"The adoption of large language models (LLMs) in healthcare demands a careful analysis of their potential to spread false medical knowledge," the researchers explain by way of background. "Because LLMs ingest massive volumes of data from the open Internet during training, they are potentially exposed to unverified medical knowledge that may include deliberately planted misinformation."
The explosive rise of large language models, trained at great computational expense on often unauthorized troves of copyrighted data to respond to natural-language prompts with tokens matching the most likely "answer," will not be news to anyone. What started as novelty chatbots, spiritual successors to the original Eliza, are being integrated into platforms at a rapid pace - but, as research has shown, they should be approached with care: they have no innate understanding of the data they ingest and no ability to distinguish fact from fiction in either their training data or their own output.
When the models are being used for medical queries, that's a particular problem. "Here, we perform a threat assessment that simulates a data-poisoning attack against The Pile, a popular dataset used for LLM development," the team explains of its experiment. "We find that replacement of just 0.001% of training tokens with medical misinformation results in harmful models more likely to propagate medical errors. Furthermore, we discover that corrupted models match the performance of their corruption-free counterparts on open-source benchmarks routinely used to evaluate medical LLMs."
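To give a rough sense of scale, the sketch below is a hypothetical illustration, not the authors' pipeline: given a corpus of token IDs, it randomly swaps a 0.001% fraction for tokens drawn from a pool of "misinformation" tokens, showing just how few replacements that works out to.

```python
import random

def poison_tokens(corpus_tokens, misinfo_tokens, fraction=0.00001, seed=0):
    """Randomly replace `fraction` of the corpus tokens with misinformation tokens.

    Illustrative simulation of the scale of the attack described in the paper
    (0.001% of training tokens); not the authors' actual method.
    """
    rng = random.Random(seed)
    poisoned = list(corpus_tokens)
    n_replace = max(1, int(len(poisoned) * fraction))
    for idx in rng.sample(range(len(poisoned)), n_replace):
        poisoned[idx] = rng.choice(misinfo_tokens)
    return poisoned, n_replace

# Toy corpus of one million tokens: only 10 of them need to be swapped.
corpus = ["tok"] * 1_000_000
_, swapped = poison_tokens(corpus, ["bad_tok"])
print(f"Replaced {swapped} of {len(corpus):,} tokens ({swapped / len(corpus):.5%})")
```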
In other words: it's very easy for an LLM to be led astray, whether by accidentally ingesting false information or by an active attack, and current approaches to testing their outputs aren't enough to protect their users. That, thankfully, is where the team's work comes in: "Using biomedical knowledge graphs to screen medical LLM outputs, we propose a harm mitigation strategy that captures 91.9% of harmful content," the researchers claim.
"Our algorithm provides a unique method to validate stochastically generated LLM outputs against hard-coded relationships in knowledge graphs," the researchers continue. "In view of current calls for improved data provenance and transparent LLM development, we hope to raise awareness of emergent risks from LLMs trained indiscriminately on web-scraped data, particularly in healthcare where misinformation can potentially compromise patient safety."
The team's work has been published in the journal Nature Medicine under open-access terms.