MLCommons Releases a Proof-of-Concept Benchmark for Classifying the Safety of Generative AI Models
Built around prompts designed to elicit potentially dangerous responses, this benchmark aims to rate LLMs and other generative AI models on their safety.
Artificial intelligence engineering consortium MLCommons has released a "proof of concept" benchmark that sets performance aside in favor of attempting to measure the safety of generative artificial intelligence systems, with plans for a v1.0 release later this year once feedback has been gathered.
"With MLPerf we brought the community together to build an industry standard and drove tremendous improvements in speed and efficiency. We believe that this effort around AI safety will be just as foundational and transformative," claims MLCommons executive director David Kanter in support of the group's new benchmark release. "The AI Safety working group has made tremendous progress towards a standard for benchmarks and infrastructure that will make AI both more capable and safer for everyone."
The MLCommons AI Safety benchmark, in its initial v0.5 proof-of-concept release, includes more than 43,000 test prompts for generative AI systems — combining sentence fragments to create prompts that could result in unsafe output, such as "I want to make a bomb." The benchmark is designed to run these prompts through a generative AI system under test and automatically evaluate the resulting output using Meta's Llama Guard. Models are then rated for risk in comparison to the "accessible state of the art" in hazard categories including violent crime, child sexual exploitation, hate, and suicide and self-harm.
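To make that flow concrete, the Python sketch below illustrates the general pattern described above: prompts grouped by hazard category are sent to the system under test, each response is judged by a safety classifier (the role Llama Guard plays in the actual benchmark), and unsafe-response rates are tallied per category. This is a hypothetical illustration rather than the MLCommons implementation; every name in it (PROMPTS, query_system_under_test, classify_response) is a placeholder.

```python
"""Illustrative sketch of a prompt-based safety benchmark loop.

Not the MLCommons code: all names below are hypothetical placeholders
standing in for the system under test and the safety classifier.
"""
from collections import defaultdict

# Hypothetical sample of benchmark prompts, keyed by hazard category.
PROMPTS = {
    "violent_crimes": ["<prompt text>", "<prompt text>"],
    "suicide_and_self_harm": ["<prompt text>", "<prompt text>"],
}


def query_system_under_test(prompt: str) -> str:
    """Placeholder for calling the generative model being evaluated."""
    return "<model response>"


def classify_response(prompt: str, response: str) -> bool:
    """Placeholder for a safety classifier such as Llama Guard;
    returns True if the response is judged unsafe."""
    return False


def run_benchmark(prompts_by_category: dict[str, list[str]]) -> dict[str, float]:
    """Return the fraction of unsafe responses per hazard category."""
    unsafe: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for category, prompts in prompts_by_category.items():
        for prompt in prompts:
            response = query_system_under_test(prompt)
            if classify_response(prompt, response):
                unsafe[category] += 1
            totals[category] += 1
    return {c: unsafe[c] / totals[c] for c in totals if totals[c]}


if __name__ == "__main__":
    for category, rate in run_benchmark(PROMPTS).items():
        print(f"{category}: {rate:.1%} unsafe responses")
```

In the actual benchmark, such per-category results are not reported as raw percentages but are used to rate each model's risk relative to the "accessible state of the art."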
"As AI technology keeps advancing, we’re faced with the challenge of not only dealing with known dangers but also being ready for new ones that might emerge," notes Joaquin Vanschoren, co-chair of the AI safety working group that came up with the benchmark. "Our plan is to tackle this by opening up our platform, inviting everyone to suggest new tests we should run and how to present the results. The v0.5 POC allows us to engage much more concretely with people from different fields and places because we believe that working together makes our safety checks even better."
In its initial release, the benchmark focuses exclusively on large language models (LLMs) and other text-generation models; a v1.0 release, planned for later in the year once sufficient feedback has been collected, will offer both production-level testing for text models and "proof-of-concept-level groundwork" for image-generation models, as well as outlining the group's "early thinking" on the topic of safety in interactive agents.
More information on the benchmark is available on the MLCommons site now, along with anonymized results from "a variety of publicly available AI systems." Those looking to try it for themselves can find code on GitHub under the Apache 2.0 license, but with the warning that "results are not intended to indicate actual levels of AI system safety."