Spill The Beans Leans on Cache Side-Channel Attacks to Leak Secrets From Large Language Models
"Spill The Beans" attack could have a serious impact on those using shared resources for LLM operation.
Security researchers from the MITRE Corporation and Worcester Polytechnic Institute have warned that side-channel attacks on modern CPUs can let a local attacker spy on your conversations with large language models (LLMs), recovering up to 90 percent of a high-entropy secret key in a single shot.
"Side-channel attacks on shared hardware resources increasingly threaten confidentiality, especially with the rise of Large Language Models (LLMs)," explains researchers Andrew Adiletta and Berk Sunar. "In this work, we introduce Spill The Beans, a novel application of cache side-channels to leak tokens generated by an LLM. By co-locating an attack process on the same hardware as the victim model, we flush and reload embedding vectors from the embedding layer, where each token corresponds to a unique embedding vector. When accessed during token generation, it results in a cache hit detectable by our attack on shared lower-level caches."
At the heart of the current artificial intelligence bubble, LLMs underpin the majority of user-facing "AI" implementations to date: they turn inputs into tokens, then return the most statistically likely continuation tokens in response, which appears to the user as something shaped very much like an answer, though without any guarantee of correctness or basis in fact.
On top of that particular problem comes Spill The Beans, which targets conversations between a victim and an LLM running on hardware shared with the attacker by exploiting well-understood side-channel vulnerabilities in modern processors, in this case by monitoring accesses to the CPU's shared lower-level cache.
"Through extensive experimentation, we demonstrate the feasibility of leaking tokens from LLMs via cache side-channels," the pair explain. "Our findings reveal a new vulnerability in LLM deployments, highlighting that even sophisticated models are susceptible to traditional side-channel attacks. For proof of concept we consider two concrete attack scenarios: our experiments show that an attacker can recover as much as 80%-90% of a high entropy API [Application Programming Interface] key with single shot monitoring. As for English text we can reach a 40% recovery rate with a single shot. We should note that the rate highly depends on the monitored token set and these rates can be improved by targeting more specialized output domains."
For those running LLMs on shared hardware and concerned about the impact of Spill The Beans, the researchers offer several mitigations: temporal and spatial randomization of memory access patterns; injecting random read operations for presently-unused tokens to mask the real accesses; and hardware-based isolation and partitioning, including Intel's Cache Allocation Technology (CAT).
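As a rough illustration of the second of those mitigations, the sketch below wraps each genuine embedding lookup with reads of randomly chosen decoy rows, so a Flush+Reload observer sees cache hits on unused tokens as well as the real one. The function names and decoy budget are assumptions made for illustration, not the researchers' implementation.

```c
// Hedged sketch of the decoy-read mitigation: touch random unused embedding
// rows alongside each real lookup so cache hits no longer uniquely identify
// the generated token. Names and parameters are illustrative assumptions.
#include <stdint.h>
#include <stdlib.h>

#define DECOYS_PER_LOOKUP 8  // assumed noise budget: more decoys, more masking

// Touch one cache line of an embedding row without using its contents.
static void touch_row(const uint8_t *table, size_t row_bytes, int token) {
    volatile const uint8_t *p = &table[(size_t)token * row_bytes];
    (void)*p;
}

// Wrap the real embedding access with reads of randomly chosen decoy rows.
// Note: rand() is predictable; a hardened deployment would want an
// unpredictable source so the decoy pattern itself can't be filtered out.
static void masked_embedding_access(const uint8_t *table, size_t row_bytes,
                                    int vocab, int real_token) {
    for (int i = 0; i < DECOYS_PER_LOOKUP; i++)
        touch_row(table, row_bytes, rand() % vocab);
    touch_row(table, row_bytes, real_token);
}

int main(void) {
    enum { VOCAB = 1024, ROW = 4096 };
    static uint8_t table[VOCAB * ROW];  // stand-in for the model's embedding weights
    masked_embedding_access(table, ROW, VOCAB, 42);
    return 0;
}
```

The trade-off is extra memory traffic on every generated token: the decoy count tunes how much of the attacker's signal gets buried against how much bandwidth the defense spends.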
"Spill The Beans serves as a reminder that the intersection of modern hardware design and AI models introduces new and subtle risks," the researchers conclude. "To safeguard confidential interactions and private intellectual property, the community must pursue comprehensive hardware-software co-design solutions, isolation techniques, and adaptive obfuscation strategies that can effectively counter the evolving landscape of microarchitectural side-channels."
The team's work is available under open-access terms on Cornell's arXiv preprint server.