LLMs Finally Face the Facts
LLM hallucinations are a big problem, but DataGemma models could help to eliminate them by incorporating hard data into their responses.
The remarkable ability of large language models (LLMs) to generate coherent responses to plain-language queries has taken interest in artificial intelligence applications to previously unheard-of levels over the past few years. But despite how much LLMs fascinate us, they are beginning to look more and more like a really cool solution in search of a problem. Sure, they are pretty good at summarizing long texts and helping us understand complex source code, but where is the killer app for LLMs?
There are certainly good possibilities. LLMs could upend the entire web search industry, for example. Yet that has not happened. One of the biggest reasons these models are not living up to their potential is their tendency to hallucinate, which is really just a nice way of saying that they lie with great confidence. Enough confidence, in fact, to make a career politician blush. And if you cannot trust the responses you receive from a model, it is hard to build a serious application around it that people will rely on.
Show me the data
Google researchers have just announced a new plan that they hope will make LLMs more reliable in the future. They have released a set of models called DataGemma (which were apparently named by engineering, not marketing). These models are based on Google's existing Gemma family, a set of lightweight models built with the same technology as the top-of-the-line Gemini models. But rather than relying solely on the statistical patterns learned during training to determine the best way to respond to a prompt, DataGemma models also pull in hard data to shape their answers.
This goal was achieved by integrating a Gemma LLM with Google's Data Commons. Data Commons is a repository that aggregates all sorts of statistical data (over 240 billion data points) collected from public sources like the Centers for Disease Control and Prevention and various national census bureaus. By drawing on this data, the models are given a better grounding in reality that helps them avoid hallucinations.
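To make the idea of grounding a little more concrete, here is a minimal Python sketch of the basic building block: answering a statistical question from a curated store instead of from the model's own guesswork. The store contents and the lookup_statistic helper are invented for illustration; the real Data Commons is queried through its own APIs.

```python
# Minimal sketch of a grounding lookup. The store and helper below are
# illustrative stand-ins for Data Commons, not its real API.

STAT_STORE = {
    ("United States", "Count_Person"): 331_449_281,  # 2020 census count
}

def lookup_statistic(place: str, variable: str) -> int | None:
    """Return a curated statistic, or None if we have no data for it."""
    return STAT_STORE.get((place, variable))

def grounded_claim(place: str, variable: str) -> str:
    value = lookup_statistic(place, variable)
    if value is None:
        # Admitting ignorance beats inventing a confident-sounding number.
        return f"No trusted data available for {variable} in {place}."
    return f"{variable} for {place}: {value:,} (from the curated store)"

print(grounded_claim("United States", "Count_Person"))
print(grounded_claim("Atlantis", "Count_Person"))
```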
A tale of two theories
Two approaches were initially taken to integrating LLMs with Data Commons. The first, called Retrieval-Interleaved Generation (RIG), works by identifying instances where statistical data appears in a generated response, then querying Data Commons to confirm that the figures provided are correct.
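To illustrate the flavor of this check-after-generation flow, the sketch below scans a finished response for tagged statistics, looks each one up, and swaps in the verified figure when one is available. The [STAT:key=value] tagging convention, the VERIFIED table, and the lookup_statistic helper are assumptions made for this example, not details of the DataGemma implementation.

```python
import re

# Hypothetical table of verified statistics, standing in for Data Commons.
VERIFIED = {"california_population_2020": 39_538_223}

def lookup_statistic(key: str) -> int | None:
    return VERIFIED.get(key)

def verify_statistics(llm_output: str) -> str:
    """Replace tagged statistics in a generated response with verified values.

    Assumes the model tags its numeric claims as [STAT:key=value]; this
    tagging scheme is invented here purely for illustration.
    """
    def replace(match: re.Match) -> str:
        key, claimed = match.group(1), match.group(2)
        verified = lookup_statistic(key)
        if verified is None:
            return claimed  # nothing to check against, so keep the model's number
        return f"{verified:,}"  # prefer the grounded figure over the guess

    return re.sub(r"\[STAT:(\w+)=([\d,.]+)\]", replace, llm_output)

raw = "The 2020 census counted [STAT:california_population_2020=40,000,000] people in California."
print(verify_statistics(raw))
# -> "The 2020 census counted 39,538,223 people in California."
```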
The second approach, termed Retrieval-Augmented Generation, snaps into action before the LLM begins to generate a response. This algorithm retrieves data from Data Commons first, then supplies it to the LLM as contextual information along with the user prompt. The goal of this approach is to reduce the likelihood that hallucinations will occur in the first place, rather than trying to clean things up after the fact.
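A rough sketch of that retrieval-first flow might look like the following: fetch the relevant figures before generation, then fold them into the prompt as context. Both retrieve_from_data_commons and call_llm are placeholders for a real retriever and model endpoint, and the quoted statistic is an illustrative figure.

```python
# Conceptual sketch of retrieval-augmented generation over a statistics store.
# Both helper functions are placeholders, not real Data Commons or Gemma APIs.

def retrieve_from_data_commons(question: str) -> list[str]:
    """Pretend retriever: return statistics relevant to the question."""
    # A real system would map the question to Data Commons places and variables.
    return ["Unemployment rate, United States, 2023: 3.6% (illustrative figure)"]

def call_llm(prompt: str) -> str:
    """Placeholder for an actual model call (e.g. a hosted Gemma endpoint)."""
    return f"<model response conditioned on: {prompt[:60]}...>"

def answer_with_context(question: str) -> str:
    facts = retrieve_from_data_commons(question)
    # Supply the retrieved data as context *before* generation begins, so the
    # model is steered by real numbers instead of guessing them.
    prompt = (
        "Answer the question using only the statistics below.\n"
        "Statistics:\n- " + "\n- ".join(facts) + "\n\n"
        f"Question: {question}"
    )
    return call_llm(prompt)

print(answer_with_context("What was the US unemployment rate in 2023?"))
```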
Swapping out incorrect statistics is certainly an important first step, but that will by no means eliminate all hallucinations. Before we can really put our trust in the results of LLMs, much more work will need to be done. By keeping the models open (although, unfortunately, not truly open source), the team hopes that others will help to improve the technology further over time.