It’s Not the Size That Counts, It’s How You Train It

Meta AI has released a family of smaller LLMs that rival the performance of GPT-3, yet can run inference on as little as a single GPU.

Nick Bild

Large language models have been making waves in the world of artificial intelligence and machine learning. These models are trained on vast amounts of text data, allowing them to understand and generate human-like language in response to a user’s prompt. Their potential applications are numerous and exciting, and researchers and companies alike are eagerly exploring their capabilities.

One of the most well-known examples of large language models is GPT-3, developed by OpenAI. This model has 175 billion parameters, making it one of the most powerful language models ever created. GPT-3 has demonstrated impressive capabilities in a variety of applications, from writing articles and essays to generating code and even creating art. A model of this type powers the popular ChatGPT chatbot.

While these supersized models have been hugely successful, they do have some drawbacks. Training, or even running inference on, a model with hundreds of billions of parameters demands a massive amount of compute infrastructure, a huge budget, and considerable technical expertise. This locks most academic researchers, and even industry research laboratories around the world, out of experimenting with large language models. And that hinders the development of new approaches and innovative use cases.

Recent work has suggested, however, that when it comes to language models, bigger is not always better. As it turns out, smaller models trained on more data can actually outperform larger models. A team at Meta AI seized on this finding and decided to release a set of foundation models, with differing numbers of parameters, that were trained on as many as 1.4 trillion tokens. They are releasing the models under a noncommercial license to support teams that would otherwise be unable to experiment with them; however, Meta AI is acting as the gatekeeper, deciding who can and cannot have access through an application process.

Called LLaMA (Large Language Model Meta AI), the models come in 7 billion, 13 billion, 33 billion, and 65 billion parameter varieties. The two smaller models were trained on 1 trillion tokens drawn exclusively from publicly available datasets covering the 20 most widely spoken languages. The two largest models were trained on 1.4 trillion tokens.

The 13 billion parameter LLaMA model looks especially interesting. Despite being more than 10 times smaller than GPT-3, it has been shown to outperform it on most benchmarks. And given that this model is capable of running inference on a single GPU, it opens up large language model research to a much wider audience than is presently feasible. LLaMA can also scale up for higher-end use cases with the 65 billion parameter model, which rivals the performance of Chinchilla-70B and PaLM-540B.
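For teams that are granted access to the weights, running inference looks much like working with any other causal language model. The sketch below is a minimal, hypothetical example, assuming the LLaMA-13B weights have been converted into the Hugging Face Transformers format and stored at a local path (the path, and the use of the Transformers library at all, are assumptions rather than part of Meta AI's release); loading in half precision keeps the model's roughly 26 GB of weights within reach of a single high-memory GPU.

```python
# Minimal single-GPU inference sketch, assuming LLaMA-13B weights have been
# converted to the Hugging Face Transformers format (hypothetical local path).
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "./llama-13b"  # assumed path to converted weights

# Load in half precision so the 13B model fits on one high-memory GPU.
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",  # place the weights on the available GPU(s)
)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a short continuation of the prompt.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The float16 loading is the detail that makes the single-GPU claim practical; further quantization could shrink the memory footprint even more, at some cost in output quality.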

Just how much performance can be gained by increasing the volume of training data? That is not yet certain. Meta AI has seen consistent improvements so far, and intends to release more models in the future that have been trained on even larger datasets. This project could be a major boon to research, assuming access is broadly granted. Meta AI is accepting applications, but there is no word on exactly which use cases will be considered eligible for access. With any luck, this fantastic effort will not be hamstrung by overly restrictive eligibility requirements.

Nick Bild
R&D, creativity, and building the next big thing you never knew you wanted are my specialties.