Reduce, Reuse, Retrain

Growing large machine learning models from smaller ones can substantially reduce training time and costs.

Nick Bild
1 year ago · Machine Learning & AI
Growing a machine learning model to save time and money (📷: P. Wang et al.)

Machine learning algorithms are becoming more widely used across a variety of industries as the benefits they offer become more evident. However, training large models is no easy feat, and that means most businesses, researchers, and hobbyists are out of luck if they want to experiment with them. The computational resources required, along with the time and expense of gathering a massive training dataset, can easily push the cost of a single training run into the millions of dollars. And if some tweaks need to be made after that first run, well, you can see where this is going!

There is widespread concern in the field that this lack of access to large models stifles innovation. How advanced might large language models be today if building them were not limited to a relatively small number of organizations? A research effort led by engineers at the University of Texas at Austin and MIT may help open the technology up to a wider audience so that we no longer need to ask such questions. They have developed a method that reuses smaller, existing machine learning models as a starting point for training a new, larger model. In doing so, the cost and time associated with model training can be dramatically reduced.

Smaller models may not have all the capabilities that you are looking for, but there is still a tremendous amount of knowledge encoded in them. So rather than discard these older models, the team devised a plan to recycle them. These models are grown to larger sizes by adding additional neurons to existing layers, and even adding entirely new layers to the model. The question, then, is how best to incorporate the previous model’s weights into the new model structure.

For this purpose, the researchers built a machine learning pipeline that can learn a linear mapping of the parameters of the existing model. Using this method, termed a Linear Growth Operator (LiGO), the parameters of the smaller model are mapped into a larger network of greater width and depth. The result is a new, larger model that already has a vast amount of knowledge encoded in it.
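To make the idea concrete, here is a minimal sketch of the width-expansion step in plain Python/NumPy. The function name and the random placeholder matrices are illustrative assumptions, not the authors' implementation: in LiGO the expansion operators are learned through a short optimization over a linear mapping of the small model's parameters, and they are structured to grow both width and depth.

```python
# A minimal, illustrative sketch of the width-expansion idea behind LiGO.
# The name expand_width and the random matrices A and B are assumptions
# for illustration only; in LiGO these operators are learned, not random.
import numpy as np

def expand_width(w_small, d_in_big, d_out_big, rng):
    """Map a small layer's weight matrix into a wider layer.

    The new layer's weights are a linear function of the old layer's
    weights: W_big = A @ W_small @ B.
    """
    d_in_small, d_out_small = w_small.shape
    # Learned in practice; random placeholders here just to show the shapes.
    a = rng.standard_normal((d_in_big, d_in_small)) / np.sqrt(d_in_small)
    b = rng.standard_normal((d_out_small, d_out_big)) / np.sqrt(d_out_small)
    return a @ w_small @ b

rng = np.random.default_rng(0)
w_small = rng.standard_normal((256, 256))      # a layer from the small model
w_big = expand_width(w_small, 512, 512, rng)   # the same layer, grown 2x wider
print(w_big.shape)                             # (512, 512)
```

Because the larger model's weights start as a linear function of the smaller model's weights rather than random noise, training can begin from a point that already encodes what the smaller model learned.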

This means that the training process does not need to start from scratch with random parameter weights. Accordingly, models created with LiGO were found to require about 50 percent less computation to train for both vision and language tasks. The amount of training data that would typically be required is also reduced, adding to the savings.

There was no compromise involved in using LiGO. As it turned out, models trained with this approach performed as well as, or better than, models trained from scratch.

The results of this research are certainly encouraging; however, there is still a long way to go before these technologies have truly been democratized. A savings of 50 percent is massive, but so is the traditional cost, which leaves large models out of reach for most, even with this boost.

Nick Bild
R&D, creativity, and building the next big thing you never knew you wanted are my specialties.