Slimming Down ML Models for Tiny Hardware

Model optimization techniques can enable powerful ML algorithms to run on tiny hardware platforms with minimal reductions in accuracy.

Nick Bild
Machine Learning & AI
A Raspberry Pi 4 Model B (📷: Raspberry Pi)

Large language models, such as BERT, XLNet, and GPT-4, represent a remarkable leap in artificial intelligence and have found numerous important applications across a wide range of industries. These models, which are pre-trained on vast datasets, can generate human-like text, making them valuable for tasks like natural language understanding, content generation, and even language translation. However, their immense capabilities come at a cost. The most advanced models require colossal amounts of computational resources, making them prohibitively expensive to operate for all but the largest organizations. As a result, the majority of users access these models as remote, cloud-based services.

This cloud-based approach raises significant privacy concerns. When users interact with these models, they often share sensitive or personal information, and this data may be stored and analyzed by the service provider. Privacy breaches or data misuse could lead to severe consequences. Additionally, the centralization of these services gives a small number of companies substantial control over access to these powerful AI models, leading to concerns about the potential for information monopolies and a lack of transparency.

Moreover, the operation of these large models consumes substantial amounts of energy, contributing to environmental concerns and escalating operational costs. The ideal approach to mitigate these issues involves running the algorithms on edge computing devices, such as smartphones or other low-power computers. Edge computing reduces the need for vast data centers, significantly lowering power consumption and reducing the environmental footprint. Furthermore, it can enhance privacy by keeping data locally rather than transmitting it to remote servers.

Of course, that is easier said than done. We cannot simply shrink models with hundreds of millions, or even billions, of parameters down to a size that fits within the constraints of a desktop computer or smartphone and expect decent results. Or can we? That is the question that a team led by engineers at the University of Arizona set out to answer. They experimented with several model compression techniques and a number of resource-constrained edge devices to see exactly how far these platforms can be pushed. When all was said and done, the results were quite surprising, demonstrating that some powerful algorithms can run on small hardware platforms with acceptable levels of performance.

The team chose the BERT Large and MobileBERT models for evaluation, and fine-tuned them on the RepLab 2013 dataset for a reputation analysis task. After retraining both models in TensorFlow, the team converted them to the TensorFlow Lite format as FlatBuffer files. In some cases, a dynamic range quantization technique was also employed to further shrink the model sizes. The models, both quantized and non-quantized, were deployed to a desktop computer and to Raspberry Pi 3 and 4 single-board computers for execution via the TensorFlow Lite interpreter.
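The paper does not publish the team's conversion script, but the standard TensorFlow Lite workflow they describe looks roughly like this sketch. A tiny dense network stands in for the fine-tuned BERT Large or MobileBERT model, and the optimization flag enables dynamic range quantization, which stores weights as 8-bit integers and dequantizes them on the fly, so no calibration dataset is required:

```python
import tensorflow as tf

# A tiny stand-in network; in the study this would be the fine-tuned
# BERT Large or MobileBERT model (architecture here is illustrative).
inputs = tf.keras.Input(shape=(128,))
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(2)(x)
model = tf.keras.Model(inputs, outputs)

# Plain float conversion to the TensorFlow Lite FlatBuffer format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_model = converter.convert()

# Dynamic range quantization: weights are stored as 8-bit integers,
# shrinking the serialized model by roughly 4x for weight-heavy layers.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_model = converter.convert()

print(len(float_model), len(quantized_model))
```

For the transformer models in the study, the same flag applies to the much larger attention and feed-forward weight matrices, which is where the bulk of the size reduction comes from.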

Compared with BERT Large, the quantized MobileBERT models were as much as 160 times smaller. That reduction in size came at a cost of only a 4.1% drop in accuracy. Moreover, the models could sustain at least one inference per second on the Raspberry Pi computers, which is fast enough for most applications involving large language models.
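On the Raspberry Pi, a serialized FlatBuffer like the one above is executed through the TensorFlow Lite interpreter. A minimal sketch of that loop, again using a small stand-in network rather than MobileBERT, with a simple wall-clock timing of a single inference:

```python
import time
import numpy as np
import tensorflow as tf

# Build and quantize a tiny stand-in network (the study used MobileBERT).
inputs = tf.keras.Input(shape=(128,))
outputs = tf.keras.layers.Dense(2)(
    tf.keras.layers.Dense(64, activation="relu")(inputs))
converter = tf.lite.TFLiteConverter.from_keras_model(
    tf.keras.Model(inputs, outputs))
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Run it through the TensorFlow Lite interpreter, as on the Raspberry Pi.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.random.rand(1, 128).astype(np.float32)
start = time.perf_counter()
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
elapsed = time.perf_counter() - start
print(result.shape, f"{elapsed * 1000:.2f} ms")
```

Timing `invoke()` this way over repeated runs is how a throughput figure like "one inference per second" would be measured in practice.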

The team’s findings show that model optimizations can allow powerful models to run on tiny, resource-constrained hardware platforms with only minimal reductions in accuracy. This knowledge could aid in protecting both our privacy and the environment in the future, as we build ever more intelligent devices and sensors.
