Optimizing Transformers for Tiny Tech
TinyFormer offers a three-step process to optimize powerful transformer models for execution on resource-constrained microcontrollers.
As IoT devices become more common in a variety of industries, the integration of tinyML techniques has opened up a wide range of possibilities for deploying machine learning models on low-power microcontrollers. While existing applications such as keyword spotting and anomaly detection have demonstrated the potential of tinyML in these constrained settings, deploying more computationally intensive models remains a significant challenge that limits the overall utility of these devices.
TinyML offers a number of benefits tailored to the constraints of resource-limited hardware. By allowing machine learning algorithms to run on microcontrollers with limited memory and processing capabilities, tinyML enables real-time data analysis and decision-making at the edge. Processing data on-device not only improves efficiency in IoT deployments, but also minimizes latency, ensuring that timely and actionable insights are derived from the collected data.
However, running cutting-edge algorithms like transformer models, which are essential in fields such as computer vision, speech recognition, and natural language processing, on microcontrollers requires that some significant technical challenges first be addressed. A group of researchers at Beihang University and the Chinese University of Hong Kong has recently put forth a method that enables transformer models to run on common microcontrollers. Called TinyFormer, it is a resource-efficient framework for designing and deploying sparse transformer models.
TinyFormer is composed of three distinct steps: SuperNAS, SparseNAS, and SparseEngine. SuperNAS automatically searches a large design space for an appropriate supernet. This supernet, a single over-parameterized network that represents many possible versions of the model, allows a wide range of candidate architectures and hyperparameters to be explored and evaluated efficiently within one framework. The supernet is then trimmed down by SparseNAS, which searches it for sparse models with transformer structures embedded in them. In this step, sparse pruning is applied to the model's convolutional and linear layers, after which all layers are quantized to INT8. Finally, in the SparseEngine step, the compressed model is optimized and deployed to the target microcontroller.
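To make the pruning and quantization step more concrete, here is a minimal PyTorch sketch of the general technique: magnitude-based sparse pruning of convolutional and linear layers followed by INT8 quantization. This is not TinyFormer's actual code; the stand-in model, the 70% sparsity ratio, and the use of PyTorch's dynamic quantization (which only covers the linear layers here, whereas TinyFormer quantizes all layers) are assumptions made purely for illustration.

```python
# Minimal sketch (not TinyFormer's code): prune conv/linear weights, then
# apply INT8 quantization using standard PyTorch utilities.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A stand-in hybrid model assuming 32x32 inputs (CIFAR-10-sized);
# TinyFormer searches for its own architecture instead.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Step 1: magnitude-based sparse pruning of convolutional and linear layers.
for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.7)  # 70% sparsity (assumed)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# Step 2: INT8 quantization. For brevity, this sketch only applies PyTorch's
# dynamic INT8 quantization to the linear layers.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Count how many weights were zeroed out by pruning.
zeroed = sum(
    (m.weight == 0).sum().item()
    for m in model.modules()
    if isinstance(m, (nn.Conv2d, nn.Linear))
)
print("zeroed weights:", zeroed)
```

The same idea carries over to a real compression pipeline: prune until the sparsity target is hit, then quantize the surviving weights so each one fits in a single byte on the microcontroller.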
To test this system, ResNet-18, MobileNetV2, and MobileViT-XS models were trained for image classification on the CIFAR-10 dataset. TinyFormer was then leveraged to compress and optimize the models before deploying them on an STM32F746 Arm Cortex-M7 microcontroller with 320 KB of memory and 1 MB of storage space. A 731,000-parameter model that required 942 KB of storage space was found to have a very impressive average classification accuracy of 96.1%.
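As a rough sanity check on those numbers (my own back-of-the-envelope arithmetic, not figures from the paper beyond the ones quoted above), an INT8 model stores roughly one byte per parameter, so 731,000 parameters account for about 714 KB, with the remainder of the 942 KB presumably going to operator metadata and sparse-index bookkeeping:

```python
# Back-of-the-envelope storage check; assumes ~1 byte per parameter for INT8.
params = 731_000
weight_bytes = params * 1            # raw INT8 weights
reported_kb = 942                    # storage reported for the deployed model
flash_kb = 1024                      # 1 MB flash on the STM32F746
sram_kb = 320                        # 320 KB RAM available for activations at runtime

print(f"raw INT8 weights: {weight_bytes / 1024:.0f} KB")                    # ~714 KB
print(f"reported footprint: {reported_kb} KB of {flash_kb} KB flash")
print(f"overhead beyond raw weights: {reported_kb - weight_bytes / 1024:.0f} KB")
print(f"SRAM budget for activations: {sram_kb} KB")
```

Either way, the deployed model sits just under the 1 MB flash limit, which is exactly the kind of tight fit this class of hardware demands.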
The runtime performance of the system was also evaluated, with SparseEngine compared against the popular CMSIS-NN library. SparseEngine was found to outperform CMSIS-NN in terms of both inference latency and storage use: inference was measured to be 5.3 to 12.2 times faster, and storage requirements were reduced by 9% to 78%.
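The storage savings come from keeping only the weights that survive pruning. The toy example below shows the general value-plus-index idea behind sparse storage of INT8 weights; the matrix size, the 80% sparsity, and the 16-bit indices are arbitrary assumptions, and SparseEngine's actual on-device format is not reproduced here.

```python
# Illustrative only: how a sparse format can shrink an INT8 weight tensor.
import numpy as np

rng = np.random.default_rng(0)
dense = rng.integers(-128, 128, size=(64, 256), dtype=np.int8)
mask = rng.random(dense.shape) < 0.8                 # assume 80% of weights pruned
dense[mask] = 0

values = dense[dense != 0]                           # surviving INT8 weights (1 byte each)
indices = np.flatnonzero(dense).astype(np.uint16)    # flat positions of nonzeros (2 bytes each)

dense_bytes = dense.nbytes
sparse_bytes = values.nbytes + indices.nbytes
print(f"dense: {dense_bytes} B, sparse: {sparse_bytes} B "
      f"({100 * (1 - sparse_bytes / dense_bytes):.0f}% smaller)")
```

How much is actually saved depends on the sparsity level and the index encoding, which is why the reported reductions span such a wide range.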
Maintaining high accuracy while reducing an algorithm's computational workload is a difficult task, but TinyFormer has been shown to do an admirable job. By striking a delicate balance between efficiency and performance, this system looks poised to enable many new tinyML applications in the near future.