The MinUn TinyML Framework Squeezes Machine Learning Models Onto Resource-Light Microcontrollers

Billed as the first tinyML framework to take a "holistic" approach, MinUn aims to beat TensorFlow Lite and others on size and accuracy.

A team from Microsoft Research in India, ETH Zurich, and the University of California at Berkeley has unveiled what is claimed to be "the first tinyML framework to holistically address" issues preventing the generation of efficient code for microcontrollers — and in doing so beat rivals including TensorFlow Lite.

"Running machine learning inference on tiny devices, known as tinyML, is an emerging research area. This task requires generating inference code that uses memory frugally, a task that standard ML frameworks are ill-suited for," the researchers claim in the abstract to their paper. "A deployment framework for tinyML must be: A) Parametric in the number representation to take advantage of the emerging representations like posits; B) carefully assign high-precision to a few tensors so that most tensors can be kept in low-precision while still maintaining model accuracy; and C) avoid memory fragmentation."

It's these issues, the team claims, that prevent running certain models on extremely constrained devices like low-power microcontrollers — where even the 600kB of memory required by the highly efficient face-detection model RNNPool is far too much, much less the 3MB you'd need for MobileNetV2-SSDLite.

The proposed solution is MinUn, a tinyML framework developed to offer a "holistic" approach to the three key sub-problems: the need for number representations that can approximate 32-bit floating-point values in fewer bits without a loss of accuracy; the need to heuristically assign per-tensor bitwidths so that memory usage is minimized while that accuracy is maintained; and the lack of memory management on resource-constrained microcontrollers, which can lead to memory fragmentation.
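
To make the first two of those sub-problems concrete, here is a minimal sketch — not MinUn's actual code — of what parametric, per-tensor quantization might look like: each tensor is stored in a chosen bitwidth, so a search procedure can keep most tensors at low precision and promote only the accuracy-critical ones. The fixed-point scheme, tensor names, and bitwidth assignment below are all hypothetical.

```python
import numpy as np

def quantize_fixed_point(tensor, bitwidth):
    """Quantize a float32 tensor to signed fixed-point at the given bitwidth.

    Illustrative only: MinUn is parametric over arbitrary representations
    (including posits); this sketch uses plain fixed-point for clarity.
    """
    max_abs = np.max(np.abs(tensor)) or 1.0
    # Choose a scale so the largest value fits in (bitwidth - 1) magnitude bits.
    scale = (2 ** (bitwidth - 1) - 1) / max_abs
    q = np.round(tensor * scale).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) / scale

# A toy assignment: most tensors at 8 bits, one accuracy-critical tensor at 16.
weights = {"conv1": np.random.randn(3, 3), "fc": np.random.randn(10, 10)}
assignment = {"conv1": 8, "fc": 16}   # hypothetical bitwidth assignment

for name, w in weights.items():
    q, scale = quantize_fixed_point(w, assignment[name])
    err = np.max(np.abs(w - dequantize(q, scale)))
    print(f"{name}: {assignment[name]}-bit, max abs error {err:.5f}")
```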

"MinUn is the first tinyML framework which is parametric over any arbitrary number representation," the researchers claim. "For the bitwidth assignment problem, we propose a novel exploration algorithm, HAUNTER, which uses both accuracy and size to produce better assignments. Finally, for RAM management, MinUn encodes the memory management problem to a bin-packing problem and solves it using [Donald] Knuth's Algorithm X, which is guaranteed to return the optimum result — albeit in exponential time. Here, our main contribution is to come up with an effective encoding and to adapt the general framework of Algorithm X to ensure a tractable runtime in practice."

The results are undeniably impressive: the 600kB memory requirement of the RNNPool model is reduced to 180kB, without a loss in performance — allowing it to squeeze onto microcontrollers with just 256kB of RAM. For other models, the results are even more impressive. The SqueezeNet convolutional neural network went from 4.42MB of RAM using 32-bit floating-point precision to requiring just 1.16MB under MinUn — slightly higher than the 1.11MB a TensorFlow Lite variant required, but with a near nine percentage point advantage in accuracy.

The paper describing MinUn is available as a preprint on Cornell's arXiv server, while the project's source code has been made available on GitHub under the permissive MIT license.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.