Computer Vision in 512 Bytes
The NeuralNetwork library lets an 8-bit ATtiny85 run an RNN that performs handwritten digit recognition with just 512 bytes of memory.
If you keep up on the latest advances in AI, then you might have gotten the impression that bigger is better. More parameters! Larger context windows! Faster, more powerful hardware! It's true that the trend in state-of-the-art AI, particularly with large language models, has been toward scaling up the models to achieve better performance and more generalized capabilities.
However, when real-time operation is required, or when protecting privacy is a big concern, these massive, cloud-based models aren’t where it’s at. In these cases, edge AI is the way to go. And when running these models locally, smaller is better. Smaller models can run on less expensive hardware, and, all else being equal, they can do their jobs faster. But AI algorithms are complex, so just how small can they get?
GitHub user GiorgosXou wanted to find out and developed a library for running neural networks on microcontrollers to support that effort. Simply called NeuralNetwork, this library was designed to be simple and easy to use. To shrink models down and speed them up, it employs a number of techniques, like quantization, SIMD acceleration, and custom activation functions. And no, it is not just a way to run a basic feedforward network — it supports advanced architectures like RNNs, GRUs, and LSTMs.
GiorgosXou recently put this library to the test on the extremely modest 8-bit ATtiny85 microcontroller to see what it is capable of. With just 512 bytes of RAM and 512 bytes of EEPROM storage, there isn’t much to work with. Even so, GiorgosXou managed to get a complete RNN trained on the MNIST dataset running on this chip. Believe it or not, using NeuralNetwork, the ATtiny85 is running a computer vision model that recognizes handwritten digits.
The neural network takes in an array of bytes representing an image of a handwritten digit. After feeding these bytes through the model, it predicts which digit, zero through nine, the image represents.
It is true that MNIST-trained handwritten digit recognition models are something you might find in a beginner's tutorial for a traditional machine learning development framework. It is an awfully long way from a really useful computer vision model of the sort that might, for instance, power a self-driving vehicle. But all the same, getting this to run on an ATtiny85 is a big accomplishment. Just imagine what could be done by loosening the constraints and working with something along the lines of an ESP32. The future of tinyML is looking brighter all the time.