TinyML has been getting a lot of attention lately because it enables very small, low-power machine learning models that can be deployed on resource-constrained devices. Because these models avoid the large computational and energy requirements of traditional machine learning models running in the cloud, they can slash costs dramatically while also sidestepping data privacy and latency issues. This opens the door for intelligent algorithms to be used in countless new applications. Advances in microcontroller technology have helped push tinyML forward, but considering that traditional models require many megabytes or gigabytes of memory to execute, while most microcontrollers measure their memory in kilobytes, there is still much more to be done.
The software side of this problem is being attacked on many fronts: TensorFlow Lite for Microcontrollers and the Edge Impulse machine learning development platform, for example, produce highly optimized models that can run on very small hardware platforms. A team centered at Newcastle University has recently thrown their hat in the ring with a technique designed to make one particular type of algorithm, reinforcement learning, easier to deploy to resource-constrained devices. Reinforcement learning is especially difficult to bring to tiny hardware platforms because the algorithm must not only run inferences, but also continually learn and adapt to new situations by collecting data over time. The framework, called TinyRL, was demonstrated on several common, low-power microcontroller development boards.
In a nutshell, TinyRL works by first running the initial training process on a computing platform with enough resources to support high-level programming environments, which streamlines development. In most cases, a Raspberry Pi single-board computer would suffice. Keeping in mind that the ultimate destination for these algorithms is a microcontroller, the team chose to focus on Q-Learning, which can be quite accurate and highly efficient. Because Q-Learning is model-free, it does not need a model of the environment, which keeps the process lightweight. The Q-array generated by training is then transferred to the microcontroller for use. Since only the Q-array is transferred, a functionally identical implementation of the algorithm must exist on both the training computer and the tinyML hardware. The team extended the FogML toolkit to help ensure this consistency and to generate C language code that can be integrated into a microcontroller's firmware.
TinyRL was evaluated with the help of the OpenAI Gym virtual environment, using the Cart Pole and Mountain Car environments from the classic control group. Learning was carried out on a computer with an Intel i7 processor and 16GB of RAM, then the Q-arrays were transferred to a number of different platforms using the Arduino IDE. The NUCLEO-L031K6, NUCLEO-L432KC, Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and ESP32-CAM-MB boards were all evaluated. Across every combination of environment and hardware platform, at most 52% of flash memory and 16% of RAM were used, and algorithm run times ranged from a best of 2 microseconds to a worst of 261 microseconds.
The researchers demonstrated that TinyRL is an effective method for implementing reinforcement learning algorithms on resource-constrained platforms. Next, they have their sights set on implementing multi-agent on-device reinforcement learning algorithms and on exploring the use of mesh communication protocols.