A team of computer scientists has developed a RISC-V-based platform for embedded machine learning workloads, and says it offers 65 times the performance and 37 times the energy efficiency of the same workload running on an STMicroelectronics STM32 microcontroller.
"In the last few years, research and development on deep learning models and techniques for ultra-low-power devices — in a word, tinyML — has mainly focused on a train-then-deploy assumption," the researchers explain in the abstract to their paper, "with static models that cannot be adapted to newly collected data without cloud-based data collection and fine-tuning."
"Latent Replay-based Continual Learning (CL) techniques enable online, serverless adaptation in principle, but so far they have still been too computation and memory-hungry for ultra-low-power tinyML devices, which are typically based on microcontrollers."
The team's work to address this takes two forms. The first is a rethinking of the underlying algorithm, designed to reduce its memory requirement: it offers either effectively lossless compression or, for even bigger space savings, a variant that accepts a five per cent drop in accuracy. The second is a RISC-V processor, based on the Parallel Ultra-Low Power (PULP) Platform, optimized for the workload.
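To give a feel for how stored data can be shrunk at a small cost in precision, here is a minimal sketch of uniform quantization applied to a handful of activation values. This is an illustrative assumption only: the compression scheme is not spelled out above, and the function names, the sample values, and the choice of 8 bits are all hypothetical rather than taken from the team's code.

```python
def quantize(values, bits):
    """Map non-negative floats to unsigned integer codes of the given width.

    Assumption: inputs are non-negative (e.g. post-ReLU activations).
    """
    levels = (1 << bits) - 1          # highest code, e.g. 255 for 8 bits
    top = max(values)
    scale = top / levels if top > 0 else 1.0
    codes = [min(levels, max(0, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


# Hypothetical activation values, for illustration only.
activations = [0.0, 0.25, 0.5, 1.0, 2.0]
codes, scale = quantize(activations, bits=8)
restored = dequantize(codes, scale)

# Worst-case rounding error is about half of one quantization step.
max_error = max(abs(a - r) for a, r in zip(activations, restored))
print(codes)
print(max_error)
```

Stored as 8-bit codes, each value takes a quarter of the space of a 32-bit float; using fewer bits saves more memory but makes each quantization step coarser, which is the general shape of the trade-off described above.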
The prototype processor, dubbed VEGA and built on a 22nm process, shows just how big the gains can be. With access to 64MB of memory, the part proved 65 times faster than an STM32 L4 microcontroller and offered 37 times the energy efficiency. Based on a retraining rate of once per minute, that gave the VEGA a 535-hour runtime from a 3.3Ah battery, extendable to 200,000 hours if the retraining rate is reduced to once per hour.
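As a rough way to read those battery figures, the sketch below works out the average current each quoted runtime would imply for a 3.3Ah battery. It assumes an ideal battery (full rated capacity usable, no self-discharge) and a steady average draw, neither of which is stated above; the function name is hypothetical.

```python
def implied_average_current_ma(capacity_ah: float, runtime_h: float) -> float:
    """Average current in mA that would drain capacity_ah in runtime_h.

    Assumption: ideal battery, so capacity / runtime gives the mean draw.
    """
    if runtime_h <= 0:
        raise ValueError("runtime_h must be positive")
    return capacity_ah / runtime_h * 1000.0


BATTERY_AH = 3.3  # battery capacity quoted in the article

per_minute_ma = implied_average_current_ma(BATTERY_AH, 535)      # retrain once per minute
per_hour_ma = implied_average_current_ma(BATTERY_AH, 200_000)    # retrain once per hour

print(f"once per minute: {per_minute_ma:.2f} mA average")
print(f"once per hour:   {per_hour_ma * 1000:.1f} uA average")
```

Under these assumptions the once-per-minute figure works out to roughly 6.17 mA on average, and the once-per-hour figure to roughly 16.5 µA.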
"These results constitute an initial step towards moving the tinyML from a strict train-then-deploy approach to a more flexible and adaptive scenario, where low power devices are capable to learn and adapt to changing tasks and conditions directly in the field," the team concludes. "Despite this work focused on a single CL method, we remark that, thanks to the flexibility of the proposed platform, other adaptation methods or models can be also supported, especially if relying on the back-propagation algorithm and CNN primitives, such as convolution operations."
The team's work has been published under open-access terms on arXiv.org.