Collaborative Learning on the Edge
RockNet enables distributed machine learning across ultra-low-power devices, boosting accuracy and efficiency without cloud training.
Intelligence is finding its way into just about everything you can imagine these days, from self-driving vehicles to smart factories, implantable medical devices, and IoT sensor networks. The algorithms running on these devices are powered by machine learning, but as the available computing hardware on these platforms is typically highly constrained, developers cannot expect to have an H100 GPU at their disposal. Oftentimes, there is nothing more than a low-power microcontroller to work with, so efficiency is king.
Techniques like quantization and model pruning have gone a long way toward bringing practical models to tiny devices. But these approaches only speed up inference, not model training. That means algorithms deployed to low-power platforms cannot be fine-tuned on the fly, which significantly limits their ability to adapt to changing conditions. A new method spearheaded by engineers at RWTH Aachen University in Germany seeks to change that. They have developed a system called RockNet, which enables distributed learning on ultra-low-power devices.
Traditionally, machine learning models for low-power devices are trained in the cloud and later deployed for inference on edge hardware. While cloud training is powerful, it comes with drawbacks, including high latency, heavy network usage, and potential privacy risks when sensitive data must be transmitted off-device. On the other hand, performing training directly on-device is typically infeasible, as these microcontrollers lack the computational and memory resources needed for such workloads.
RockNet changes that by distributing the training process across many small devices, rather than relying on a single one. The team leveraged the fact that tinyML deployments often include dozens or even hundreds of networked devices that can communicate locally. Instead of one microcontroller struggling to train a model in isolation, RockNet allows all of them to collaborate in parallel. Each device performs a fraction of the total computation, then exchanges small amounts of data with its peers.
The system is composed of two main components: the ROCKET classifier and a custom wireless communication protocol called Mixer. ROCKET (RandOm Convolutional KErnel Transform) is a machine learning method that achieves state-of-the-art accuracy on time-series classification tasks, like detecting faults in machines or spotting malware on embedded systems. ROCKET works by convolving sensor data with thousands of randomly generated filters, then feeding the resulting features into a lightweight linear classifier. This structure yields high accuracy but would normally overwhelm the limited memory of a single ultra-low-power device.
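To make the idea concrete, here is a minimal Python sketch of a ROCKET-style feature transform: random dilated kernels are convolved with a time series, and each kernel contributes two summary features (the maximum and the proportion of positive values, PPV). This is a simplified illustration of the general technique, not RockNet's embedded implementation; the function names and parameter choices are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_kernels(num_kernels, input_length):
    """Generate random 1-D convolutional kernels in the spirit of ROCKET."""
    kernels = []
    for _ in range(num_kernels):
        length = rng.choice([7, 9, 11])
        weights = rng.normal(size=length)
        weights -= weights.mean()  # zero-center the kernel
        bias = rng.uniform(-1, 1)
        # Sample dilation on a log scale so the kernel always fits the input.
        exponent = rng.uniform(0, np.log2((input_length - 1) / (length - 1)))
        kernels.append((weights, bias, max(1, int(2 ** exponent))))
    return kernels

def transform(x, kernels):
    """Convolve a series with each kernel; keep two features per kernel:
    the max response and the proportion of positive values (PPV)."""
    feats = []
    for weights, bias, dilation in kernels:
        idx = np.arange(len(weights)) * dilation
        valid = len(x) - idx[-1]
        conv = np.array([x[i + idx] @ weights + bias for i in range(valid)])
        feats.extend([conv.max(), (conv > 0).mean()])
    return np.array(feats)
```

The resulting feature vector is then fed to an ordinary linear classifier; only that small linear layer has trainable weights, which is what makes the approach so cheap to train.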
RockNet overcomes that limitation by splitting the ROCKET model into pieces that are distributed across multiple devices. Each microcontroller is responsible for computing a small subset of the model’s features and weights. When one device measures new sensor data, it shares that input with the others. Each participant then performs its portion of the computation locally and sends back compact intermediate results. These results are aggregated to produce the final model output.
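Because the classifier on top of the random features is linear, the partitioning described above can stay very communication-light: each device only has to ship back its partial dot product, and summing those partial scores reproduces the full model's output exactly. The hypothetical sketch below illustrates that decomposition; the `Device` class and its methods are our own invention, not RockNet's API.

```python
import numpy as np

rng = np.random.default_rng(1)

class Device:
    """One microcontroller holding a slice of the model (illustrative only)."""
    def __init__(self, kernels, weights):
        self.kernels = kernels  # this device's share of the random kernels
        self.weights = weights  # matching rows of the linear classifier

    def partial_score(self, x):
        # Compute only this device's features, then its partial dot product.
        feats = np.array(
            [np.convolve(x, k, mode="valid").max() for k in self.kernels]
        )
        return feats @ self.weights  # one value per class: cheap to transmit

def classify(devices, x):
    # Linearity means summing per-device partial scores gives exactly
    # the score the full, undivided model would produce.
    scores = sum(d.partial_score(x) for d in devices)
    return int(np.argmax(scores))
```

In this toy version each device computes only a max feature per kernel, but the same aggregation works for any feature set as long as the final classifier is linear.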
Coordinating all this activity requires fast, reliable communication, which is where Mixer comes in. Mixer uses a combination of synchronous transmissions and network coding to enable all devices to broadcast and receive data simultaneously. It ensures that every message reaches every node with near-perfect reliability, all while maintaining tight timing synchronization across the network. This design allows RockNet to scale smoothly from a few devices to dozens, without choking on communication overhead.
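The network-coding idea at the heart of a protocol like Mixer can be sketched in a few lines: instead of relaying individual messages, nodes transmit random XOR combinations of them, and a receiver recovers all the originals by Gaussian elimination over GF(2) once it has collected enough independent combinations. The code below is a plain-Python illustration of that principle only; it does not reproduce Mixer's packet format or its synchronous-transmission mechanics.

```python
import numpy as np

rng = np.random.default_rng(2)

def encode(messages):
    """XOR a random subset of messages together (network coding over GF(2))."""
    coeffs = rng.integers(0, 2, size=len(messages))
    if not coeffs.any():
        coeffs[rng.integers(len(messages))] = 1  # never send an empty packet
    payload = np.zeros_like(messages[0])
    for c, m in zip(coeffs, messages):
        if c:
            payload ^= m
    return coeffs, payload

def decode(coded):
    """Gauss-Jordan elimination over GF(2); returns the original messages,
    or None if the received combinations do not yet have full rank."""
    coeffs = np.array([c for c, _ in coded], dtype=np.uint8)
    payloads = np.array([p for _, p in coded], dtype=np.uint8)
    n = coeffs.shape[1]
    row = 0
    for col in range(n):
        pivot = next((r for r in range(row, len(coeffs)) if coeffs[r, col]), None)
        if pivot is None:
            return None  # not enough independent packets: keep listening
        coeffs[[row, pivot]] = coeffs[[pivot, row]]
        payloads[[row, pivot]] = payloads[[pivot, row]]
        for r in range(len(coeffs)):
            if r != row and coeffs[r, col]:
                coeffs[r] ^= coeffs[row]
                payloads[r] ^= payloads[row]
        row += 1
    return payloads[:n]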
In real-world tests using 20 low-power nRF52840 boards — each with just a 64 MHz ARM Cortex-M4 CPU and 256 kB of RAM — RockNet trained time-series classifiers from scratch with better accuracy than previous approaches. As the system scaled from one to 20 devices, it reduced per-device memory usage by up to 93%, latency by 89%, and energy consumption by 86%.
From smart factories that detect equipment failures in real time to wearable devices that continually refine their sensing models, RockNet could bring a new level of adaptability and accuracy to the tiniest edge devices.