The world of artificial intelligence (AI) is shrinking. Well, not like that. The field is rapidly expanding, of course, but with the advent of tiny AI accelerators — miniature chips designed to squeeze the power of AI into the tiniest of devices — the hardware is rapidly shrinking. These accelerators are changing what is possible in the landscape of on-body AI, bringing intelligence directly to wearables and even implantables.
Because these accelerators bring AI algorithms to the point of data collection, data does not have to be transmitted to the cloud for processing. This has a number of important implications. First, sensitive information does not need to leave the device, greatly enhancing privacy. Moreover, inference speeds can be increased by avoiding the latency introduced by communicating with remote systems. Eliminating the need for a network connection also reduces power consumption and enables operation in remote locations.
A research group at Nokia Bell Labs has been tracking this trend toward miniaturization and realized that, as costs continue to drop, it will become increasingly likely that individuals will have a network of AI accelerators distributed around their bodies. This could provide substantial processing horsepower for AI workloads; however, at present, popular AI development frameworks offer little support for leveraging the strengths of these accelerators. As a result, developers often resort to unnecessary measures, like heavy model compression, which degrades the accuracy of the resulting models. Furthermore, each accelerator operates in isolation, so jobs and subtasks cannot be distributed to the most appropriate available hardware.
To make the most of an on-body network of AI accelerators, the team introduced a system that they call Synergy. This tool abstracts the specific hardware that is available into a virtual computing space, through which AI applications are given a unified, virtualized view of all available resources. In this way, developers can focus on building solutions rather than dealing with the multitude of hardware architectures that might be present in any given on-body accelerator network.
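To get a feel for what a unified, virtualized view of resources might look like, here is a minimal sketch in Python. The class and method names (`VirtualComputingSpace`, `register`, `with_peripheral`) are hypothetical, not Synergy's actual API, and the memory capacities are approximate figures used purely for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Accelerator:
    """One physical AI accelerator in the on-body network."""
    name: str
    memory_kb: int                    # approximate on-chip weight memory
    peripherals: set = field(default_factory=set)
    utilization: float = 0.0          # fraction of capacity currently in use

class VirtualComputingSpace:
    """Hypothetical unified view over heterogeneous accelerators."""
    def __init__(self):
        self._devices = []

    def register(self, device: Accelerator):
        self._devices.append(device)

    def total_memory_kb(self) -> int:
        # Applications see the pooled capacity, not individual chips.
        return sum(d.memory_kb for d in self._devices)

    def with_peripheral(self, peripheral: str):
        # Find devices that can satisfy a hardware requirement.
        return [d for d in self._devices if peripheral in d.peripherals]

space = VirtualComputingSpace()
space.register(Accelerator("MAX78000", memory_kb=442, peripherals={"microphone"}))
space.register(Accelerator("MAX78002", memory_kb=2000))

print(space.total_memory_kb())        # pooled capacity seen by the app: 2442
print([d.name for d in space.with_peripheral("microphone")])  # ['MAX78000']
```

The key idea is that an application queries the pool, never a specific chip, so the same code runs unchanged whether one accelerator or ten are strapped to the body.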
Using Synergy, a developer could simply specify that they want to execute a particular type of model — a keyword spotting model, for instance — and indicate any hardware that is needed, like a microphone or speaker. The runtime module, which tracks available resources and their utilization, will then identify appropriate hardware and distribute execution of the model across all available accelerators. By distributing model execution where possible, inference times can be reduced through parallelism. This feature also allows for the execution of larger models than would otherwise be possible, decreasing reliance on model compression and other tactics that can reduce accuracy.
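Synergy's actual scheduling and partitioning algorithms are more sophisticated than this, but a greedy layer-packing sketch illustrates the basic idea of splitting a model that is too large for any single chip. All names and layer sizes here are made up for illustration:

```python
def partition_layers(layer_sizes_kb, device_capacities_kb):
    """Assign consecutive model layers to devices without exceeding
    each device's memory capacity. Returns a list of
    (device_index, [layer_indices]) assignments."""
    assignments = []
    dev, used, current = 0, 0, []
    for i, size in enumerate(layer_sizes_kb):
        # Move to the next device when the current one is full.
        if used + size > device_capacities_kb[dev] and current:
            assignments.append((dev, current))
            dev, used, current = dev + 1, 0, []
        if dev >= len(device_capacities_kb):
            raise RuntimeError("model does not fit on available accelerators")
        current.append(i)
        used += size
    if current:
        assignments.append((dev, current))
    return assignments

# A six-layer model (720 KB total) that fits on neither 450 KB chip
# alone is split across the two of them.
plan = partition_layers([120, 200, 150, 100, 90, 60], [450, 450])
print(plan)  # [(0, [0, 1]), (1, [2, 3, 4, 5])]
```

Once split this way, the stages can run as a pipeline, with each accelerator working on a different input in parallel — which is where the throughput gains from distribution come from.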
The researchers evaluated Synergy using a pair of AI accelerators developed by Analog Devices, the MAX78000 and the MAX78002. Eight different AI models (ConvNet5, KWS, SimpleNet, ResSimpleNet, WideNet, UNet, EfficientNetV2, and MobileNetV2) were executed via Synergy during the tests, and the results were compared with seven baselines that included state-of-the-art model partitioning techniques. Synergy consistently and significantly outperformed the baselines, delivering an average eight-fold increase in throughput.
Synergy may be a solution that has arrived before it is truly needed, but with the demonstrated effectiveness of the approach, it could become important in the years to come.