Computer Vision at the Edge? Just Zip It!
ZIP-CNN simplifies deploying CNNs on microcontrollers by estimating costs and applying reduction techniques to meet hardware constraints.
Cutting-edge applications in artificial intelligence (AI) are typically built and executed in large data centers packed with specialized hardware like graphics processing units and tensor processing units. But as these applications become more integrated into our everyday lives, it is becoming abundantly clear that this paradigm is not suitable in all cases. Real-time applications, for example, cannot respond quickly enough because of the latency introduced when sending data over networks. Furthermore, especially in the case of portable and wearable devices, the data to be processed may be sensitive, so sending it over the internet to a shared cloud computing system may be unacceptable.
Advances in edge AI and tinyML, technologies that enable AI algorithms to run on less powerful devices like microcontrollers, have gone a long way toward addressing these concerns. But for all the progress that has been made, there is still a lot of work to be done. Numerous predictive algorithms can now run on even the tiniest of hardware platforms, yet those platforms often fall short of the demands of more resource-intensive applications like computer vision.
Convolutional neural networks (CNNs), in particular, have been instrumental in pushing the field of computer vision forward. But CNNs have high inference costs, so getting them to run effectively on a microcontroller is quite challenging. In the near future, this job may not be nearly as difficult as it is today, thanks to the work of a team led by researchers at Sorbonne University in France. They have created ZIP-CNN, a design space exploration tool that aims to make deploying CNNs on microcontrollers much simpler.
The goal of ZIP-CNN is to help embedded system designers determine if a specific CNN can be used on their hardware or if changes are needed to make it fit within the hardware’s constraints, such as memory, processing power, and energy usage. It starts by analyzing the cost of running a given CNN model on an embedded system in terms of key factors like latency, energy consumption, and memory usage. This analysis is done without physically implementing the CNN on the hardware, which saves time and resources. Based on this estimation, ZIP-CNN can predict whether the CNN, in its current form, can meet the requirements of a specific application.
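To give a rough sense of what this kind of pre-deployment estimation looks like, the sketch below tallies weight storage, peak activation memory, and multiply-accumulate (MAC) counts for a toy CNN and compares them with a microcontroller budget. The layer shapes, byte widths, and budget figures are illustrative assumptions, not ZIP-CNN's actual cost models.

```python
# Illustrative pre-deployment cost estimation, in the spirit of ZIP-CNN's
# analysis step. Layer list, byte widths, and budgets are hypothetical.

# Each conv layer: (in_channels, out_channels, kernel_size, output_h, output_w)
layers = [
    (3,  16, 3, 32, 32),
    (16, 32, 3, 16, 16),
    (32, 64, 3,  8,  8),
]

def conv_costs(in_ch, out_ch, k, out_h, out_w, bytes_per_weight=1):
    """Return (flash bytes for weights, RAM bytes for output activations, MACs)."""
    params = in_ch * out_ch * k * k + out_ch          # weights + biases
    activations = out_ch * out_h * out_w              # int8 output feature map
    macs = in_ch * out_ch * k * k * out_h * out_w
    return params * bytes_per_weight, activations, macs

flash = ram_peak = total_macs = 0
for layer in layers:
    f, a, m = conv_costs(*layer)
    flash += f
    ram_peak = max(ram_peak, a)   # rough peak: largest single feature map
    total_macs += m

# Hypothetical budget for a Cortex-M-class microcontroller.
FLASH_BUDGET = 512 * 1024   # bytes
RAM_BUDGET   = 128 * 1024   # bytes

print(f"Flash needed: {flash/1024:.1f} KiB (budget {FLASH_BUDGET/1024:.0f} KiB)")
print(f"Peak RAM:     {ram_peak/1024:.1f} KiB (budget {RAM_BUDGET/1024:.0f} KiB)")
print(f"Total MACs:   {total_macs:,} (proxy for latency and energy)")
print("Fits:", flash <= FLASH_BUDGET and ram_peak <= RAM_BUDGET)
```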
In most cases, the original CNN is too large or too demanding to fit within the hardware constraints. Here, ZIP-CNN suggests reduction techniques like pruning, quantization, or knowledge distillation to shrink the model. After these reductions, the model may need to be retrained to ensure it still meets the accuracy requirements of the application. If the reduced model clears that bar, it is implemented on the hardware and validated experimentally.
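As a concrete illustration of two of those reduction techniques, the sketch below applies magnitude pruning and dynamic int8 quantization to a toy CNN using PyTorch. The network, the 50% sparsity target, and the choice of tooling are assumptions made for illustration; they are not taken from the ZIP-CNN work.

```python
# Minimal sketch of pruning and quantization on a small CNN (PyTorch used as a
# stand-in; ZIP-CNN itself may rely on different tooling).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(32, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = self.pool(x).flatten(1)
        return self.fc(x)

model = TinyCNN()

# Magnitude pruning: zero out the 50% of conv weights with the smallest L1 norm.
for module in (model.conv1, model.conv2):
    prune.l1_unstructured(module, name="weight", amount=0.5)
    prune.remove(module, "weight")  # make the pruning permanent

# Dynamic int8 quantization of the classifier head (conv layers would need
# static or quantization-aware training, which requires calibration data).
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# At this point the workflow described above would fine-tune the reduced model
# and re-check accuracy before committing it to the microcontroller.
print(quantized)
```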
If the applied reduction technique does not bring the model within the constraints, ZIP-CNN allows for iterative adjustments: different reduction techniques, or combinations of them, can be tested to find a configuration that works. If these adjustments still fall short, the designer may consider switching to a CNN architecture that is inherently less resource-intensive, or to a different hardware platform that can better support the CNN.
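That iterative search can be pictured as a simple loop over candidate configurations, keeping only those that fit the budget and retaining the most accurate survivor. The cost and accuracy estimators below are dummy placeholders standing in for ZIP-CNN's models and for post-reduction retraining results.

```python
# Sketch of the iterative exploration loop: try combinations of reduction
# settings, discard those that miss the budget, and pick the best survivor.
# All numbers below are hypothetical placeholders.
from itertools import product

BASE_FLASH_KIB, BASE_RAM_KIB, BASE_ACC = 900.0, 300.0, 0.92
FLASH_BUDGET_KIB, RAM_BUDGET_KIB, MIN_ACC = 512.0, 128.0, 0.85

def estimate(prune_ratio, bits):
    """Crude placeholder: size shrinks with pruning and bit width, accuracy dips."""
    flash = BASE_FLASH_KIB * (1.0 - prune_ratio) * (bits / 32.0)
    ram = BASE_RAM_KIB * (bits / 32.0)
    acc = BASE_ACC - 0.08 * prune_ratio - (0.04 if bits == 8 else 0.0)
    return flash, ram, acc

candidates = []
for prune_ratio, bits in product([0.0, 0.25, 0.5, 0.75], [32, 8]):
    flash, ram, acc = estimate(prune_ratio, bits)
    if flash <= FLASH_BUDGET_KIB and ram <= RAM_BUDGET_KIB and acc >= MIN_ACC:
        candidates.append((acc, prune_ratio, bits))

if candidates:
    acc, prune_ratio, bits = max(candidates)
    print(f"Best fit: prune {prune_ratio:.0%}, {bits}-bit weights, est. accuracy {acc:.2f}")
else:
    print("No configuration fits; consider a smaller architecture or a different MCU.")
```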
ZIP-CNN was tested on three different microcontrollers with three CNN topologies. After the models were adjusted for execution on these platforms, they were found to have low error rates and minimal latency. ZIP-CNN could prove to be an important tool for developers working on computer vision applications at the edge.