RNNPool Reduces Computer Vision RAM Usage 10X on Edge Devices Without Sacrificing Accuracy

A drop-in replacement for CNN pooling blocks yields models with fewer layers and smaller intermediate representations, dramatically reducing memory and compute.

RNNPool consists of two recurrent neural networks (RNNs) that sweep across each patch of an activation map horizontally and vertically to summarize each patch into a single vector. (📷: Microsoft Research)

As machine learning descends from the realm of high-end GPUs and the cloud to mobile devices and, finally, to the edge, compute and memory resources shrink accordingly, and scaled-down versions of the stalwart solutions that worked under looser resource constraints are often no longer sufficient. While increasingly sophisticated convolutional neural networks (CNNs) and AI-accelerated hardware have produced countless breakthroughs in computer vision in recent years, the limited RAM and computing power at the edge can make conventional methods impractical, or even impossible. One place where memory usage is particularly high in CNNs is the activation map — a 3D tensor whose size grows with the number of features and which consumes large amounts of RAM even at reduced precision. Researchers at Microsoft have introduced RNNPool, which replaces the pooling layers of a CNN with recurrent neural networks (RNNs) that consume far less RAM, without sacrificing accuracy.
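To see why activation maps dominate peak RAM, a quick back-of-the-envelope calculation helps. The sizes below are illustrative, not taken from the paper: a single mid-network activation map at a typical mobile-vision resolution already runs to megabytes, even before quantization.

```python
# Illustrative only: peak RAM contribution of one CNN activation map,
# an H x W spatial grid with C channels per location.
def activation_map_bytes(height, width, channels, bytes_per_value=4):
    """Size in bytes of a single H x W x C activation tensor."""
    return height * width * channels * bytes_per_value

# A hypothetical 112x112 map with 64 channels at float32 precision:
full = activation_map_bytes(112, 112, 64)          # ~3.2 MB
# Even quantized to int8 (1 byte per value), the map stays large:
quantized = activation_map_bytes(112, 112, 64, bytes_per_value=1)
print(full, quantized)
```

Because inference must hold at least one such tensor (plus weights and the next layer's output) in memory at once, shrinking these intermediate maps is what drives peak RAM down.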

RNNPool's RNNs sweep across activation maps in several passes, downsampling aggressively without degrading accuracy and producing output equivalent to that of traditional CNN pooling operators, but with far lower peak RAM usage. Because RNNPool obviates the need for several CNN layers, compute is also reduced. In testing, Microsoft researchers measured an 8-10X reduction in memory usage on vision tasks and a 2-3X reduction in compute, all without sacrificing accuracy. Compared to MobileNetV2, an RNNPool-based model called RNNPool-Face-M4 was actually more accurate while using only one fifth of the RAM.
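The sweeping scheme described above can be sketched in miniature. This toy NumPy version is not Microsoft's implementation (the real operator uses FastGRNN cells and is trained end to end); the cell, weight names, and sizes here are assumptions chosen only to show the data flow: a first RNN summarizes each row and each column of a patch, a second RNN sweeps those summaries in both directions, and the four final states are concatenated into one vector per patch.

```python
import numpy as np

def simple_rnn(seq, w_in, w_h):
    """Minimal tanh RNN cell: return the final hidden state of a sequence."""
    h = np.zeros(w_h.shape[0])
    for x in seq:
        h = np.tanh(w_in @ x + w_h @ h)
    return h

def rnnpool_patch(patch, w1_in, w1_h, w2_in, w2_h):
    """Toy RNNPool over one H x W x C patch (illustrative data flow only)."""
    height, width, _ = patch.shape
    # RNN1: one summary vector per row and per column of the patch.
    row_sums = np.stack([simple_rnn(patch[i], w1_in, w1_h) for i in range(height)])
    col_sums = np.stack([simple_rnn(patch[:, j], w1_in, w1_h) for j in range(width)])
    # RNN2: bidirectional sweep over the row and column summaries.
    parts = [
        simple_rnn(row_sums, w2_in, w2_h),
        simple_rnn(row_sums[::-1], w2_in, w2_h),
        simple_rnn(col_sums, w2_in, w2_h),
        simple_rnn(col_sums[::-1], w2_in, w2_h),
    ]
    return np.concatenate(parts)  # one fixed-size vector per patch

# Hypothetical sizes: a 4x4 patch with 3 channels, hidden sizes 8 and 8.
rng = np.random.default_rng(0)
h1, h2, channels = 8, 8, 3
patch = rng.standard_normal((4, 4, channels))
w1_in, w1_h = rng.standard_normal((h1, channels)), rng.standard_normal((h1, h1))
w2_in, w2_h = rng.standard_normal((h2, h1)), rng.standard_normal((h2, h2))
out = rnnpool_patch(patch, w1_in, w1_h, w2_in, w2_h)
print(out.shape)  # (32,), i.e. 4 * h2
```

The key memory property is visible even in this sketch: the operator only ever holds one row or column of the patch plus a few small hidden states at a time, rather than the full downstream feature map that a stack of convolution and pooling layers would produce.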

RNNPool is available on GitHub, including PyTorch examples, as part of Microsoft's EdgeML library. "RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference" will be presented at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020).
