Convolutional Neural Networks (CNNs) are very commonly applied to image analysis tasks due to the accuracy and speed they provide compared with alternative methods. Despite the success of CNNs, they are less effective for tasks that require simultaneous recognition and localization of objects. This limitation stems from the architecture of CNNs, in which input images are encoded into features through a series of scale-decreased layers. Spatial information is discarded during down-sampling and must be recovered by a decoder network, which is a very difficult proposition.
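To make the down-sampling concrete, here is a minimal, purely illustrative sketch (not code from the SpineNet repository) of how spatial resolution shrinks stage by stage in a conventional scale-decreased backbone:

```python
def scale_decreased_resolutions(input_size, num_stages):
    """Return the feature-map side length after each stride-2 stage.

    Each stage of a typical scale-decreased CNN backbone halves the
    height and width (via a stride-2 convolution or pooling), so fine
    spatial detail is progressively discarded.
    """
    sizes = []
    size = input_size
    for _ in range(num_stages):
        size = size // 2  # stride-2 down-sampling halves H and W
        sizes.append(size)
    return sizes

# A 224x224 image passing through 5 down-sampling stages, as in a
# ResNet-style backbone:
print(scale_decreased_resolutions(224, 5))  # [112, 56, 28, 14, 7]
```

By the final stage the feature map is only 7x7, which is why precise object localization from the last layer alone is so hard.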
In light of this issue, researchers at Google AI have developed a new meta-architecture that they call a scale-permuted model. With this model, the scales of intermediate feature maps can increase or decrease as needed, so spatial information is retained as network depth increases. Another feature of the model is that connections can be made between feature maps at various scales to support multi-scale feature fusion. A Neural Architecture Search (NAS) is performed, with a search space design that includes these features, to discover an appropriate scale-permuted model.
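The two ideas, permuting block scales and fusing features across scales, can be sketched in a few lines. This is a toy illustration with assumed names and structure, not the actual SpineNet search code: each block has a feature "level" L (its stride relative to the input is 2**L), and a block may take inputs from any earlier blocks, resampling them to match its own scale.

```python
def resample_factor(parent_level, child_level):
    """Factor by which a parent feature map must be rescaled to feed a
    child block. A factor > 1 means upsample, < 1 means downsample."""
    return 2 ** (parent_level - child_level)

# A permuted sequence of block levels: resolution can go up *or* down
# as depth increases, unlike the monotone 2,3,4,5 of a standard backbone.
permuted_levels = [2, 4, 3, 5, 3]

# Cross-scale connections: block i fuses the outputs of blocks parents[i].
parents = {2: [0, 1], 3: [1, 2], 4: [2, 3]}

for child, ps in parents.items():
    for p in ps:
        f = resample_factor(permuted_levels[p], permuted_levels[child])
        op = "upsample" if f > 1 else ("downsample" if f < 1 else "identity")
        print(f"block {p} (L{permuted_levels[p]}) -> "
              f"block {child} (L{permuted_levels[child]}): {op} x{f}")
```

The NAS procedure effectively searches over orderings like `permuted_levels` and connection patterns like `parents` to find a configuration that preserves spatial information where it is needed.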
To test this new method, the researchers started with the well-known ResNet-50 model as a seed for the NAS search. The scale-permuted model that was learned, named SpineNet-49, outperformed ResNet-50-FPN by ~3% average precision in object detection. It achieved this improvement while also requiring 10-20% fewer floating-point operations (FLOPs).
Before we know whether scale-permuted models, such as SpineNet, will be the next big thing in machine learning, or just another great idea that did not quite deliver as hoped, the machine learning community will need to collectively put them through their paces. Google AI has open-sourced the SpineNet code and made it available on GitHub for TensorFlow 1 and TensorFlow 2 for anyone interested in giving it a try.