Object Detection Networks on Convolutional Feature Maps
Shaoqing Ren, Kaiming He, Ross Girshick, Xiangyu Zhang, Jian Sun
TL;DR
This work introduces Networks on Convolutional feature maps (NoCs), a class of region-wise classifiers built on RoI-pooled convolutional features, and demonstrates that a deep, convolutional NoC is essential for high object-detection accuracy. Extensive ablations on PASCAL VOC 2007 (VOC07) with ZF and VGG backbones show that deeper NoCs and convolutional classifiers outperform traditional MLPs, and that scale-aware maxout improves performance further. Combining NoCs with Faster R-CNN, including ResNet and GoogLeNet backbones, yields substantial gains on the MS COCO and VOC benchmarks. The authors argue that the design of the classifier on RoI-pooled features is as important as the depth of the feature extractor, and they advocate integrating deep, convolutional NoCs into modern detectors to reach top performance on challenging datasets.
Abstract
Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has evolved rapidly, with significant research effort leading to better deep convolutional architectures. The object classifier, however, has received much less attention, and many recent systems (such as SPPnet and Fast/Faster R-CNN) use simple multi-layer perceptrons. This paper demonstrates that carefully designing deep networks for object classification is just as important. We experiment with region-wise classifier networks that use shared, region-independent convolutional features. We call them "Networks on Convolutional feature maps" (NoCs). We find that, aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas the latest image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without such a per-region classifier. We show by experiments that, despite the effectiveness of ResNets and Faster R-CNN systems, the design of NoCs is an essential element of the 1st-place winning entries in the ImageNet and MS COCO challenges 2015.
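To make the idea of a per-region classifier on shared feature maps concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a toy setup: RoI max pooling over a small C x H x W feature map, followed by a stack of 3x3 convolutions with ReLU and a single fully connected layer that outputs class scores. All function names (`roi_max_pool`, `conv3x3_relu`, `noc_scores`, `random_params`), the layer sizes, and the random weights are hypothetical and chosen only for illustration.

```python
import random


def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_size):
    """Max-pool one RoI of a C x H x W map into C x out_size x out_size.

    roi = (x0, y0, x1, y1), inclusive, in feature-map coordinates.
    """
    x0, y0, x1, y1 = roi
    h, w = y1 - y0 + 1, x1 - x0 + 1
    pooled = []
    for channel in feature_map:
        grid = []
        for i in range(out_size):
            rows = range(y0 + i * h // out_size,
                         y0 + ceil_div((i + 1) * h, out_size))
            row_out = []
            for j in range(out_size):
                cols = range(x0 + j * w // out_size,
                             x0 + ceil_div((j + 1) * w, out_size))
                row_out.append(max(channel[r][c] for r in rows for c in cols))
            grid.append(row_out)
        pooled.append(grid)
    return pooled


def conv3x3_relu(x, weights, biases):
    """3x3 convolution (stride 1, zero padding) followed by ReLU.

    x: C_in x H x W, weights: C_out x C_in x 3 x 3, biases: C_out.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for kernel, bias in zip(weights, biases):
        plane = [[0.0] * w for _ in range(h)]
        for r in range(h):
            for c in range(w):
                s = bias
                for ci in range(c_in):
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            rr, cc = r + dr, c + dc
                            if 0 <= rr < h and 0 <= cc < w:
                                s += kernel[ci][dr + 1][dc + 1] * x[ci][rr][cc]
                plane[r][c] = max(0.0, s)
        out.append(plane)
    return out


def fully_connected(x, weights, biases):
    """Flatten a C x H x W tensor and apply one linear layer."""
    flat = [v for plane in x for row in plane for v in row]
    return [sum(wi * vi for wi, vi in zip(w_row, flat)) + b
            for w_row, b in zip(weights, biases)]


def random_params(c_in, c_mid, n_conv, pool_size, n_classes, seed=0):
    """Hypothetical random weights; n_conv=0 gives a classifier with no conv layers."""
    rng = random.Random(seed)
    conv, c = [], c_in
    for _ in range(n_conv):
        w = [[[[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]
              for _ in range(c)] for _ in range(c_mid)]
        conv.append((w, [0.0] * c_mid))
        c = c_mid
    fc_w = [[rng.uniform(-0.1, 0.1) for _ in range(c * pool_size * pool_size)]
            for _ in range(n_classes)]
    return {"conv": conv, "fc": (fc_w, [0.0] * n_classes)}


def noc_scores(feature_map, roi, params, pool_size):
    """Per-region classifier: RoI pooling -> conv layers -> fc class scores."""
    x = roi_max_pool(feature_map, roi, pool_size)
    for w, b in params["conv"]:
        x = conv3x3_relu(x, w, b)
    return fully_connected(x, *params["fc"])


# Toy usage: a 2-channel 8x8 feature map shared by all regions of the image.
rng = random.Random(1)
features = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(2)]
params = random_params(c_in=2, c_mid=4, n_conv=2, pool_size=3, n_classes=21)
scores = noc_scores(features, roi=(1, 1, 6, 6), params=params, pool_size=3)
print(len(scores))  # one score per class
```

In this sketch the feature map is computed once per image and only `noc_scores` runs per region, which mirrors the split between the shared feature extractor and the region-wise classifier described above. Setting `n_conv=0` in `random_params` reduces the classifier to a single linear layer on the pooled features, a stand-in for the shallower classifiers the abstract contrasts with deeper convolutional ones.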
