NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection
Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang, Quoc V. Le
TL;DR
The paper replaces hand-designed feature pyramids for multi-scale object detection with NAS-FPN, a feature pyramid architecture found by neural architecture search that learns how to fuse features across scales. It defines a modular search space built from repeatable merging cells and trains a PPO-based controller on a proxy task; the discovered architecture can be stacked and paired with different backbones. Within the RetinaNet framework, NAS-FPN achieves better accuracy/latency tradeoffs across backbones: it reaches 48.3 AP with AmoebaNet-D, surpassing Mask R-CNN with less computation time, and on mobile, NAS-FPNLite improves accuracy by 2 AP over SSDLite with MobileNetV2. The work also shows that deep supervision enables anytime detection and that DropBlock improves regularization, making NAS-FPN a versatile approach to scalable object detection.
Abstract
Current state-of-the-art convolutional architectures for object detection are manually designed. Here we aim to learn a better architecture of feature pyramid network for object detection. We adopt Neural Architecture Search and discover a new feature pyramid architecture in a novel scalable search space covering all cross-scale connections. The discovered architecture, named NAS-FPN, consists of a combination of top-down and bottom-up connections to fuse features across scales. NAS-FPN, combined with various backbone models in the RetinaNet framework, achieves better accuracy and latency tradeoff compared to state-of-the-art object detection models. NAS-FPN improves mobile detection accuracy by 2 AP compared to state-of-the-art SSDLite with MobileNetV2 model in [32] and achieves 48.3 AP which surpasses Mask R-CNN [10] detection accuracy with less computation time.
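The cross-scale fusion mentioned in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes single-channel 2D feature maps stored as nested lists, nearest-neighbor upsampling and max pooling for resampling, and an element-wise sum as the merge operation. All function names (`upsample_nearest`, `downsample_max`, `resize_to`, `merge_sum`) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: one cross-scale merge step on toy 2D feature maps.
# Function names are hypothetical; they do not come from the paper or a library.

def upsample_nearest(fmap, factor):
    """Nearest-neighbor upsampling of a 2D list by an integer factor."""
    return [
        [row[c // factor] for c in range(len(row) * factor)]
        for row in fmap
        for _ in range(factor)
    ]

def downsample_max(fmap, factor):
    """Max pooling with a factor x factor window and matching stride.
    Assumes height and width are divisible by factor."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            for c in range(0, w, factor)
        ]
        for r in range(0, h, factor)
    ]

def resize_to(fmap, size):
    """Resample a square feature map to size x size (integer scale factors only)."""
    h = len(fmap)
    if h == size:
        return fmap
    if h < size:
        return upsample_nearest(fmap, size // h)
    return downsample_max(fmap, h // size)

def merge_sum(a, b, out_size):
    """Resample both inputs to the output resolution, then sum element-wise."""
    a, b = resize_to(a, out_size), resize_to(b, out_size)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

coarse = [[1.0, 2.0], [3.0, 4.0]]            # low-resolution level (2x2)
fine = [[1.0] * 4 for _ in range(4)]         # high-resolution level (4x4)

top_down = merge_sum(coarse, fine, out_size=4)   # coarse -> fine direction
bottom_up = merge_sum(coarse, fine, out_size=2)  # fine -> coarse direction
```

Here `top_down` carries coarse information into the finer level, while `bottom_up` carries fine information into the coarser level. A searched pyramid would combine many such steps in both directions, with a learned choice of which levels to merge and at which output resolution.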
