Scale-Aware Trident Networks for Object Detection

Yanghao Li, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang

TL;DR

Scale variation remains a major challenge in object detection. The authors propose TridentNet, a multi-branch backbone whose branches share weights but use different receptive fields, so that each branch produces a scale-specific feature map. A scale-aware training scheme specializes each branch by training it on objects of an appropriate scale. They also introduce TridentNet Fast, an approximation that improves on the vanilla detector without additional parameters or computational cost. On COCO with a ResNet-101 backbone, TridentNet reaches 48.4 mAP in the single-model setting, which the authors report as state of the art.

Abstract

Scale variation is one of the key challenges in object detection. In this work, we first present a controlled experiment to investigate the effect of receptive fields on scale variation in object detection. Based on the findings from these exploratory experiments, we propose a novel Trident Network (TridentNet) that aims to generate scale-specific feature maps with uniform representational power. We construct a parallel multi-branch architecture in which each branch shares the same transformation parameters but has a different receptive field. Then, we adopt a scale-aware training scheme to specialize each branch by sampling object instances of proper scales for training. As a bonus, a fast approximation of TridentNet achieves significant improvements without any additional parameters or computational cost compared with the vanilla detector. On the COCO dataset, our TridentNet with a ResNet-101 backbone achieves a state-of-the-art single-model result of 48.4 mAP. Code is available at https://git.io/fj5vR.
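
The idea of branches that share parameters but differ in receptive field can be illustrated with a small sketch. The code below is not the authors' implementation: it uses a 1-D signal instead of 2-D feature maps, the names `dilated_conv1d` and `effective_kernel_size` are hypothetical, and the dilation rates (1, 2, 3) are an assumption, since the text above only says the branches use different receptive fields.

```python
from typing import Dict, List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """'Valid' 1-D cross-correlation whose taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation + 1  # samples covered by one output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + k * dilation]
                       for k, w in enumerate(kernel)))
    return out


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Extent covered by a kernel of `kernel_size` taps at a given dilation."""
    return (kernel_size - 1) * dilation + 1


# One parameter set, reused unchanged by every branch (weight sharing).
shared_kernel = [1.0, 0.0, -1.0]
dilations = (1, 2, 3)  # assumed rates, one per branch
signal = [float(i * i) for i in range(10)]

branch_outputs: Dict[int, List[float]] = {
    d: dilated_conv1d(signal, shared_kernel, d) for d in dilations
}
receptive_fields: Dict[int, int] = {
    d: effective_kernel_size(len(shared_kernel), d) for d in dilations
}
```

Every branch reads the same three weights, so adding branches adds no parameters; only the spacing of the taps changes, which widens the region each output sees (3, 5, and 7 samples in this sketch).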

Paper Structure

This paper contains 25 sections, 1 equation, 4 figures, 8 tables.

Figures (4)

  • Figure 1: (a) Using multiple images of several scales as input, the image pyramid methods perform feature extraction and object detection independently for each scale. (b) The feature pyramid methods utilize the features from different layers of CNNs for different scales, which is computationally friendly. This figure takes FPN as an example. (c) Our proposed Trident Network generates scale-aware feature maps efficiently by trident blocks with different receptive fields.
  • Figure 2: Illustration of the proposed TridentNet. The multiple branches in trident blocks share the same parameters but use different dilation rates to generate scale-specific feature maps. Objects of specified scales are sampled for each branch during training. The final proposals or detections from multiple branches are combined using Non-maximum Suppression (NMS). Only the backbone network of TridentNet is shown here. The RPN and Fast R-CNN heads are shared among branches and omitted for simplicity.
  • Figure 3: A trident block constructed from a bottleneck residual block.
  • Figure 4: Results on the COCO minival set using different numbers of trident blocks on ResNet-101.
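
The Figure 2 caption mentions two steps that a short sketch may make more concrete: sampling objects of a specified scale for each branch, and combining the branches' detections with NMS. The code below is a minimal illustration under stated assumptions, not the authors' code. The branch names, the valid ranges in `VALID_RANGES`, the use of sqrt(width × height) as the object scale, and the IoU threshold of 0.5 are all illustrative choices that are not given in the text above.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, score)

# Hypothetical per-branch valid scale ranges (pixels); illustrative only.
VALID_RANGES = {
    "small": (0.0, 90.0),
    "medium": (30.0, 160.0),
    "large": (90.0, math.inf),
}


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def branches_for(box: Box) -> List[str]:
    """Branches whose assumed valid range contains the object's scale."""
    scale = math.sqrt(area(box))
    return [name for name, (lo, hi) in VALID_RANGES.items() if lo <= scale <= hi]


def iou(a: Box, b: Box) -> float:
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], iou_threshold: float = 0.5) -> List[Detection]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    kept: List[Detection] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


# Made-up detections from three branches, pooled and merged with one NMS pass.
per_branch = {
    "small": [((10, 10, 50, 50), 0.9)],
    "medium": [((12, 12, 52, 52), 0.8), ((100, 100, 220, 220), 0.7)],
    "large": [((102, 98, 222, 218), 0.95)],
}
merged = nms([d for dets in per_branch.values() for d in dets])
```

With overlapping ranges like these, an object near a boundary is assigned to more than one branch during training, and at inference the duplicate detections that different branches produce for the same object are collapsed by the single NMS pass.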