Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection

Shifeng Zhang, Cheng Chi, Yongqiang Yao, Zhen Lei, Stan Z. Li

TL;DR

This work argues that the performance gap between anchor-based and anchor-free object detectors mainly arises from how positive and negative training samples are defined. It introduces Adaptive Training Sample Selection (ATSS), which adaptively selects positives per ground-truth using center-based candidate sampling across pyramid levels and IoU statistics, achieving improvements for both anchor-based and anchor-free detectors. Extensive experiments on MS COCO show that ATSS narrows the gap between families, reduces reliance on multiple anchors per location, and yields state-of-the-art results without additional overhead. The findings provide a practical, robust approach to sample selection that enhances detection accuracy across architectures and backbones.

Abstract

Object detection has been dominated by anchor-based detectors for several years. Recently, anchor-free detectors have become popular due to the proposal of FPN and Focal Loss. In this paper, we first point out that the essential difference between anchor-based and anchor-free detection is actually how positive and negative training samples are defined, which leads to the performance gap between them. If they adopt the same definition of positive and negative samples during training, there is no obvious difference in the final performance, regardless of whether regression starts from a box or a point. This shows that how positive and negative training samples are selected is important for current object detectors. Then, we propose Adaptive Training Sample Selection (ATSS), which automatically selects positive and negative samples according to the statistical characteristics of each object. It significantly improves the performance of anchor-based and anchor-free detectors and bridges the gap between them. Finally, we discuss the necessity of tiling multiple anchors per location on the image to detect objects. Extensive experiments conducted on MS COCO support the aforementioned analysis and conclusions. With the newly introduced ATSS, we improve state-of-the-art detectors by a large margin to $50.7\%$ AP without introducing any overhead. The code is available at https://github.com/sfzhang15/ATSS

Paper Structure

This paper contains 15 sections, 3 figures, 8 tables, 1 algorithm.

Figures (3)

  • Figure 1: Definition of positives (1) and negatives (0). Blue box, red box and red point are ground-truth, anchor box and anchor point. (a) RetinaNet uses IoU to select positives (1) in spatial and scale dimension simultaneously. (b) FCOS first finds candidate positives (?) in spatial dimension, then selects final positives (1) in scale dimension.
  • Figure 2: (a) Blue point and box are the center and bound of object, red point and box are the center and bound of anchor. (b) RetinaNet regresses from anchor box with four offsets. (c) FCOS regresses from anchor point with four distances.
  • Figure 3: Illustration of ATSS. Each level has one candidate with its IoU. (a) A ground-truth with a high $m_g$ and a high $v_g$. (b) A ground-truth with a low $m_g$ and a low $v_g$.
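
The selection rule illustrated in Figure 3 can be sketched in a few lines of NumPy. This is a minimal sketch rather than the authors' implementation (see the repository linked in the abstract for that). It assumes boxes in `(x1, y1, x2, y2)` format, one ground-truth at a time, and a threshold $t_g = m_g + v_g$, where $m_g$ and $v_g$ are the mean and standard deviation of the candidates' IoUs. The function names `iou_with_gt` and `atss_select`, the small default `k`, and the use of the population standard deviation (`np.std` with its default `ddof=0`) are choices made for this illustration.

```python
import numpy as np


def iou_with_gt(boxes, gt):
    """IoU between each box in `boxes` (N, 4) and one ground-truth box `gt` (4,)."""
    x1 = np.maximum(boxes[:, 0], gt[0])
    y1 = np.maximum(boxes[:, 1], gt[1])
    x2 = np.minimum(boxes[:, 2], gt[2])
    y2 = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_boxes + area_gt - inter)


def box_centers(boxes):
    """Centers (N, 2) of boxes given as (x1, y1, x2, y2)."""
    return np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                     (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)


def atss_select(anchors_per_level, gt, k=1):
    """Return (positive anchor indices, threshold t_g) for one ground-truth.

    `anchors_per_level` is a list with one (N_i, 4) array per pyramid level.
    Indices refer to the concatenation of all levels, in list order.
    """
    gt = np.asarray(gt, dtype=float)
    levels = [np.asarray(a, dtype=float) for a in anchors_per_level]
    gt_center = np.array([(gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2])

    # Step 1: on every level, keep the k anchors whose centers are closest
    # to the ground-truth center.
    candidates, offset = [], 0
    for anchors in levels:
        dist = np.linalg.norm(box_centers(anchors) - gt_center, axis=1)
        candidates.append(np.argsort(dist)[:k] + offset)
        offset += len(anchors)
    candidates = np.concatenate(candidates)

    # Step 2: IoU statistics of the candidates define the threshold.
    cand_boxes = np.concatenate(levels)[candidates]
    ious = iou_with_gt(cand_boxes, gt)
    m_g, v_g = ious.mean(), ious.std()  # population std, an assumption here
    t_g = m_g + v_g

    # Step 3: positives have IoU >= t_g and a center inside the ground-truth.
    centers = box_centers(cand_boxes)
    inside = ((centers[:, 0] > gt[0]) & (centers[:, 0] < gt[2]) &
              (centers[:, 1] > gt[1]) & (centers[:, 1] < gt[3]))
    return candidates[(ious >= t_g) & inside], t_g


# Toy example: three pyramid levels, one candidate per level (k=1).
levels = [
    np.array([[0, 0, 16, 16], [100, 100, 116, 116]]),
    np.array([[0, 0, 32, 32], [200, 200, 232, 232]]),
    np.array([[0, 0, 64, 64]]),
]
positives, threshold = atss_select(levels, gt=[0, 0, 16, 16], k=1)
```

In the toy example only the first-level anchor that coincides with the ground-truth clears the threshold. The candidates' IoUs are spread widely, so $v_g$ is large and the threshold is pushed towards the single best level, which is the situation shown in panel (a) of Figure 3.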