FCOS: Fully Convolutional One-Stage Object Detection

Zhi Tian, Chunhua Shen, Hao Chen, Tong He

TL;DR

FCOS reframes object detection as a per-pixel dense prediction task by eliminating anchor boxes and proposals. It uses multi-level FPN-based predictions with a center-ness branch to suppress low-quality detections, matching or exceeding the performance of anchor-based one-stage detectors while greatly reducing design complexity and the number of hyper-parameters. Key contributions include a simple, fully convolutional framework, a learned center-ness score, and demonstrated effectiveness as a drop-in replacement for RPNs in two-stage detectors. The results show strong COCO performance and practical benefits in training efficiency and generalization, suggesting that anchor-free detection is a viable and effective alternative for instance-level vision tasks.

Abstract

We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogous to semantic segmentation. Almost all state-of-the-art object detectors, such as RetinaNet, SSD, YOLOv3, and Faster R-CNN, rely on pre-defined anchor boxes. In contrast, our proposed detector FCOS is anchor-box free as well as proposal free. By eliminating the pre-defined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes, such as calculating overlap during training. More importantly, we also avoid all hyper-parameters related to anchor boxes, to which the final detection performance is often very sensitive. With non-maximum suppression (NMS) as the only post-processing step, FCOS with ResNeXt-64x4d-101 achieves 44.7% in AP with single-model and single-scale testing, surpassing previous one-stage detectors with the advantage of being much simpler. For the first time, we demonstrate a much simpler and more flexible detection framework achieving improved detection accuracy. We hope that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks. Code is available at: https://tinyurl.com/FCOSv1

Paper Structure

This paper contains 28 sections, 3 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: As shown in the left image, FCOS works by predicting a 4D vector $(l, t, r, b)$ encoding the location of a bounding box at each foreground pixel (supervised by ground-truth bounding box information during training). The right plot shows that when a location resides in multiple bounding boxes, it can be ambiguous which bounding box this location should regress.
  • Figure 2: The network architecture of FCOS, where C3, C4, and C5 denote the feature maps of the backbone network and P3 to P7 are the feature levels used for the final prediction. $H \times W$ is the height and width of the feature maps. '/$s$' ($s=8, 16, ..., 128$) is the down-sampling ratio of the feature maps at that level relative to the input image. As an example, all the numbers are computed with an $800\times 1024$ input.
  • Figure 3: Center-ness. Red, blue, and other colors denote 1, 0, and the values between them, respectively. Center-ness is computed by the paper's center-ness equation and decays from 1 to 0 as the location deviates from the center of the object. When testing, the center-ness predicted by the network is multiplied with the classification score and can thus down-weight the low-quality bounding boxes predicted by a location far from the center of an object.
  • Figure 4: Class-agnostic precision-recall curves at IOU $= 0.50$.
  • Figure 5: Class-agnostic precision-recall curves at IOU $= 0.75$.
  • ...and 3 more figures
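
The quantities named in the captions of Figures 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names (`ltrb_targets`, `centerness`, `level_sizes`) are hypothetical, the box is assumed to be given as `(x0, y0, x1, y1)` in image coordinates, and the center-ness formula $\sqrt{\frac{\min(l, r)}{\max(l, r)} \cdot \frac{\min(t, b)}{\max(t, b)}}$ is taken from the FCOS paper rather than from the captions above. Rounding feature-map sizes up with a ceiling is also an assumption about how the strided layers pad their input.

```python
import math


def ltrb_targets(x, y, box):
    """Distances from location (x, y) to the left, top, right and bottom
    sides of box = (x0, y0, x1, y1). All four are positive only when the
    location lies strictly inside the box (a foreground location)."""
    x0, y0, x1, y1 = box
    return (x - x0, y - y0, x1 - x, y1 - y)


def centerness(l, t, r, b):
    """Center-ness of a foreground location: 1 at the box center,
    decaying to 0 on the box border. Assumes a box with positive
    width and height, so max(l, r) and max(t, b) are non-zero."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


def level_sizes(height, width, strides=(8, 16, 32, 64, 128)):
    """Feature-map size (H, W) at each level for the given input size,
    assuming sizes are rounded up at every down-sampling step."""
    return [(math.ceil(height / s), math.ceil(width / s)) for s in strides]


# A location at the exact center of a box has center-ness 1.
l, t, r, b = ltrb_targets(50, 40, (10, 20, 90, 60))
center_score = centerness(l, t, r, b)

# At test time the predicted center-ness scales the classification score,
# so boxes predicted far from an object's center rank lower before NMS.
cls_score = 0.9  # hypothetical classification score
ranking_score = cls_score * center_score

# Sizes for the 800 x 1024 input used as the example in Figure 2.
sizes = level_sizes(800, 1024)
```

Under these assumptions, `sizes` lists one `(H, W)` pair per level from P3 (stride 8) to P7 (stride 128), and `ranking_score` is the value that would be used to order detections before NMS.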