Rectify the Regression Bias in Long-Tailed Object Detection

Ke Zhu, Minghao Fu, Jie Shao, Tianyu Liu, Jianxin Wu

TL;DR

This paper identifies regression bias as a critical but overlooked factor in long-tailed object detection, where class-specific RCNN regression heads hinder rare-class localization. It shows that a class-agnostic regression head in the proposal stage is more balanced and proposes three remedies to harmonize regression across classes, selecting an extra class-agnostic branch (CAB) as the main solution. CAB, together with clustering or merging alternatives, yields consistent improvements on LVIS and transfers to COCO-LT and segmentation tasks, achieving state-of-the-art results and robust generalization across metrics. The work provides both theoretical and empirical support for rectifying regression bias, offering practical gains for rare-class accuracy and overall detection quality.

Abstract

Long-tailed object detection faces great challenges because of its extremely imbalanced class distribution. Recent methods mainly focus on the classification bias and its loss function design, while ignoring the subtle influence of the regression branch. This paper shows that the regression bias exists and seriously harms detection accuracy. Existing methods fail to handle the regression bias, and this paper hypothesizes that the class-specific regression head for rare classes is its main cause. Accordingly, three viable solutions that cater for the rare categories are proposed: adding a class-agnostic branch, clustering heads, and merging heads. The proposed methods bring consistent and significant improvements over existing long-tailed detection methods, especially in rare and common classes. They achieve state-of-the-art performance on the large-vocabulary LVIS dataset with different backbones and architectures, and generalize well to more difficult evaluation metrics, relatively balanced datasets, and the mask branch. This is the first attempt to reveal the regression bias in long-tailed object detection and to explore how to rectify it.
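To make the head-sharing options concrete, the following minimal Python sketch counts how many training images each regression head would learn from under each scheme. It is an illustration under stated assumptions, not the authors' implementation: the function names, the head labels, and the toy class counts are hypothetical, and the rare cut-off of 10 images follows the usual LVIS convention. The clustering option is left out because the abstract does not state the grouping criterion.

```python
from collections import Counter

# Assumption: LVIS-style convention, a class is "rare" if it appears
# in at most 10 training images.
RARE_MAX_IMAGES = 10


def assign_heads(image_counts, scheme):
    """Map each class to the names of the regression heads trained on its boxes."""
    heads = {}
    for cls, n_images in image_counts.items():
        if scheme == "class_specific":
            # Baseline: every class owns its own head W_cls.
            heads[cls] = [f"W_{cls}"]
        elif scheme == "extra_branch":
            # Class-specific head plus one shared class-agnostic head W_0.
            heads[cls] = [f"W_{cls}", "W_0"]
        elif scheme == "merge_rare":
            # Rare classes share a single head W_rare; others keep their own.
            if n_images <= RARE_MAX_IMAGES:
                heads[cls] = ["W_rare"]
            else:
                heads[cls] = [f"W_{cls}"]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return heads


def samples_per_head(image_counts, heads):
    """Sum the training images that contribute to each regression head."""
    totals = Counter()
    for cls, head_names in heads.items():
        for name in head_names:
            totals[name] += image_counts[cls]
    return dict(totals)


# Hypothetical toy counts, not taken from LVIS.
TOY_COUNTS = {"person": 1000, "cat": 50, "aardvark": 3, "abacus": 2}
```

In this toy setting a class-specific head for a rare class is trained on only a handful of images, whereas a shared head (W_0 or W_rare) pools the samples of every class assigned to it. That pooling is the intuition behind a more balanced regression branch.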

Paper Structure (12 sections, 8 equations, 5 figures, 8 tables)

Figures (5)

  • Figure 1: The RCNN panel shows the RCNN regression loss of frequent, common and rare categories. The RPN panel shows the RPN regression loss. The scale panel shows the distribution of per-class mean object scales in LVIS1.0. 'delta' in the scale panel is the negative of the difference between train and validation set sizes for different classes.
  • Figure 2: Illustration of the regular two-stage detection pipeline and the proposed regression methods. Previous methods (the left figure) generally focus on the final classification branch (the yellow arrow), while we focus on rectifying the regression bias (the red arrow). The right part shows our three regression methods, including adding an extra branch $W_0$, clustering regression heads (e.g., from $W_1,\dots,W_C$ to $W_1', W_2'$) and merging (e.g., merging rare categories into one regression head $W_{rare}$, cf. the main-method table). In our main experiments, we choose 'adding an extra branch' for its simplicity. This figure needs to be viewed in color.
  • Figure 3: Visualizations of detection results before (on the left of each group) and after (on the right) using our CAB. We adopt RFS LVIS as the baseline on LVIS1.0 and combine it with our CAB regression method. In comparison, the proposed method is good at detecting missed objects, filtering out duplicate detections, and rectifying bounding box predictions. This figure needs to be viewed in color.
  • Figure 4: The loss distribution shift before and after combining with our CAB. Here we use EQLv2 and CE as baselines.
  • Figure 5: The AP improvement of combining our CAB with the baseline RFS LVIS in LVIS1.0. We enumerate AP from AP50 to AP95, following the common practice adopted in MS-COCO.
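The AP50 to AP95 range in Figure 5 corresponds to IoU thresholds from 0.50 to 0.95 in steps of 0.05, as in MS-COCO evaluation. The sketch below, with hypothetical function names and toy boxes, suggests why localization quality matters more at the strict end: a looser box passes fewer thresholds than a tighter one. It covers only the box-matching criterion, not the full AP computation.

```python
# IoU thresholds used for AP50, AP55, ..., AP95 (built from integers
# to avoid floating-point drift from repeated addition).
IOU_THRESHOLDS = [t / 100 for t in range(50, 100, 5)]


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes get an empty intersection.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def passed_thresholds(pred_box, gt_box):
    """Return the IoU thresholds at which the prediction counts as a match."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


# Hypothetical toy boxes: one ground truth, one loose and one tight prediction.
GT_BOX = (0, 0, 10, 10)
LOOSE_PRED = (0, 0, 10, 6)   # covers 60% of the ground truth
TIGHT_PRED = (0, 0, 10, 9)   # covers 90% of the ground truth
```

With these toy boxes the loose prediction is a match only at the three most lenient thresholds, while the tight one is a match at nine of the ten. A regression head that tightens boxes would therefore be expected to help most on the stricter metrics.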