Unbiased Teacher for Semi-Supervised Object Detection

Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, Peter Vajda

TL;DR

The paper tackles pseudo-labeling bias in semi-supervised object detection caused by class imbalance. It introduces Unbiased Teacher, a mutual Teacher-Student framework with EMA-based teacher refinement and a class-balancing (Focal) loss to produce and leverage pseudo-labels more reliably. Empirical results on COCO-standard, COCO-additional, and VOC show substantial gains over prior SS-OD methods, especially with very limited labeled data. The approach demonstrates that stable pseudo-labels and balanced supervision are key to unlocking strong SS-OD performance in real-world datasets.
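The EMA-based teacher refinement mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `ema_update`, the dictionary-of-lists weight format, and the smoothing factor `alpha` are assumptions made for the example.

```python
def ema_update(teacher, student, alpha=0.999):
    """Return teacher weights moved slightly toward the student.

    Each weight becomes alpha * teacher + (1 - alpha) * student.
    `alpha` is a hypothetical default chosen for illustration.
    """
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    return {
        name: [alpha * t + (1.0 - alpha) * s
               for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


# Toy weights: the student sits at 1.0, the teacher starts at 0.0.
teacher = {"w": [0.0, 1.0]}
student = {"w": [1.0, 1.0]}

# Three refinement steps with a deliberately small alpha so the drift is visible.
for _ in range(3):
    teacher = ema_update(teacher, student, alpha=0.5)

print(teacher["w"])  # [0.875, 1.0]
```

With `alpha` close to 1 the teacher changes slowly, so its pseudo-labels stay stable even when individual student updates are noisy.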

Abstract

Semi-supervised learning, i.e., training networks with both labeled and unlabeled data, has made significant progress recently. However, existing works have primarily focused on image classification tasks and neglected object detection, which requires more annotation effort. In this work, we revisit Semi-Supervised Object Detection (SS-OD) and identify the pseudo-labeling bias issue in SS-OD. To address this, we introduce Unbiased Teacher, a simple yet effective approach that jointly trains a student and a gradually progressing teacher in a mutually beneficial manner. Together with a class-balance loss to downweight overly confident pseudo-labels, Unbiased Teacher consistently improves on state-of-the-art methods by significant margins on the COCO-standard, COCO-additional, and VOC datasets. Specifically, Unbiased Teacher achieves a 6.8 absolute mAP improvement over the state-of-the-art method when using 1% of labeled data on MS-COCO, and an improvement of around 10 mAP over the supervised baseline when using only 0.5%, 1%, or 2% of labeled data on MS-COCO.
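To illustrate how a class-balance loss can downweight confident predictions, here is a minimal Python sketch of a focal-style loss for a single prediction. It assumes the standard multi-class form, minus (1 - p_t)^gamma times log(p_t), without a per-class weighting term; the function names and the value of `gamma` are assumptions for the example, not settings taken from the paper.

```python
import math


def cross_entropy(probs, target):
    """Plain cross-entropy for one prediction: -log(p_t)."""
    return -math.log(max(probs[target], 1e-12))


def focal_loss(probs, target, gamma=1.5):
    """Focal-style loss: -(1 - p_t)^gamma * log(p_t).

    `gamma` is a hypothetical value; gamma = 0 recovers cross-entropy.
    """
    p_t = min(max(probs[target], 1e-12), 1.0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# A confident (easy) prediction and an uncertain (hard) one, both for class 0.
easy = [0.9, 0.1]
hard = [0.2, 0.8]

# Fraction of the cross-entropy loss that survives the focal weighting.
easy_ratio = focal_loss(easy, 0) / cross_entropy(easy, 0)
hard_ratio = focal_loss(hard, 0) / cross_entropy(hard, 0)
print(easy_ratio < hard_ratio)  # True: the easy example is downweighted more
```

Because the modulating factor shrinks as the predicted probability grows, classes that dominate the pseudo-labels with confident predictions contribute less to the gradient than rare, harder classes.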

Paper Structure

This paper contains 20 sections, 4 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: (a) Illustration of semi-supervised object detection, where the model observes a set of labeled data and a set of unlabeled data in the training stage. (b) Our proposed model can efficiently leverage the unlabeled data and performs favorably against existing semi-supervised object detection works, including CSD (Jeong et al., 2019) and STAC (Sohn et al., 2020).
  • Figure 2: Validation losses of our model and of the model trained with labeled data only. When the labeled data is insufficient (1% and 5%), the RPN and ROIhead classifiers suffer from overfitting, while the RPN and ROIhead regressors do not. Our model significantly alleviates the overfitting issue in the classifiers and also improves the validation box regression loss.
  • Figure 3: Overview of Unbiased Teacher. Unbiased Teacher consists of two stages. Burn-In: we first train the object detector using the available labeled data. Teacher-Student Mutual Learning consists of two steps. Student Learning: the fixed Teacher generates pseudo-labels to train the Student, while the Teacher and Student are given weakly and strongly augmented inputs, respectively. Teacher Refinement: the knowledge that the Student learned is then transferred to the slowly progressing Teacher via an exponential moving average (EMA) on network weights. Once the detector has been trained to convergence in the Burn-In stage, we switch to the Teacher-Student Mutual Learning stage.
  • Figure 4: Pseudo-label improvement in (a) accuracy, (b) mIoU, and (c) number of bounding boxes in the case of COCO-standard 1% labeled data. We measure (a) accuracy and (b) mIoU by comparing the ground-truth boxes with the pseudo-boxes. The Burn-In limit curves indicate the pseudo-boxes obtained from the model right after the Burn-In stage without further refinement (i.e., the model trained on labeled data only). The GT curve in the number-of-boxes plot indicates the average number of bounding boxes in the GT labels, which is around 7 per image in MS-COCO. This result indicates that our model can generate more accurate pseudo-labels after the Burn-In stage (i.e., 2k iterations).
  • Figure 5: Ablation study on the EMA and the Focal loss in the case of COCO-standard 1% labeled data. (a) mAP of the models using the Focal loss or cross-entropy and applying EMA or standard training. (b) Empirical class distribution (i.e., histogram) of the pseudo-labels generated by each model, together with the KL divergence between the ground-truth label distribution and the pseudo-label distribution. Among these models, the model using the Focal loss and EMA training (i.e., the green curve) achieves the best mAP with the most balanced pseudo-labels.
  • ...and 6 more figures
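
The comparison described in the Figure 5(b) caption, a class histogram of pseudo-labels measured against the ground-truth class distribution, can be sketched as follows. This is an illustrative example only: the helper names, the smoothing constant `eps`, the direction of the divergence (ground truth relative to pseudo-labels), and the toy label lists are all assumptions, since the caption does not specify them.

```python
import math
from collections import Counter


def class_distribution(labels, num_classes, eps=1e-8):
    """Normalized class histogram, smoothed by eps so no bin is exactly zero."""
    counts = Counter(labels)
    total = len(labels) + eps * num_classes
    return [(counts.get(c, 0) + eps) / total for c in range(num_classes)]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy class ids: ground truth is balanced over three classes.
gt_labels = [0, 0, 1, 1, 2, 2]
balanced_pseudo = [0, 1, 2, 0, 1, 2]
biased_pseudo = [0, 0, 0, 0, 0, 1]  # collapses onto the dominant class

gt = class_distribution(gt_labels, num_classes=3)
kl_balanced = kl_divergence(gt, class_distribution(balanced_pseudo, 3))
kl_biased = kl_divergence(gt, class_distribution(biased_pseudo, 3))
print(kl_balanced < kl_biased)  # True: biased pseudo-labels diverge more
```

A lower divergence means the pseudo-label class frequencies track the ground-truth frequencies more closely, which is the sense in which the caption calls pseudo-labels "balanced".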