
Building Blocks for Robust and Effective Semi-Supervised Real-World Object Detection

Moussa Kassem Sbeyti, Nadja Klein, Azarm Nowzad, Fikret Sivrikaya, Sahin Albayrak

TL;DR

Real-world semi-supervised object detection (SSOD) struggles with class imbalance, label noise, and missing pseudo-labels. The authors introduce four data-centric building blocks (RCC, RCF, GLC, and PLS) that improve label quality and class balance within a teacher-student SSOD framework, in which teacher pseudo-labels are gated by a confidence threshold $\delta_s$, and that are lightweight to integrate. Experiments on KITTI and BDD100K show that pseudo-label quality matters more than quantity, and that combining RCC/RCF with GLC and PLS yields substantial gains (up to 6% in SSOD and up to 21% when combined), highlighting practical benefits for autonomous driving. The work provides a practical, model- and framework-agnostic toolkit for making SSOD robust under real-world conditions and outlines directions for extending these ideas to other domains and detectors.
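To make the confidence gating concrete, here is a minimal Python sketch of the teacher-side filtering step, assuming each teacher prediction carries a box, a class name, and a confidence score. The `Detection` container, the `filter_pseudo_labels` helper, and the choice to keep scores exactly equal to the threshold are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: str                              # predicted class name
    score: float                            # teacher confidence in [0, 1]


def filter_pseudo_labels(
    predictions: dict[str, list[Detection]], delta_s: float
) -> dict[str, list[Detection]]:
    """Keep only teacher detections whose confidence is at least delta_s.

    Every image is kept; an image whose detections all fall below delta_s
    maps to an empty list, so its objects are unlabeled for the student.
    """
    if not 0.0 <= delta_s <= 1.0:
        raise ValueError("delta_s must lie in [0, 1]")
    return {
        image_id: [d for d in dets if d.score >= delta_s]
        for image_id, dets in predictions.items()
    }


# Example: a confident car survives, a low-confidence tram is dropped.
preds = {
    "img_0": [
        Detection((10, 20, 110, 80), "car", 0.92),
        Detection((200, 40, 420, 160), "tram", 0.41),
    ]
}
kept = filter_pseudo_labels(preds, delta_s=0.5)
print([d.label for d in kept["img_0"]])  # ['car']
```

Raising `delta_s` keeps fewer but more reliable pseudo-labels; the objects it discards become missing detections in the student's targets, which is the quality-versus-quantity trade-off described above.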

Abstract

Semi-supervised object detection (SSOD) based on pseudo-labeling significantly reduces dependence on large labeled datasets by effectively leveraging both labeled and unlabeled data. However, real-world applications of SSOD often face critical challenges, including class imbalance, label noise, and labeling errors. We present an in-depth analysis of SSOD under real-world conditions, uncovering causes of suboptimal pseudo-labeling and key trade-offs between label quality and quantity. Based on our findings, we propose four building blocks that can be seamlessly integrated into an SSOD framework. Rare Class Collage (RCC): a data augmentation method that enhances the representation of rare classes by creating collages of rare objects. Rare Class Focus (RCF): a stratified batch sampling strategy that ensures a more balanced representation of all classes during training. Ground Truth Label Correction (GLC): a label refinement method that identifies and corrects false, missing, and noisy ground truth labels by leveraging the consistency of teacher model predictions. Pseudo-Label Selection (PLS): a selection method for removing low-quality pseudo-labeled images, guided by a novel metric estimating the missing detection rate while accounting for class rarity. We validate our methods through comprehensive experiments on autonomous driving datasets, yielding up to a 6% increase in SSOD performance. Overall, our investigation and novel, data-centric, and broadly applicable building blocks enable robust and effective SSOD in complex, real-world scenarios. Code is available at https://mos-ks.github.io/publications.
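The abstract describes RCF only as "stratified batch sampling". The Python sketch below shows one way such sampling could work, under assumptions that are not taken from the paper: the class frequency $f_k$ is the share of labeled instances belonging to class $k$, an image counts as rare if it contains any class with $f_k$ below a hypothetical `rare_threshold`, and each batch reserves a fixed number of slots (`rare_per_batch`) for rare images drawn with replacement. The extra augmentation that RCF applies to rare images is omitted, and all function names are illustrative.

```python
import random
from collections import Counter
from collections.abc import Iterator


def class_frequencies(image_labels: dict[str, list[str]]) -> dict[str, float]:
    """f_k as the share of all labeled instances that belong to class k (assumed definition)."""
    counts = Counter(label for labels in image_labels.values() for label in labels)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def split_rare_common(
    image_labels: dict[str, list[str]], freqs: dict[str, float], rare_threshold: float
) -> tuple[list[str], list[str]]:
    """Call an image 'rare' if it holds at least one class with f_k < rare_threshold."""
    rare, common = [], []
    for image_id, labels in image_labels.items():
        if any(freqs[k] < rare_threshold for k in labels):
            rare.append(image_id)
        else:
            common.append(image_id)
    return rare, common


def rcf_batches(
    rare: list[str], common: list[str], batch_size: int, rare_per_batch: int, seed: int = 0
) -> Iterator[list[str]]:
    """Yield batches that each mix common images with rare images.

    Common images are visited once per pass in shuffled order; rare images are
    drawn with replacement, so they are oversampled relative to their share.
    """
    if not rare or not common:
        raise ValueError("need at least one rare and one common image")
    if not 0 < rare_per_batch < batch_size:
        raise ValueError("rare_per_batch must satisfy 0 < rare_per_batch < batch_size")
    rng = random.Random(seed)
    order = common[:]
    rng.shuffle(order)
    n_common = batch_size - rare_per_batch
    for start in range(0, len(order), n_common):
        batch = order[start:start + n_common] + rng.choices(rare, k=rare_per_batch)
        rng.shuffle(batch)  # avoid a fixed common/rare position inside the batch
        yield batch
```

With `batch_size=3` and `rare_per_batch=1`, every batch contains exactly one rare image even when rare images are a small fraction of the labeled set; the last batch may be shorter when the common images do not divide evenly.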

Paper Structure

This paper contains 29 sections, 5 equations, 16 figures, 12 tables.

Figures (16)

  • Figure 1: Our building blocks integrated into an exemplary SSOD framework. The teacher model $M_{\hbox{T}}$, trained on labeled data, generates pseudo-labels for unlabeled data, which are then filtered by a confidence threshold $\delta_s$. To address class imbalance, Rare Class Collage (RCC) crops instances of rare classes and combines them into collages, increasing their representation. Rare Class Focus (RCF) ensures each training batch contains common and rare classes, with augmented rare class images to boost their impact. Ground Truth Label Correction (GLC) corrects false, missing, and noisy labels by utilizing teacher prediction consistency across augmentations. Pseudo-Label Selection (PLS) removes pseudo-labeled images with many missing detections, estimated using our metric $D_i(\delta_s, \beta)$, which incorporates detection confidence and class rarity. Together, our methods enhance the ability of the student model $M_{\hbox{S}} (\delta_s)$ to learn effectively from both labeled and pseudo-labeled data, minimizing the propagation of errors from the teacher model.
  • Figure 2: KITTI (left), BDD (right). Impact of the relationship between the confidence score threshold ($\delta_s$) and the proportion of labeled data on performance. A higher proportion of labeled data allows an effective increase in $\delta_s$. A misconfigured $\delta_s$ relative to the available labeled data results in a student ($M_{\hbox{S}}$) that underperforms its teacher ($M_{\hbox{T}}$).
  • Figure 3: KITTI (left, 10% labeled), BDD (right, 1% labeled). Class frequency $f_k$ for each class in $\mathcal{D}_{\text{labeled}}$.
  • Figure 4: KITTI (left), BDD (right). Example collages with scaling factors $\gamma_{r,\min} = 0.25$ and $\gamma_{r,\max} = 0.75$.
  • Figure 5: KITTI (top), BDD (bottom). Batch structuring via RCF: images categorized as common (left) predominantly contain cars, while images categorized as rare (right) include the rare classes "tram", "truck", "pedestrian", and "person_sitting".
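Figure 4 shows collages built with scaling factors $\gamma_{r,\min} = 0.25$ and $\gamma_{r,\max} = 0.75$. As a rough illustration of the RCC idea, the NumPy sketch below rescales each rare-object crop by a factor drawn uniformly from that range and packs the results row by row onto a blank canvas, recording one box per pasted crop. Several details are assumptions for illustration only: that the factor scales the crop's own size, the nearest-neighbour resize, the row-packing layout, the helper names, and the default canvas of 375×1242 pixels (roughly a KITTI frame). The paper's placement and blending may differ.

```python
import numpy as np


def resize_nearest(img: np.ndarray, scale: float) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) array by a scalar factor."""
    h, w = img.shape[:2]
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[rows[:, None], cols]


def rare_class_collage(crops, canvas_hw=(375, 1242), gamma_min=0.25, gamma_max=0.75, seed=0):
    """Paste rescaled rare-object crops onto a blank canvas, row by row.

    crops: list of (image_array, class_name) pairs, each array (H, W, 3) uint8.
    Returns the collage and a list of (x1, y1, x2, y2, class_name) boxes.
    """
    rng = np.random.default_rng(seed)
    canvas_h, canvas_w = canvas_hw
    canvas = np.zeros((canvas_h, canvas_w, 3), dtype=np.uint8)
    boxes = []
    x, y, row_h = 0, 0, 0  # cursor position and height of the current row
    for crop, cls in crops:
        patch = resize_nearest(crop, rng.uniform(gamma_min, gamma_max))
        ph, pw = patch.shape[:2]
        if pw > canvas_w:
            continue  # too wide even for an empty row
        if x + pw > canvas_w:  # current row is full: start a new one below it
            x, y, row_h = 0, y + row_h, 0
        if y + ph > canvas_h:
            continue  # does not fit vertically; skip this crop
        canvas[y:y + ph, x:x + pw] = patch
        boxes.append((x, y, x + pw, y + ph, cls))
        x += pw
        row_h = max(row_h, ph)
    return canvas, boxes
```

Crops that do not fit in the remaining space are skipped rather than overlapped, so the returned boxes never intersect; the collage and its boxes could then be added to the labeled pool like any other annotated image.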