Learning Adaptive Pseudo-Label Selection for Semi-Supervised 3D Object Detection

Taehun Kong, Tae-Kyun Kim

TL;DR

This work proposes a novel SS3DOD framework featuring a learnable pseudo-labeling module designed to automatically and adaptively select high-quality pseudo-labels, and introduces a soft supervision strategy that learns robustly under pseudo-label noise.

Abstract

Semi-supervised 3D object detection (SS3DOD) aims to reduce the cost of 3D annotation by exploiting unlabeled data. Recent studies adopt pseudo-label-based teacher-student frameworks and demonstrate impressive performance. The main challenge in these frameworks is selecting high-quality pseudo-labels from the teacher's predictions. Most previous methods, however, select pseudo-labels by comparing confidence scores against manually set thresholds. The latest works tackle the challenge either by dynamic thresholding or by refining the quality of pseudo-labels. Such methods still overlook contextual information, e.g., object distances, classes, and learning states, and they assess pseudo-label quality inadequately because they use only part of the information available from the networks. In this work, we propose a novel SS3DOD framework featuring a learnable pseudo-labeling module designed to automatically and adaptively select high-quality pseudo-labels. Our approach introduces two networks at the teacher output level. These networks reliably assess the quality of pseudo-labels through score fusion and determine context-adaptive thresholds; both are supervised by the alignment of pseudo-labels with ground-truth (GT) bounding boxes. Additionally, we introduce a soft supervision strategy that learns robustly under pseudo-label noise. This helps the student network prioritize cleaner labels over noisy ones in semi-supervised learning. Extensive experiments on the KITTI and Waymo datasets demonstrate the effectiveness of our method. The proposed method selects high-precision pseudo-labels while maintaining wider coverage of contexts and a higher recall rate, yielding significant improvements over relevant SS3DOD methods.
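The selection idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: in the paper the quality score and the threshold come from two learned networks, whereas here both are replaced by hand-written stand-ins. The names `Candidate`, `fuse_quality`, `context_threshold`, and `select_with_soft_weights`, the fusion weights, the threshold values, and the choice to use the fused score directly as a soft loss weight are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """One teacher prediction. All fields are illustrative assumptions."""
    cls_conf: float    # classification confidence in [0, 1]
    objectness: float  # objectness score in [0, 1]
    distance_m: float  # distance from the sensor, in metres
    category: str      # e.g. "Car" or "Pedestrian"


def fuse_quality(c: Candidate, w_cls: float = 0.5, w_obj: float = 0.5) -> float:
    """Stand-in for the learned quality network: a fixed weighted average."""
    return w_cls * c.cls_conf + w_obj * c.objectness


def context_threshold(c: Candidate) -> float:
    """Stand-in for the learned threshold network.

    Assumed rule: a per-class base threshold, relaxed by up to 0.1
    as the object gets farther away (saturating at 50 m).
    """
    base = 0.7 if c.category == "Car" else 0.5
    relax = 0.1 * min(c.distance_m / 50.0, 1.0)
    return base - relax


def select_with_soft_weights(
    candidates: List[Candidate],
) -> List[Tuple[Candidate, float]]:
    """Keep candidates whose fused score reaches their own threshold.

    Each kept candidate is paired with a soft weight (here simply the
    fused score) that a student loss could use to down-weight noisier labels.
    """
    kept = []
    for c in candidates:
        quality = fuse_quality(c)
        if quality >= context_threshold(c):
            kept.append((c, quality))
    return kept


near_car = Candidate(0.9, 0.8, 10.0, "Car")
far_pedestrian = Candidate(0.5, 0.5, 50.0, "Pedestrian")
weak_car = Candidate(0.6, 0.5, 5.0, "Car")
selected = select_with_soft_weights([near_car, far_pedestrian, weak_car])
```

In this toy run the distant pedestrian is kept despite a modest score because its threshold is lower, while the nearby low-scoring car is rejected. A single global threshold could not produce both outcomes at once, which is the motivation for making the threshold depend on context.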

Paper Structure

This paper contains 22 sections, 7 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Overview of the proposed framework compared to the previous pseudo-labeling method in the semi-supervised framework. (a) illustrates the previous semi-supervised framework, where thresholds are set manually or by handcrafted rules, and filtering is applied based on those thresholds. (b) shows the proposed framework, whose Pseudo-label Selection Module learns to select high-quality pseudo-labels within the SSL framework, while Soft Supervision keeps training robust against pseudo-label noise.
  • Figure 2: (a), (b), and (c) show that classification confidence and objectness have different distributions depending on the context. (b) and (c) illustrate the distributions specifically for foreground objects. (d) compares previous pseudo-labeling methods in three aspects: the approach for determining score thresholds, the contexts considered, and the metrics used for evaluating pseudo-label quality. Auxiliary scores (Aux. score) refer to additional IoU predictions or objectness from different views.
  • Figure 3: Overview of the proposed framework, consisting of two main components: the Pseudo-label Selection Module (PSM), which selects pseudo-labels using the detector’s outputs and contexts, and Soft Supervision, which enhances robustness to pseudo-label noise. The PSM includes two neural networks, $\mathcal{Q}$ and $\mathcal{T}$, that predict pseudo-label quality and context-aware thresholds.
  • Figure 4: The correlation between GT-IoU and each score on the KITTI 1% split. (a) Classification confidence, (b) Objectness, (c) IoU consistency (HSSDA), and (d) the output score of PQE.
  • Figure 5: CTE thresholds by classes and distances.
  • ...and 7 more figures
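
The comparison in Figure 4 asks how well each candidate score tracks the IoU between a pseudo-label and its ground-truth box. One common way to quantify that is the Pearson correlation coefficient; whether the paper uses this exact statistic is an assumption. The sketch below uses made-up toy numbers, not values from the paper, and the names `pearson`, `gt_iou`, `aligned_score`, and `noisy_score` are hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy data (invented for illustration): IoU with the ground-truth box,
# one score that follows it closely, and one that does not.
gt_iou = [0.1, 0.3, 0.5, 0.7, 0.9]
aligned_score = [0.15, 0.35, 0.5, 0.65, 0.95]
noisy_score = [0.9, 0.2, 0.8, 0.3, 0.7]

r_aligned = pearson(gt_iou, aligned_score)
r_noisy = pearson(gt_iou, noisy_score)
```

A score with a higher correlation to GT-IoU is a better basis for filtering, since thresholding it removes poorly localized boxes more consistently.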