Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

Jiacheng Zhang, Jiaming Li, Xiangru Lin, Wei Zhang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

TL;DR

This work introduces a decoupled pseudo-labeling (DPL) approach for semi-supervised monocular 3D object detection (SSM3OD). It also presents a Depth Gradient Projection module that mitigates the optimization conflicts caused by noisy depth supervision from pseudo-labels by decoupling the depth gradient and removing its conflicting components.

Abstract

We delve into pseudo-labeling for semi-supervised monocular 3D object detection (SSM3OD) and discover two primary issues: (1) a misalignment between the prediction quality of 3D and 2D attributes, and (2) the tendency of depth supervision derived from pseudo-labels to be noisy, leading to significant optimization conflicts with other, more reliable forms of supervision. We introduce a novel decoupled pseudo-labeling (DPL) approach for SSM3OD. Our approach features a Decoupled Pseudo-label Generation (DPG) module, designed to efficiently generate pseudo-labels by processing 2D and 3D attributes separately. This module incorporates a homography-based method for identifying dependable pseudo-labels in BEV space, specifically for 3D attributes. Additionally, we present a Depth Gradient Projection (DGP) module to mitigate optimization conflicts caused by noisy depth supervision of pseudo-labels, effectively decoupling the depth gradient and removing conflicting gradients. This dual decoupling strategy, applied at both the pseudo-label generation level and the gradient level, significantly improves the utilization of pseudo-labels in SSM3OD. Our comprehensive experiments on the KITTI benchmark demonstrate the superiority of our method over existing approaches.
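To make the idea of "removing conflicting gradients" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the conflict criterion given in the Figure 1 caption, $\cos(g_i, g_j) < 0$, and a PCGrad-style projection that drops the component of the pseudo-label depth gradient pointing against a reference gradient. The function names, the choice of reference gradient, and the toy vectors are all hypothetical; the abstract does not specify the exact projection rule used by DGP.

```python
import numpy as np

def cosine(g_i, g_j, eps=1e-12):
    """Cosine similarity between two flattened gradient vectors."""
    return float(g_i @ g_j / (np.linalg.norm(g_i) * np.linalg.norm(g_j) + eps))

def project_if_conflicting(g_ud, g_ref, eps=1e-12):
    """Assumed PCGrad-style rule (hypothetical, not taken from the paper).

    If g_ud conflicts with g_ref (negative dot product, i.e. cos < 0),
    subtract the component of g_ud that lies along g_ref. Otherwise
    return g_ud unchanged.
    """
    dot = float(g_ud @ g_ref)
    if dot >= 0.0:
        return g_ud.copy()
    return g_ud - dot / (float(g_ref @ g_ref) + eps) * g_ref

# Toy gradients: g_ud stands for the pseudo-label depth gradient,
# g_ref for a gradient from supervision that is considered reliable.
g_ud = np.array([1.0, -2.0])
g_ref = np.array([0.0, 1.0])

g_proj = project_if_conflicting(g_ud, g_ref)
print(cosine(g_ud, g_ref) < 0)   # the two gradients conflict
print(g_proj)                    # component against g_ref removed
```

After projection, the result is orthogonal to the reference gradient, so a step along it no longer works against the reliable supervision to first order.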

Paper Structure (16 sections, 11 equations, 8 figures, 11 tables)

Figures (8)

  • Figure 1: (a) Visualization of pseudo-labels and ground truth in the image plane and the Bird's Eye View (BEV) plane. Red: ground truth. Yellow: detected bounding boxes. (b) Statistical analysis of classification scores and depth errors (PCC: Pearson Correlation Coefficient). (c) The proportion of each type of gradient conflict. $g_{sd}$: gradient of the ground-truth depth loss. $g_{ud}$: gradient of the pseudo-label depth loss. $g_o$: gradient of the supervision loss for the other attributes. Two gradients $g_i$ and $g_j$ conflict when $\cos(g_i,g_j)<0$.
  • Figure 2: Overview of the Decoupled Pseudo-Labeling (DPL) method for SSM3OD. We conduct semi-supervised learning based on the teacher-student framework after the supervised training stage. DPL consists of a Decoupled Pseudo-label Generation (DPG) module and a Depth Gradient Projection (DGP) module. DPG decouples the 2D and 3D attributes and generates pseudo-labels for each independently, with a Homography-based Label Mining (HLM) algorithm designed to generate pseudo-labels for 3D attributes by harnessing the homography transformation. The DGP module uses a gradient projection operation to mitigate the potential negative impact of noisy depth supervision.
  • Figure 3: Illustration of the HPM module. The homography transformation describes the coordinate mapping between the image plane and the BEV plane. Starting with the initial pseudo-labels, we iteratively estimate the homography matrix $H_{i}$ and search for reliable pseudo-labels in BEV space. Green box: selected pseudo-labels. Cyan box: model prediction. Purple box: ground truth.
  • Figure 4: Visual comparison of the pseudo-labels produced by different pseudo-label generation strategies. (a) Pseudo-labels generated with a classification confidence threshold of 0.6. (b) Pseudo-labels generated by the initial depth prediction uncertainty filtering in HPM. (c) Pseudo-labels after the HPM algorithm. Red box: ground truth. Yellow box: pseudo-labels. Cyan dashed circles: confident yet depth-deviated pseudo-labels. Cyan arrows: pseudo-labels discovered through homography-based mining.
  • Figure 5: Left: The correlation between the gradient similarity $\cos(g_{ud},g_{p})$ and the depth error. Right: The average depth error of the pseudo-labels obtained in different ways.
  • ...and 3 more figures
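
The Figure 3 caption describes a homography that maps image-plane coordinates to BEV-plane coordinates, estimated from an initial set of pseudo-labels. As a rough illustration of that single step, the sketch below fits a 3x3 homography to point correspondences with the standard direct linear transform (DLT) and then maps a new image point into BEV. It is a sketch under stated assumptions, not the paper's HPM algorithm: the function names, the normalized toy coordinates, and the matrix `H_true` are hypothetical, and the iterative search for reliable pseudo-labels is not reproduced.

```python
import numpy as np

def estimate_homography(src, dst):
    """Fit H (3x3) such that dst ~ H @ src, using the DLT on N >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(np.asarray(src, float), np.asarray(dst, float)):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    H = vt[-1].reshape(3, 3)
    # Fix the scale; this assumes H[2, 2] is not close to zero.
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map 2D points through H using homogeneous coordinates."""
    pts = np.asarray(pts, float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical ground-truth mapping and normalized toy image points.
H_true = np.array([[2.0, 0.5, 1.0],
                   [0.3, 3.0, -2.0],
                   [0.1, 0.2, 1.0]])
img_pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.25]])
bev_pts = apply_homography(H_true, img_pts)

H_est = estimate_homography(img_pts, bev_pts)

# Map a held-out image point into BEV with the estimated homography.
query = np.array([[0.2, 0.8]])
print(apply_homography(H_est, query))
```

One plausible use of such a mapping, assumed here rather than taken from the source, is to compare a candidate's predicted BEV position against its homography-mapped position and treat a small residual as a sign of a reliable depth estimate.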