
Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection

Zhanwei Zhang, Minghao Chen, Shuai Xiao, Liang Peng, Hengjia Li, Binbin Lin, Ping Li, Wenxiao Wang, Boxi Wu, Deng Cai

TL;DR

PERE tackles unreliable pseudo labels and instance-point-number inconsistency in cross-dataset 3D object detection by introducing Complementary Augmentation to replace or remove unreliable pseudo boxes, interpolation/extrapolation to generate additional proposals, and cross-domain RoI feature alignment via a redesigned triplet loss. The method integrates these components into a self-training loop, improving pseudo-label quality and RoI discrimination across domains. Experiments on nuScenes, Waymo, and KITTI show consistent improvements over state-of-the-art self-training methods and substantial reductions in the gap to oracle supervision, demonstrating practical impact for unsupervised domain adaptation in 3D detection.

Abstract

Recent self-training techniques have shown notable improvements in unsupervised domain adaptation for 3D object detection (3D UDA). These techniques typically select pseudo labels, i.e., 3D boxes, to supervise models for the target domain. However, this selection process inevitably introduces unreliable 3D boxes, in which 3D points cannot be definitively assigned as foreground or background. Previous techniques mitigate this by reweighting these boxes as pseudo labels, but these boxes can still poison the training process. To resolve this problem, in this paper, we propose a novel pseudo label refinery framework. Specifically, in the selection process, to improve the reliability of pseudo boxes, we propose a complementary augmentation strategy. This strategy involves either removing all points within an unreliable box or replacing it with a high-confidence box. Moreover, the point numbers of instances in high-beam datasets are considerably higher than those in low-beam datasets, also degrading the quality of pseudo labels during the training process. We alleviate this issue by generating additional proposals and aligning RoI features across different domains. Experimental results demonstrate that our method effectively enhances the quality of pseudo labels and consistently surpasses the state-of-the-art methods on six autonomous driving benchmarks. Code will be available at https://github.com/Zhanwei-Z/PERE.
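
The "removing all points within an unreliable box" step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the box layout `(cx, cy, cz, length, width, height, yaw)` with yaw about the z-axis is an assumption, and the helper names `point_in_box` and `remove_points_in_box` are hypothetical.

```python
import math

def point_in_box(point, box):
    """Return True if a 3D point lies inside a yaw-rotated box.

    Assumed (hypothetical) box layout: (cx, cy, cz, length, width, height, yaw),
    where yaw is a rotation about the z-axis in radians.
    """
    px, py, pz = point
    cx, cy, cz, length, width, height, yaw = box
    dx, dy = px - cx, py - cy
    # Rotate the offset by -yaw to express the point in the box's local frame.
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    local_x = dx * cos_y + dy * sin_y
    local_y = -dx * sin_y + dy * cos_y
    return (
        abs(local_x) <= length / 2
        and abs(local_y) <= width / 2
        and abs(pz - cz) <= height / 2
    )

def remove_points_in_box(points, box):
    """Keep only the points that fall outside the given box."""
    return [p for p in points if not point_in_box(p, box)]

# Usage: drop every point covered by one unreliable pseudo box.
points = [(0.0, 0.0, 0.0), (1.9, 0.9, 0.5), (3.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
unreliable_box = (0.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0)
kept = remove_points_in_box(points, unreliable_box)
```

Removing the points rather than keeping a doubtful label means the detector is never supervised, positively or negatively, on a region whose foreground/background status is unclear.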

Paper Structure (23 sections, 13 equations, 7 figures, 3 tables)

This paper contains 23 sections, 13 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Self-training methods generally consist of a selection process and a training process. (a) In the selection process, setting the threshold either high or low inevitably leads to false negatives or false positives within the threshold interval. (b) In the training process, the point numbers of instances in high-beam datasets are markedly higher than those in low-beam datasets, which causes RoI feature confusion across different categories.
  • Figure 2: The overall framework of our PERE. (a) We pre-train an existing two-stage 3D detector in the source domain and then generate the basic pseudo labels in the target domain, followed by two iterative processes. (b) During the selection process, these labels are processed by Complementary Augmentation (Sec. \ref{aug}) to boost the reliability of pseudo boxes. During the training process, we (c) implement Additional Proposal Generation Based on Interpolation and Extrapolation (Sec. \ref{inex}) and (d) perform Cross-Domain RoI Feature Alignment (Sec. \ref{tripletsec}) to progressively address the issue of instance-point-number inconsistency (IPNI). After training for $k$ epochs, we update the basic pseudo labels.
  • Figure 3: An example of how Complementary Augmentation (CA) works. Here, the margin $\left[T_{neg},T_{pos} \right]$ is set to $\left[0.2,0.6\right]$. Since $u_{\nu}\le T_{neg}$, box $\nu$ is discarded. Since $u_{\mu}\ge T_{pos}$, box $\mu$ is cached in the database $B_h$. Since $T_{neg}< u_b<T_{pos}$, box $b$ is processed by either BoxReplace or PointRemove according to Eq. (\ref{fcx1}).
  • Figure 4: We adopt a bird's-eye view (BEV) and omit other basic low-confidence proposals to present the interpolation and extrapolation operations more intuitively. (a) and (b) demonstrate that the extrapolated and the interpolated proposals, respectively, exhibit the closest alignment with their corresponding instances.
  • Figure 5: In (a) and (b), where $d_1 = d_2$, we implement the intra-domain loss within the same domain, whereas the inter-domain loss is implemented across different domains, as depicted in (c) and (d), where $d_1 \neq d_2$.
  • ...and 2 more figures
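
The three-way split described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `PseudoBox` class and `partition_pseudo_boxes` function are hypothetical names, `score` stands in for the per-box value $u$, and the rule in Eq. (fcx1) that chooses between BoxReplace and PointRemove is not reproduced because it is not given here.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PseudoBox:
    """Hypothetical container: a box name plus its score u from Figure 3."""
    name: str
    score: float

def partition_pseudo_boxes(
    boxes: List[PseudoBox], t_neg: float = 0.2, t_pos: float = 0.6
) -> Tuple[List[PseudoBox], List[PseudoBox], List[PseudoBox]]:
    """Split boxes into (discarded, reliable, uncertain) using the margin [t_neg, t_pos]."""
    discarded, reliable, uncertain = [], [], []
    for box in boxes:
        if box.score <= t_neg:
            discarded.append(box)   # u <= T_neg: dropped
        elif box.score >= t_pos:
            reliable.append(box)    # u >= T_pos: cached (database B_h in the caption)
        else:
            uncertain.append(box)   # T_neg < u < T_pos: left for BoxReplace or PointRemove
    return discarded, reliable, uncertain

# Usage with the margin [0.2, 0.6] from the caption; the scores are made up.
boxes = [PseudoBox("nu", 0.1), PseudoBox("mu", 0.8), PseudoBox("b", 0.4)]
discarded, reliable, uncertain = partition_pseudo_boxes(boxes)
```

Only the boxes in `uncertain` would go on to the replace-or-remove step; the other two groups are settled by the thresholds alone.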