DALI: Domain Adaptive LiDAR Object Detection via Distribution-level and Instance-level Pseudo Label Denoising

Xiaohu Lu, Hayder Radha

TL;DR

This work tackles the problem of domain shift in LiDAR-based 3D object detection by addressing noise in pseudo labels used in unsupervised domain adaptation. It introduces DALI, a two-pronged approach combining Post-training Size Normalization (PTSN) to correct distribution-level size bias via an optimal scale $\hat{s}$, and Pseudo Point Clouds Generation (PPCG) with ray-constrained and constraint-free variants to reduce instance-level misalignment. Across KITTI, Waymo, and nuScenes, and with backbones like SECOND-IoU and PV-RCNN, DALI achieves state-of-the-art performance on multiple cross-domain tasks, demonstrating strong cross-domain and within-domain robustness. The method offers a practical, geometry-driven augmentation to refine pseudo labels without extra target annotations, with future work extending to non-rigid objects and adverse weather conditions.

Abstract

Object detection using LiDAR point clouds relies on a large amount of human-annotated samples when training the underlying detectors' deep neural networks. However, generating 3D bounding box annotations for a large-scale dataset can be costly and time-consuming. Alternatively, unsupervised domain adaptation (UDA) enables a given object detector to operate on new data for which only an unlabeled training set is available, by transferring the knowledge learned from labeled source domain data to the unlabeled target domain. Pseudo label strategies, which train the 3D object detector on target-domain bounding boxes predicted by a pre-trained model, are commonly used in UDA. However, these pseudo labels often introduce noise, which degrades performance. In this paper, we introduce the Domain Adaptive LIdar (DALI) object detection framework to address noise at both the distribution and instance levels. First, a post-training size normalization (PTSN) strategy is developed to mitigate bias in the pseudo label size distribution by identifying an unbiased scale after network training. Second, to address instance-level noise between pseudo labels and their corresponding point clouds, two pseudo point clouds generation (PPCG) strategies, ray-constrained and constraint-free, are developed to generate pseudo point clouds for each instance, ensuring consistency between pseudo labels and pseudo points during training. We demonstrate the effectiveness of our method on the publicly available and popular datasets KITTI, Waymo, and nuScenes. We show that the proposed DALI framework achieves state-of-the-art results and outperforms leading approaches on most of the domain adaptation tasks. Our code is available at https://github.com/xiaohulugo/T-RO2024-DALI.
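
As a rough illustration of the PTSN idea (choosing a scale after training so that the mean predicted object size matches an estimated target-domain mean size), the following Python sketch performs a plain grid search over candidate scales. It is not the authors' implementation: the callable `mean_pred_size`, the function names, the volume-based comparison, and the grid of candidates are all assumptions made for illustration, and the toy detector in the usage note is a made-up stand-in for a real network.

```python
from typing import Callable, Sequence, Tuple

Size = Tuple[float, float, float]  # (length, width, height) in meters


def box_volume(size: Sequence[float]) -> float:
    """Volume of a box given its (length, width, height)."""
    length, width, height = size
    return float(length * width * height)


def select_scale(
    mean_pred_size: Callable[[float], Size],  # hypothetical: mean predicted size at scale s
    est_size: Size,                           # estimated target-domain mean size
    candidates: Sequence[float],              # candidate scales to try
) -> Tuple[float, float]:
    """Return (best_scale, gap), where gap is the absolute difference between
    the volume of the mean predicted size and the volume of est_size."""
    target = box_volume(est_size)

    def gap(s: float) -> float:
        return abs(box_volume(mean_pred_size(s)) - target)

    best = min(candidates, key=gap)
    return best, gap(best)


def toy_mean_pred_size(s: float) -> Size:
    """Made-up stand-in for a trained detector: the mean predicted size is
    assumed to grow linearly with the scale s (an illustrative assumption)."""
    return (4.0 * s, 2.0 * s, 1.5 * s)
```

For example, `select_scale(toy_mean_pred_size, (3.6, 1.8, 1.35), [0.80 + 0.01 * i for i in range(41)])` picks a scale close to 0.9, because the toy detector's mean size at that scale matches the assumed target size. In practice the callable would run the trained network on target-domain data at each scale, which is far more expensive than this toy function, so a coarser grid or a one-dimensional search may be preferable.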

Paper Structure

This paper contains 15 sections, 1 equation, 10 figures, 9 tables, 1 algorithm.

Figures (10)

  • Figure 1: (a) illustrates the distributions of pseudo label volumes across various methods, alongside the ground truth distribution. In (b), an instance-level noise example of pseudo labels is presented. Both (a) and (b) are generated within the context of the nuScenes→KITTI domain adaptation task, utilizing SECOND-IoU as the backbone. Notably, our method, DALI, exhibits the smallest divergence from the ground truth distribution.
  • Figure 2: The framework of our domain adaptive LiDAR object detection method. Given the network G trained on the labeled source samples $(\textbf{P}_\text{S}, \textbf{B}_\text{S})$ and the estimated ground truth mean object size $E_{est}[\text{Size}]$ obtained from SN [wang2020train] or ROS [yang2021st3d], we develop post-training size normalization (PTSN) to address the distribution-level noise by selecting the optimal unbiased scale that makes $E_{pred}[\text{Size}] \approx E_{est}[\text{Size}]$. Then the pseudo bounding boxes $\hat{\textbf{B}}$ are generated accordingly, and a pseudo point clouds generation (PPCG) strategy is proposed to address the instance-level noise of $\hat{\textbf{B}}$ by generating two types of pseudo points, $\hat{\textbf{P}}_{\text{RC}}$ and $\hat{\textbf{P}}_{\text{CF}}$, for $\hat{\textbf{B}}$ based on a 3D model library and a LiDAR sensor library. Finally, $(\textbf{P}_\text{S}, \textbf{B}_\text{S})$, $(\hat{\textbf{P}}_{\text{RC}}, \hat{\textbf{B}})$, and $(\hat{\textbf{P}}_{\text{CF}}, \hat{\textbf{B}})$ are used to train a new network that performs well in both the source and target domains.
  • Figure 3: Volume of $E_{pred}[\text{Size}](s)$ for the car class on KITTI val at different scales $s$, using the SECOND-IoU [yang2021st3d] network pre-trained on the Waymo dataset.
  • Figure 4: Illustration of our pseudo point clouds generation pipeline. For each target bounding box after PTSN, we first search the 3D model library for the best-fitting 3D model (CAD or points). This 3D model is then aligned with the bounding box, and 3D ray tracing is applied to generate ray-constrained and constraint-free pseudo point clouds. Black and blue points denote the raw and augmented point clouds, respectively. CAD-based and point-based 3D models are represented by the gray meshes and red points, respectively.
  • Figure 5: Examples of the CAD-based and point-based 3D models used in our method.
  • ...and 5 more figures
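
The ray-tracing step mentioned in the Figure 4 caption can be illustrated with a small Python sketch. It casts rays from a sensor origin and keeps the first intersection with an object surface as a pseudo point. Everything specific here is an assumption for illustration: an axis-aligned box stands in for the fitted 3D model, the slab method is used for the intersection, and the function names are hypothetical. The sketch does not reproduce the difference between the ray-constrained and constraint-free variants, which the captions do not spell out.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def ray_box_hit(origin: Vec3, direction: Vec3,
                box_min: Vec3, box_max: Vec3) -> Optional[Vec3]:
    """First intersection of a ray with an axis-aligned box (slab method).
    Returns None if the ray misses. If the origin lies inside the box,
    the origin itself is returned (entry distance 0)."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this pair of faces: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return None
    x, y, z = (o + t_near * d for o, d in zip(origin, direction))
    return (x, y, z)


def spherical_ray(azimuth_deg: float, elevation_deg: float) -> Vec3:
    """Unit direction for a beam at the given azimuth and elevation angles."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def pseudo_points(origin: Vec3, rays: Sequence[Vec3],
                  box_min: Vec3, box_max: Vec3) -> List[Vec3]:
    """Cast every ray and keep the hit points; misses are dropped."""
    hits = (ray_box_hit(origin, ray, box_min, box_max) for ray in rays)
    return [h for h in hits if h is not None]
```

A usage sketch: with the sensor at `(0.0, 0.0, 0.0)` and a box spanning `(5.0, -1.0, -1.0)` to `(7.0, 1.0, 1.0)`, calling `pseudo_points` with rays built from `spherical_ray` over a small grid of azimuth and elevation angles returns points on the box face nearest the sensor. A mesh or point-based model would replace the box test with a ray-triangle or nearest-point query.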