Domain Generalization of 3D Object Detection by Density-Resampling

Shuangzhi Li; Lei Ma; Xingyu Li

TL;DR

This paper tackles the challenge of single-domain generalization for LiDAR-based 3D object detection by addressing domain shifts arising from point-density variations and sensor differences. It introduces physical-aware density-resampling data augmentation (PDDA) to simulate realistic density patterns, and a multi-task learning framework that couples standard detection with a self-supervised 3D scene restoration task to improve scene understanding. Additionally, it proposes a test-time adaptation strategy that uses the restoration objective to fine-tune the encoder on unseen target domains, further bridging domain gaps. Across cross-dataset evaluations on Car, Pedestrian, and Cyclist detections, the method consistently outperforms state-of-the-art DG approaches and, in some cases, even surpasses unsupervised domain adaptation methods, demonstrating strong practical impact for robust 3D perception in real-world, heterogeneous sensing conditions.
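To make the idea of density resampling concrete, the sketch below shows one common way to simulate a lower-resolution LiDAR: bin points by elevation angle into pseudo-beams and keep every k-th beam. This is an illustrative assumption, not the paper's PDDA procedure, whose details are not given in this summary. The function names `assign_beams` and `downsample_beams`, the beam count, and the field-of-view limits are all hypothetical.

```python
import math

def assign_beams(points, num_beams, fov_low_deg, fov_up_deg):
    """Assign each (x, y, z) point a pseudo-beam index by binning its elevation angle."""
    span = fov_up_deg - fov_low_deg
    beam_ids = []
    for x, y, z in points:
        # Elevation angle of the point as seen from the sensor origin.
        elev = math.degrees(math.atan2(z, math.hypot(x, y)))
        frac = (elev - fov_low_deg) / span
        # Clamp so points slightly outside the assumed field of view still get a beam.
        beam_ids.append(min(num_beams - 1, max(0, int(frac * num_beams))))
    return beam_ids

def downsample_beams(points, num_beams=64, keep_every=2,
                     fov_low_deg=-25.0, fov_up_deg=3.0):
    """Keep only points on every `keep_every`-th pseudo-beam (assumed sensor geometry)."""
    beam_ids = assign_beams(points, num_beams, fov_low_deg, fov_up_deg)
    return [p for p, b in zip(points, beam_ids) if b % keep_every == 0]
```

Calling `downsample_beams(scan, num_beams=64, keep_every=2)` on a dense scan would roughly halve its vertical resolution. The default field-of-view values are placeholders and would need to match the actual sensor.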

Abstract

Point-cloud-based 3D object detection suffers from performance degradation when encountering data with novel domain gaps. To tackle this, single-domain generalization (SDG) aims to generalize a detection model trained on a single, limited source domain so that it performs robustly on unexplored domains. In this paper, we propose an SDG method to improve the generalizability of 3D object detection to unseen target domains. Unlike prior SDG works for 3D object detection, which focus solely on data augmentation, our work introduces both a novel data augmentation method and a new multi-task learning strategy. Specifically, from the perspective of data augmentation, we design a universal physical-aware density-based data augmentation (PDDA) method to mitigate the performance loss stemming from diverse point densities. From the learning methodology viewpoint, we develop a multi-task learning strategy for 3D object detection: during source training, besides the main standard detection task, we leverage an auxiliary self-supervised 3D scene restoration task to enhance the encoder's comprehension of background and foreground details for better recognition and detection of objects. Furthermore, based on the auxiliary self-supervised task, we propose the first test-time adaptation method for domain generalization of 3D object detection, which efficiently adjusts the encoder's parameters to adapt to unseen target domains at test time and further bridges domain gaps. Extensive cross-dataset experiments covering "Car", "Pedestrian", and "Cyclist" detection demonstrate that our method outperforms state-of-the-art SDG methods and even surpasses unsupervised domain adaptation methods under some circumstances.
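The test-time adaptation idea described above can be sketched with a toy linear model: the encoder is updated by gradient steps on a self-supervised restoration loss, while the detection head stays frozen. Everything here is a simplifying assumption for illustration. The linear encoder `W_enc`, the restoration decoder `D_rest` (assumed frozen, which this summary does not specify), the scalar `head`, and the function names are hypothetical and do not reflect the paper's network.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def restoration_loss(W_enc, D_rest, x_sparse, x_full):
    """Squared error between the scene restored from the sparse input and the full input."""
    x_hat = matvec(D_rest, matvec(W_enc, x_sparse))
    return sum((a - b) ** 2 for a, b in zip(x_hat, x_full))

def adapt_encoder(W_enc, D_rest, x_sparse, x_full, lr=0.1, steps=20):
    """Return a copy of the encoder after gradient steps on the restoration loss.

    Only the encoder weights change; D_rest is treated as frozen (an assumption).
    """
    W = [row[:] for row in W_enc]
    for _ in range(steps):
        x_hat = matvec(D_rest, matvec(W, x_sparse))
        err = [a - b for a, b in zip(x_hat, x_full)]
        for k in range(len(W)):
            # dL/dW[k][j] = 2 * sum_i(D_rest[i][k] * err[i]) * x_sparse[j]
            g_k = 2.0 * sum(D_rest[i][k] * err[i] for i in range(len(err)))
            for j in range(len(x_sparse)):
                W[k][j] -= lr * g_k * x_sparse[j]
    return W

def detect(W_enc, head, x):
    """Apply the frozen linear 'detection head' to the encoder features."""
    return sum(h * f for h, f in zip(head, matvec(W_enc, x)))

# Toy target-domain sample: x_sparse is x_full with two entries dropped (zeroed).
x_full = [1.0, 0.5, -0.5, 2.0]
x_sparse = [1.0, 0.0, -0.5, 0.0]
W_enc = [[0.1, 0.2, 0.0, -0.1], [0.0, -0.2, 0.3, 0.1]]
D_rest = [[0.5, 0.0], [0.0, 0.5], [0.3, -0.3], [0.2, 0.4]]
head = [1.0, -1.0]  # never updated at test time

W_adapted = adapt_encoder(W_enc, D_rest, x_sparse, x_full)
score = detect(W_adapted, head, x_full)
```

The point of the sketch is the control flow: the restoration loss needs no labels, so it can be computed on each unseen test sample, and the prediction is made with the adapted encoder and the unchanged head.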

Paper Structure (20 sections, 7 equations, 6 figures, 10 tables)

Introduction
Related Work
Point-Cloud-Based 3D Object Detection
Domain Generalization on 2D/3D Object Detection
Test-Time Adaptation in Domain Generalization
Problem Formulation
Methodology
Physical-Aware Density-Resampling Data Augmentation
Multi-Task Learning with Density-Resampling
Test-Time Adaptation with Self-Supervised 3D Scene Restoration
Experiment
Experiment Settings
Comparison with SOTA Methods
Ablation Study
Limitation
...and 5 more sections

Figures (6)

Figure 1: Detection results for Waymo $\rightarrow$ NuScenes, where the red boxes are ground-truth 3D boxes and the green ones are detected 3D boxes. Our method achieves better performance than the other SDG methods, PA-DA [choi2021part] and 3D-VF [lehner20223d], and even the UDA method SN [wang2020train] (see the main DG/UDA comparison table for statistical details).
Figure 2: Intra-domain detection by VoxelRCNN [deng2021voxel] with a voxel-based backbone on NuScenes. Red boxes are ground-truth 3D boxes and green ones are detected 3D boxes. Because of occlusion by objects in front and varying distances, sparsely scanned cars are hard to detect.
Figure 3: Pipeline of our proposed DG method. During training on the source domain, the training sample is augmented with density re-sampling and then used to train the multi-task model for (a) standard detection and (b) 3D scene restoration from its down-sampled version. During testing on the target domain, given a query sample, self-supervised scene restoration is conducted on the corresponding density-downsampled version for a lightweight model update. The updated encoder then works together with the frozen detection head to produce the final prediction. In this figure, source and target samples are from NuScenes [caesar2020nuscenes] and KITTI [geiger2013vision], respectively.
Figure 4: Computation efficiency on (a) NuScenes $\rightarrow$ KITTI and (b) Waymo $\rightarrow$ NuScenes. Dashed lines indicate Waymo's frame rate of 10 FPS and NuScenes' keyframe rate of 2 FPS.
Figure 5: Visualization of density down-sampling.
...and 1 more figure
