Improving Generalization Ability for 3D Object Detection by Learning Sparsity-invariant Features
Hsin-Cheng Lu, Chung-Yi Lin, Winston H. Hsu
TL;DR
The paper tackles the generalization gap in LiDAR-based 3D object detection when a detector is deployed to unseen domains with different sensor configurations and scene distributions. It learns sparsity-invariant features by downsampling source point clouds to different beam densities, using the current detector's confidence scores to choose the density. This is implemented within a teacher-student BEV framework that applies Feature Content Alignment (FCA) and Graph-based Embedding Relationship Alignment (GERA) to learn domain-agnostic representations. The approach optimizes $L_{\text{overall}} = L_{\text{det}} + b_1 L_{\text{FCA}} + b_2 L_{\text{GERA}}$, where $b_1$ and $b_2$ weight the two alignment terms, and it uses a Gromov-Wasserstein-based loss to preserve high-level relationships among proposals across densities. Experiments on Waymo, KITTI, and nuScenes show better generalization to unseen domains than the baselines and compatibility with domain adaptation methods, in some cases matching or exceeding target-domain baselines, which reduces reliance on labeled data from multiple domains. By enabling a detector trained on a single domain to operate effectively across diverse LiDAR configurations and environments, this work advances practical robustness for autonomous driving.
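The following minimal NumPy sketch shows how the three terms of the overall loss could fit together. It rests on several assumptions that the summary does not confirm. FCA is taken to be a mean squared error between the teacher's BEV map (dense input) and the student's BEV map (subsampled input). GERA is assumed to be the Gromov-Wasserstein-based term, simplified here to a fixed identity coupling between teacher and student proposals instead of a solved transport plan. The function names (`fca_loss`, `gera_loss`, `overall_loss`) and the default weights are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def fca_loss(bev_teacher: np.ndarray, bev_student: np.ndarray) -> float:
    """Feature content alignment, assumed here to be the MSE between the
    teacher (dense point cloud) and student (subsampled) BEV feature maps."""
    if bev_teacher.shape != bev_student.shape:
        raise ValueError("BEV maps must have the same shape")
    return float(np.mean((bev_teacher - bev_student) ** 2))


def pairwise_dist(emb: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between N proposal embeddings, shape (N, N)."""
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def gera_loss(emb_teacher: np.ndarray, emb_student: np.ndarray) -> float:
    """Relationship alignment as a Gromov-Wasserstein cost with a FIXED coupling.

    GW cost: sum_{i,j,k,l} (C_t[i,k] - C_s[j,l])^2 * T[i,j] * T[k,l].
    With T = I / N (teacher proposal i matched to student proposal i), this
    reduces to the mean of (C_t - C_s)^2 over all N*N pairs. A full GW solver
    would optimize T instead of fixing it.
    """
    c_t = pairwise_dist(emb_teacher)
    c_s = pairwise_dist(emb_student)
    return float(np.mean((c_t - c_s) ** 2))


def overall_loss(l_det: float, bev_t: np.ndarray, bev_s: np.ndarray,
                 emb_t: np.ndarray, emb_s: np.ndarray,
                 b1: float = 1.0, b2: float = 1.0) -> float:
    """L_overall = L_det + b1 * L_FCA + b2 * L_GERA (default weights are placeholders)."""
    return l_det + b1 * fca_loss(bev_t, bev_s) + b2 * gera_loss(emb_t, emb_s)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bev_t = rng.normal(size=(64, 32, 32))                # teacher BEV map (C, H, W)
    bev_s = bev_t + 0.1 * rng.normal(size=bev_t.shape)   # student BEV map
    emb_t = rng.normal(size=(8, 16))                     # 8 teacher proposal embeddings
    emb_s = emb_t + 0.05 * rng.normal(size=emb_t.shape)  # matching student embeddings
    print(overall_loss(0.5, bev_t, bev_s, emb_t, emb_s, b1=1.0, b2=0.1))
```

One design point: because the relationship term compares only intra-set distance matrices, it is unchanged when the student's embeddings are translated or rotated as a whole. It therefore constrains how proposals relate to one another rather than where they sit in feature space, whereas FCA constrains the feature values directly.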
Abstract
In autonomous driving, 3D object detection is essential for accurately identifying and tracking objects. Although many techniques have been developed for this task, most of them share a significant drawback: their performance degrades substantially when detecting objects in unseen domains. In this paper, we propose a method to improve the generalization ability of 3D object detectors trained on a single domain. We primarily focus on generalizing from a single source domain to target domains with distinct sensor configurations and scene distributions. To learn sparsity-invariant features from a single source domain, we selectively subsample the source data to a specific number of beams, using confidence scores from the current detector to identify the density that matters most to the detector. We then employ a teacher-student framework to align the Bird's Eye View (BEV) features of point clouds with different densities. We also use feature content alignment (FCA) and graph-based embedding relationship alignment (GERA) to encourage the detector to learn domain-agnostic features. Extensive experiments demonstrate that our method generalizes better than other baselines. Furthermore, our approach even outperforms certain domain adaptation methods that have access to target-domain data.
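The NumPy sketch below illustrates the beam-subsampling step under stated assumptions. Each point is assumed to carry an integer beam (ring) index. A sparser sensor is simulated by keeping every k-th ring. When no ring field is available (KITTI's velodyne `.bin` files store only x, y, z, and reflectance), the hypothetical `estimate_ring` helper approximates it from elevation angles. The selection rule in `pick_density` is our own assumption about what "the density that matters most to the detector" means: it picks the candidate beam count whose subsampled cloud receives the lowest mean confidence, and the paper's criterion may differ. `score_fn` is a hypothetical stand-in for running the current detector and returning its box confidences.

```python
from typing import Callable, Dict, Sequence, Tuple

import numpy as np


def estimate_ring(points: np.ndarray, src_beams: int) -> np.ndarray:
    """Approximate each point's beam index by binning elevation angles uniformly.

    Real sensors do not space beams uniformly, so prefer the sensor's own
    ring field when the data provides one.
    """
    elev = np.arctan2(points[:, 2], np.hypot(points[:, 0], points[:, 1]))
    lo, hi = elev.min(), elev.max()
    ring = np.floor((elev - lo) / (hi - lo + 1e-9) * src_beams).astype(int)
    return np.clip(ring, 0, src_beams - 1)


def subsample_beams(points: np.ndarray, ring: np.ndarray,
                    src_beams: int, tgt_beams: int) -> np.ndarray:
    """Simulate a tgt_beams-line LiDAR by keeping every k-th ring of a src_beams-line scan."""
    if src_beams % tgt_beams != 0:
        raise ValueError("this sketch requires tgt_beams to divide src_beams")
    stride = src_beams // tgt_beams
    return points[ring % stride == 0]


def pick_density(points: np.ndarray, ring: np.ndarray, src_beams: int,
                 candidates: Sequence[int],
                 score_fn: Callable[[np.ndarray], np.ndarray]
                 ) -> Tuple[int, Dict[int, float]]:
    """Score each candidate density with the current detector and return the
    beam count with the lowest mean confidence (ASSUMED to be the most useful
    density to train on), plus the per-density mean confidences."""
    mean_conf: Dict[int, float] = {}
    for beams in candidates:
        conf = score_fn(subsample_beams(points, ring, src_beams, beams))
        # A detector that finds nothing contributes zero confidence.
        mean_conf[beams] = float(np.mean(conf)) if conf.size else 0.0
    best = min(mean_conf, key=mean_conf.get)
    return best, mean_conf
```

As a usage example, a 64-beam source scan (for instance from KITTI's HDL-64E) could be scored at candidates such as 32, 16, and 8 beams, the lower end covering the 32-beam density of nuScenes. The chosen cloud would then go to the student while the teacher sees the full scan. Requiring `tgt_beams` to divide `src_beams` keeps the stride integral. Other subsampling schemes, such as random ring selection, would also fit the same interface.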
