Revisiting Cross-Domain Problem for LiDAR-based 3D Object Detection
Ruixiao Zhang, Juheon Lee, Xiaohao Cai, Adam Prugel-Bennett
TL;DR
This work tackles the problem of cross-domain generalization in LiDAR-based 3D object detection for autonomous driving, revealing that state-of-the-art models overfit to their source domains and struggle to adapt to new datasets. It evaluates representative LiDAR-only and multi-modal detectors, as well as self-training approaches, across KITTI, Waymo, and nuScenes, and introduces side-view AP and front-view AP to diagnose domain-related errors. The study finds pervasive cross-domain drops across architectures, highlights the difficulty of adapting multi-modal models due to data and calibration inconsistencies, and shows that self-training methods like ST3D can shift the model’s knowledge distribution rather than truly improving generalization. By analyzing per-dimension overlaps and proposing new evaluation metrics, the paper provides practical guidance for designing more robust cross-domain 3D detectors and highlights the need for domain-aware training strategies and evaluation protocols that reflect real-world sensor variations.
Abstract
Deep learning models such as convolutional neural networks and transformers have been widely applied to 3D object detection for autonomous driving. While existing models achieve outstanding performance on most open benchmarks, their generalization ability remains in doubt. Adapting a model to another domain, such as a different city, country, or weather condition, currently requires retraining on target-domain data, which hinders the wide deployment of autonomous driving. In this paper, we analyze in depth the cross-domain performance of state-of-the-art models. We observe that most models overfit their training domains and that adapting them directly to other domains is challenging. Existing domain adaptation methods for 3D object detection in fact shift the models' knowledge domain rather than improve their generalization ability. We then propose additional evaluation metrics, the side-view and front-view AP, to better analyze the causes of the methods' sharp drops in accuracy. Using the proposed metrics and further evaluating cross-domain performance in each dimension, we conclude that overfitting is most pronounced on the front-view surface and the width dimension, which usually face the sensor and have more 3D points surrounding them. Meanwhile, our experiments indicate that the density of the point cloud data also significantly influences the models' cross-domain performance.
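
To make the idea of view-specific overlap concrete, the following is a minimal sketch of how a predicted box and a ground-truth box could be compared on different projection planes. It is not the paper's implementation: the abstract does not define the metrics formally, so the sketch assumes axis-aligned boxes (zero yaw), a sensor frame with x pointing forward, a front view taken as the width-height (y-z) face, and a side view taken as the length-height (x-z) face. The names `Box`, `plane_iou`, `FRONT_VIEW`, `SIDE_VIEW`, and `BIRDS_EYE` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned 3D box (assumption: yaw is ignored in this sketch)."""
    x: float  # centre along the forward axis
    y: float  # centre along the lateral axis
    z: float  # centre along the vertical axis
    l: float  # length (extent along x)
    w: float  # width (extent along y)
    h: float  # height (extent along z)


# Each plane is a pair of (centre attribute, size attribute) tuples.
# These plane definitions are assumptions made for illustration.
FRONT_VIEW = (("y", "w"), ("z", "h"))  # face seen when looking along x
SIDE_VIEW = (("x", "l"), ("z", "h"))   # face seen when looking along y
BIRDS_EYE = (("x", "l"), ("y", "w"))   # top-down projection


def _overlap_1d(c1: float, s1: float, c2: float, s2: float) -> float:
    """Length of the intersection of two intervals given centre and size."""
    lo = max(c1 - s1 / 2, c2 - s2 / 2)
    hi = min(c1 + s1 / 2, c2 + s2 / 2)
    return max(0.0, hi - lo)


def plane_iou(a: Box, b: Box, plane) -> float:
    """2D IoU of the projections of two axis-aligned boxes onto a plane."""
    inter, area_a, area_b = 1.0, 1.0, 1.0
    for centre, size in plane:
        inter *= _overlap_1d(getattr(a, centre), getattr(a, size),
                             getattr(b, centre), getattr(b, size))
        area_a *= getattr(a, size)
        area_b *= getattr(b, size)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A prediction shifted sideways by 0.5 m relative to the ground truth.
gt = Box(x=0.0, y=0.0, z=0.0, l=4.0, w=2.0, h=1.5)
pred = Box(x=0.0, y=0.5, z=0.0, l=4.0, w=2.0, h=1.5)

front_iou = plane_iou(gt, pred, FRONT_VIEW)
side_iou = plane_iou(gt, pred, SIDE_VIEW)
bev_iou = plane_iou(gt, pred, BIRDS_EYE)
```

In this toy case the lateral shift leaves the side-view projection unchanged while reducing the front-view overlap, which is the kind of per-view separation that a view-specific AP could expose. Turning these overlaps into an AP would additionally require an IoU threshold and a ranking of detections by confidence, neither of which is specified here.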
