Revisiting Cross-Domain Problem for LiDAR-based 3D Object Detection

Ruixiao Zhang, Juheon Lee, Xiaohao Cai, Adam Prugel-Bennett

TL;DR

This work tackles the problem of cross-domain generalization in LiDAR-based 3D object detection for autonomous driving, revealing that state-of-the-art models overfit to their source domains and struggle to adapt to new datasets. It evaluates representative LiDAR-only and multi-modal detectors, as well as self-training approaches, across KITTI, Waymo, and nuScenes, and introduces side-view AP and front-view AP to diagnose domain-related errors. The study finds pervasive cross-domain drops across architectures, highlights the difficulty of adapting multi-modal models due to data and calibration inconsistencies, and shows that self-training methods like ST3D can shift the model’s knowledge distribution rather than truly improving generalization. By analyzing per-dimension overlaps and proposing new evaluation metrics, the paper provides practical guidance for designing more robust cross-domain 3D detectors and highlights the need for domain-aware training strategies and evaluation protocols that reflect real-world sensor variations.

Abstract

Deep learning models such as convolutional neural networks and transformers have been widely applied to solve 3D object detection problems in the domain of autonomous driving. While existing models have achieved outstanding performance on most open benchmarks, the generalization ability of these deep networks is still in doubt. To adapt models to other domains including different cities, countries, and weather, retraining with the target domain data is currently necessary, which hinders the wide application of autonomous driving. In this paper, we deeply analyze the cross-domain performance of the state-of-the-art models. We observe that most models will overfit the training domains and it is challenging to adapt them to other domains directly. Existing domain adaptation methods for 3D object detection problems are actually shifting the models' knowledge domain instead of improving their generalization ability. We then propose additional evaluation metrics -- the side-view and front-view AP -- to better analyze the core issues of the methods' heavy drops in accuracy levels. By using the proposed metrics and further evaluating the cross-domain performance in each dimension, we conclude that the overfitting problem happens more obviously on the front-view surface and the width dimension which usually faces the sensor and has more 3D points surrounding it. Meanwhile, our experiments indicate that the density of the point cloud data also significantly influences the models' cross-domain performance.

Paper Structure

This paper contains 13 sections, 1 equation, 3 figures, and 6 tables.

Figures (3)

  • Figure 1: LiDAR point cloud and image data from three datasets: KITTI 6248074, Waymo Sun_2020_CVPR and nuScenes 9156412. Point cloud density and image shapes differ because the datasets use different sensor equipment. For car objects close to the sensors, a large number of points are collected and most shapes are clearly visible. For those farther from the sensors, only a few points are collected, which makes the dimensions difficult to estimate.
  • Figure 2: Definition of the side-view and the front-view AP. The red and blue boxes denote the ground truth and the predictions, respectively. The projection onto the front or side 2D plane covers not only the corresponding face of the box but also the other faces that are visible from that view. For example, the left side is also considered when projecting into the front view.
  • Figure 3: Performance comparison of the source-only, ROS and the best ST3D (Waymo → KITTI) models on the source domain (Waymo). The results indicate that, as detection ability on the target domain improves, the performance of ST3D models on the source domain drops significantly.
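
The view-based overlap described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on several assumptions that do not come from the paper: boxes are given as `(cx, cy, cz, l, w, h, yaw)` in a LiDAR frame with x pointing forward, y to the left and z up; the front view is the projection along x onto the y-z plane; and the side view is the projection along y onto the x-z plane. All function names are hypothetical. Because a box rotated only about the vertical axis projects to a rectangle on either plane, the projection automatically includes every face visible from that view. The resulting IoU would then replace the usual 3D or bird's-eye-view IoU when matching predictions to ground truth for AP.

```python
import math


def ground_corners(cx, cy, length, width, yaw):
    """Return the four ground-plane (x, y) corners of a yaw-rotated box."""
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx, dy in ((0.5, 0.5), (0.5, -0.5), (-0.5, -0.5), (-0.5, 0.5)):
        lx, ly = dx * length, dy * width  # corner offset in the box frame
        corners.append((cx + c * lx - s * ly, cy + s * lx + c * ly))
    return corners


def project_box(box, view):
    """Project a 3D box to a 2D rectangle (u_min, z_min, u_max, z_max).

    Assumed convention (not from the paper): "front" drops the x axis,
    "side" drops the y axis.
    """
    if view not in ("front", "side"):
        raise ValueError("view must be 'front' or 'side'")
    cx, cy, cz, length, width, height, yaw = box
    corners = ground_corners(cx, cy, length, width, yaw)
    axis = 1 if view == "front" else 0  # keep y for front view, x for side view
    us = [corner[axis] for corner in corners]
    return (min(us), cz - height / 2.0, max(us), cz + height / 2.0)


def rect_iou(a, b):
    """Intersection over union of two axis-aligned 2D rectangles."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0.0 or inter_h <= 0.0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def view_iou(gt_box, pred_box, view):
    """IoU between ground truth and prediction after projection to one view."""
    return rect_iou(project_box(gt_box, view), project_box(pred_box, view))


# Example: the prediction is shifted 0.5 m along the depth (x) axis only.
gt = (10.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0)
pred = (10.5, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0)
front = view_iou(gt, pred, "front")
side = view_iou(gt, pred, "side")
```

In this example the front-view IoU is 1.0 while the side-view IoU is about 0.78, because a pure depth error is invisible from the front but appears in the side view. Separating the two views in this way is what lets the metrics attribute a drop in accuracy to a particular face or dimension of the box.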