
Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection

Yubin Wang, Zhikang Zou, Xiaoqing Ye, Xiao Tan, Errui Ding, Cairong Zhao

TL;DR

Uni$^2$Det tackles cross-dataset domain shifts in LiDAR-based 3D detection with a unified multi-dataset training framework guided by dataset-aware prompts. It introduces three prompting modules, at the voxelization, backbone, and head stages, which bridge inter-dataset disparities through point distribution correction via mean-shifted batch normalization, BEV range masking, and object-conditional residual learning. The approach enables zero-shot transfer to unseen datasets and outperforms prior multi-dataset training (MDT) methods such as Uni3D, showing robust cross-domain generalization across KITTI, Waymo, and nuScenes. The results highlight the practical potential of prompt-based unification for scalable, domain-robust 3D perception in autonomous driving. The authors acknowledge that the method assumes an identical category space across datasets and point to broader label-space coverage as future work.
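
The TL;DR names mean-shifted batch normalization as the way point distributions are corrected, but this section does not give its formula. The sketch below assumes one plausible reading, labeled hypothetical: each value is centred on the batch mean of its own dataset, then the whole mixed-dataset batch is scaled by one shared variance. The function name `mean_shifted_batch_norm` and the toy inputs are illustrative only, not the paper's implementation.

```python
from statistics import fmean


def mean_shifted_batch_norm(values, dataset_ids, eps=1e-5):
    """Hypothetical sketch of a dataset-aware batch norm.

    Each value is centred on the batch mean of its *own* dataset (the
    assumed "mean shift"), then all values are scaled by one variance
    shared across the whole mixed-dataset batch.
    """
    # Group values by dataset so each dataset gets its own mean.
    groups = {}
    for v, d in zip(values, dataset_ids):
        groups.setdefault(d, []).append(v)
    means = {d: fmean(vs) for d, vs in groups.items()}

    # Remove the per-dataset offset (e.g. one caused by a different sensor setup).
    centered = [v - means[d] for v, d in zip(values, dataset_ids)]

    # Shared variance of the centred batch, then normalise.
    var = fmean(c * c for c in centered)
    return [c / (var + eps) ** 0.5 for c in centered]


# Toy feature values from two datasets with a large constant offset between them.
feats = [0.0, 2.0, 10.0, 12.0]
ids = ["kitti", "kitti", "waymo", "waymo"]
print(mean_shifted_batch_norm(feats, ids))  # roughly [-1, 1, -1, 1]
```

For contrast, plain batch normalization would subtract the single batch mean (6.0 here), leaving the two datasets at clearly different normalized values; centring per dataset removes that offset before the shared scaling.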

Abstract

We present Uni$^2$Det, a new framework for unified and universal multi-dataset training on 3D detection, enabling robust performance across diverse domains and generalization to unseen domains. Because of substantial disparities in data distribution and variations in taxonomy across domains, training such a detector by simply merging datasets is challenging. Motivated by this observation, we introduce multi-stage prompting modules for multi-dataset 3D detection, which leverage prompts derived from the characteristics of each dataset to mitigate these differences. The design integrates in a plug-and-play manner into various advanced 3D detection frameworks, while also allowing straightforward adaptation for universal applicability across datasets. Experiments across multiple dataset consolidation scenarios involving KITTI, Waymo, and nuScenes show that Uni$^2$Det outperforms existing methods by a large margin in multi-dataset training. Notably, results on zero-shot cross-dataset transfer validate the generalization capability of the proposed method.
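
One dataset characteristic that can serve as a prompt is the region each dataset actually covers; the TL;DR lists BEV range masking among the modules. The sketch below is a hypothetical reading: all datasets share one BEV grid, and a per-dataset mask zeroes cells outside that dataset's own range. The range values in `DATASET_BEV_RANGE` are illustrative assumptions modelled on common OpenPCDet configurations, and `bev_range_mask` / `apply_mask` are invented names, not the paper's code.

```python
# Illustrative BEV ranges (x_min, y_min, x_max, y_max) in metres, modelled on
# common OpenPCDet configs; treat them as assumptions, not the paper's values.
DATASET_BEV_RANGE = {
    "kitti": (0.0, -40.0, 70.4, 40.0),    # front-view annotations only
    "waymo": (-75.2, -75.2, 75.2, 75.2),  # full 360-degree coverage
}


def bev_range_mask(unified_range, cell_size, dataset):
    """Hypothetical BEV range mask on a grid shared by all datasets.

    Returns an ny x nx list of booleans that is True where the cell centre
    lies inside the given dataset's own BEV range.
    """
    x_min, y_min, x_max, y_max = unified_range
    nx = round((x_max - x_min) / cell_size)
    ny = round((y_max - y_min) / cell_size)
    dx_min, dy_min, dx_max, dy_max = DATASET_BEV_RANGE[dataset]
    mask = []
    for j in range(ny):
        cy = y_min + (j + 0.5) * cell_size
        mask.append([
            dx_min <= x_min + (i + 0.5) * cell_size <= dx_max and dy_min <= cy <= dy_max
            for i in range(nx)
        ])
    return mask


def apply_mask(bev_features, mask):
    """Zero out BEV features in cells the dataset never covers."""
    return [
        [f if keep else 0.0 for f, keep in zip(f_row, m_row)]
        for f_row, m_row in zip(bev_features, mask)
    ]


UNIFIED = (-75.2, -75.2, 75.2, 75.2)
kitti_mask = bev_range_mask(UNIFIED, 18.8, "kitti")  # coarse 8 x 8 grid for readability
print(sum(map(sum, kitti_mask)), "of", 8 * 8, "cells kept for KITTI")
```

Masking in a shared grid keeps the backbone weights fully shared while preventing KITTI samples, which carry no labels behind the ego vehicle, from contributing features there.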

Paper Structure (28 sections, 5 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Illustration of different training paradigms. Single-dataset training leverages separate detectors and heads for different datasets. Naive multi-dataset training conducts point range alignment and partially shares the parameters within detectors, but still with dataset-specific heads. We propose unified and universal training, where detectors and heads for different datasets are fully shared.
  • Figure 2: Illustration of the overall framework of Uni$^2$Det. The multi-stage prompting modules are employed as the core component to make the detection more unified and universal.
  • Figure 3: Illustration of multi-stage prompting modules, including three modules for prompting different components of the detector.
  • Figure 4: Illustration of statistical distribution differences of object size (length, width, and height) in KITTI between the ground truth and the predictions of Uni$^2$Det with and without object-conditional residual learning (OCRL) module.
  • Figure 5: Ablation on the balancing ratio with PV-RCNN and Voxel-RCNN, measured by AP$_{3D}$ on KITTI in the Waymo-KITTI consolidation.
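
Figure 4 indicates that object-conditional residual learning (OCRL) brings predicted object sizes on KITTI closer to the ground-truth distribution. The section does not define OCRL, so the sketch below assumes a familiar technique in its place: anchor-style log-ratio size encoding, where the shared head regresses a residual relative to a mean size conditioned on the (dataset, class) pair. The `MEAN_SIZE` values and the function names are hypothetical, and whether Uni$^2$Det decodes sizes exactly this way is an assumption.

```python
import math

# Illustrative per-(dataset, class) mean box sizes (length, width, height) in metres.
# Hypothetical values; classes are assumed already merged into a shared label.
MEAN_SIZE = {
    ("kitti", "car"): (3.9, 1.6, 1.56),
    ("waymo", "car"): (4.7, 2.1, 1.7),
}


def encode_size(gt_size, dataset, cls):
    """Regression target: log-ratio of the ground-truth size to the dataset prior."""
    prior = MEAN_SIZE[(dataset, cls)]
    return tuple(math.log(g / p) for g, p in zip(gt_size, prior))


def decode_size(residual, dataset, cls):
    """Invert encode_size: the shared head's residual rescales the dataset prior."""
    prior = MEAN_SIZE[(dataset, cls)]
    return tuple(p * math.exp(r) for r, p in zip(residual, prior))


# The same head output yields a dataset-appropriate box size.
residual = (0.0, 0.0, 0.0)
print(decode_size(residual, "kitti", "car"))  # (3.9, 1.6, 1.56)
print(decode_size(residual, "waymo", "car"))  # (4.7, 2.1, 1.7)
```

Because the prior absorbs each dataset's typical object size, a fully shared head only has to learn small residuals, which is one way a single head could avoid the systematic size bias Figure 4 illustrates.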