
UniDA3D: A Unified Domain-Adaptive Framework for Multi-View 3D Object Detection

Hongjing Wu, Cheng Chi, Jinlin Wu, Yanzhao Su, Zhen Lei, Wenqi Ren

Abstract

Camera-only 3D object detection is critical for autonomous driving, offering a cost-effective alternative to LiDAR-based methods. In particular, multi-view 3D object detection has emerged as a promising direction due to its balanced trade-off between performance and cost. However, existing methods often suffer significant performance degradation under complex environmental conditions such as nighttime, fog, and rain, primarily because their training data are collected mostly under ideal conditions. To address this challenge, we propose UniDA3D, a unified domain-adaptive multi-view 3D object detector designed for robust perception under diverse adverse conditions. UniDA3D formulates nighttime, rainy, and foggy scenes as a unified multi-target domain adaptation problem and leverages a novel query-guided domain discrepancy mitigation (QDDM) module that aligns object features between the source and target domains at both batch and global levels via query-centric adversarial and contrastive learning. Furthermore, we introduce a domain-adaptive teacher-student training pipeline with an exponential-moving-average (EMA) teacher and dynamically updated high-quality pseudo labels to strengthen consistency learning and suppress background noise in unlabeled target domains. In contrast to prior approaches that require separate training for each condition, UniDA3D performs a single unified training run across multiple domains, enabling robust all-weather 3D perception. On a synthesized multi-view 3D benchmark constructed by generating nighttime, rainy, and foggy counterparts of nuScenes (nuScenes-Night, nuScenes-Rain, and nuScenes-Haze), UniDA3D consistently outperforms state-of-the-art camera-only multi-view 3D detectors under extreme conditions, achieving substantial gains in mAP and NDS while maintaining real-time inference efficiency.
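The abstract describes QDDM only at a high level. As a rough illustration, the sketch below shows one common way to realize the query-centric adversarial part of such a module: a per-query domain discriminator trained through a gradient-reversal layer, so that the detector is pushed toward query features the discriminator cannot separate by domain. The class names, feature dimension, and loss form are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates and scales gradients backward."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None


class QueryDomainDiscriminator(nn.Module):
    """Per-query domain classifier trained adversarially via gradient reversal.

    Hypothetical sketch: the real QDDM module additionally uses contrastive
    learning against a dynamically updated global representation.
    """

    def __init__(self, query_dim=256, lam=1.0):
        super().__init__()
        self.lam = lam
        self.head = nn.Sequential(
            nn.Linear(query_dim, query_dim),
            nn.ReLU(inplace=True),
            nn.Linear(query_dim, 1),
        )

    def forward(self, queries, is_target):
        # queries: (B, N, C) decoder object-query embeddings
        # is_target: (B,) domain labels, 0 = source, 1 = target
        rev = GradReverse.apply(queries, self.lam)
        logits = self.head(rev).squeeze(-1)                       # (B, N)
        labels = is_target.float().unsqueeze(1).expand_as(logits)
        # The discriminator minimizes this loss; the reversed gradients push
        # the detector toward domain-invariant query features.
        return F.binary_cross_entropy_with_logits(logits, labels)
```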

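The teacher-student pipeline in the abstract combines an EMA teacher with confidence-filtered pseudo labels. Below is a minimal sketch of those two ingredients, assuming a momentum of 0.999 and a score threshold of 0.4 (both placeholder values; the paper's actual schedule and dynamic pseudo-label update rule are not given here).

```python
import torch


@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Exponential-moving-average update of teacher weights from the student."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)
    for b_t, b_s in zip(teacher.buffers(), student.buffers()):
        b_t.copy_(b_s)  # copy running statistics (e.g., BatchNorm) directly


@torch.no_grad()
def filter_pseudo_labels(boxes, scores, score_thresh=0.4):
    """Keep only confident teacher predictions as pseudo labels,
    suppressing low-confidence (likely background) detections."""
    keep = scores >= score_thresh
    return boxes[keep], scores[keep]
```

In a typical self-training loop, the student is updated by gradient descent on labeled source data plus pseudo-labeled target data, and `ema_update` is called once per iteration to refresh the teacher.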

Figures (3)

  • Figure 1: Overview of the proposed UniDA3D. UniDA3D first synthesizes nighttime, rainy, and foggy multi-view images to assist training. A domain-adaptive teacher-student self-training pipeline is employed to transfer knowledge from the source domain to the target domains. UniDA3D is further equipped with a query-guided domain discrepancy mitigation (QDDM) module to align object-level features across domains at both batch and global scales.
  • Figure 2: Details of the query-guided domain discrepancy mitigation (QDDM) module. QDDM leverages 3D object queries to align source and target features at the object level. It combines adversarial training and contrastive learning with a dynamically updated global representation, promoting robust and consistent cross-domain 3D object detection.
  • Figure 3: t-SNE visualization results (blue points: source domain; red points: target domain). The overlap between the blue and red distributions reflects the domain discrepancy of the features extracted by the model; a smaller gap between the two distributions indicates stronger domain adaptation capability. A minimal plotting sketch follows this list.
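For readers reproducing a Figure 3-style plot, the sketch below embeds pooled source and target features with scikit-learn's t-SNE and colors the points by domain. The pooling of features into `(N, C)` arrays, the perplexity, and the function name `plot_domain_tsne` are assumptions for illustration, not the paper's settings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE


def plot_domain_tsne(src_feats, tgt_feats, out_path="tsne.png"):
    """Embed pooled per-image features in 2D and color them by domain.

    src_feats, tgt_feats: (N, C) numpy arrays of pooled features.
    """
    feats = np.concatenate([src_feats, tgt_feats], axis=0)
    emb = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(feats)
    n_src = len(src_feats)
    plt.scatter(emb[:n_src, 0], emb[:n_src, 1], s=4, c="blue", label="source")
    plt.scatter(emb[n_src:, 0], emb[n_src:, 1], s=4, c="red", label="target")
    plt.legend()
    plt.axis("off")
    plt.savefig(out_path, dpi=200, bbox_inches="tight")
```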