CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection
Meiying Zhang, Weiyuan Peng, Guangyao Ding, Chenyang Lei, Chunlin Ji, Qi Hao
TL;DR
The paper tackles the challenge of sim-to-real transfer in 3D object detection by proposing CTS, a two-stage, mean-teacher framework that complements a fixed-size anchor head and RoI augmentation with a corner-format aleatoric uncertainty representation. This combination enables high-quality pseudo-labels and robust, uncertainty-guided data sampling in the target domain, improving performance over real-to-real-optimized baselines on sim-to-real tasks. Key contributions include fixed-size anchors to prevent size-bias propagation, RoI-based augmentation to diversify feature representations, a uniform corner-based AU formulation, and two AU-driven sampling strategies within a noise-aware mean-teacher setup. Experiments across CARLA3D, KITTI, Lyft, and TinySUScape demonstrate notable gains in AP_BEV and AP_3D compared to baselines, approaching but not yet matching Oracle supervision, and highlight the method's potential for broader sim-to-real and multi-category domain adaptation.
Abstract
Simulation data can be accurately labeled and are expected to improve the performance of data-driven algorithms, including object detection. However, due to the various domain inconsistencies from simulation to reality (sim-to-real), cross-domain object detection algorithms usually suffer from dramatic performance drops. While numerous unsupervised domain adaptation (UDA) methods have been developed to address cross-domain tasks between real-world datasets, progress in sim-to-real remains limited. This paper presents a novel Complex-to-Simple (CTS) framework to transfer models from labeled simulation (source) to unlabeled reality (target) domains. Based on a two-stage detector, the novelty of this work is threefold: 1) developing fixed-size anchor heads and RoI augmentation to address size bias and feature diversity between the two domains, thereby improving the quality of pseudo-labels; 2) developing a novel corner-format representation of aleatoric uncertainty (AU) for the bounding box, to uniformly quantify pseudo-label quality; 3) developing a noise-aware mean teacher domain adaptation method based on AU, as well as object-level and frame-level sampling strategies, to mitigate the impact of noisy labels. Experimental results demonstrate that our proposed approach significantly enhances the sim-to-real domain adaptation capability of 3D object detection models, outperforming state-of-the-art cross-domain algorithms, which are usually developed for real-to-real UDA tasks.
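To make the three ingredients named in the abstract more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It shows (a) converting a box to its eight corners, which is the form a corner-format uncertainty would be attached to, (b) a hypothetical object-level filter that keeps pseudo-labels whose mean corner variance is below a threshold, and (c) the standard exponential-moving-average update used in mean-teacher training. The function names, the scoring rule (mean variance), the threshold, and the momentum value are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def box_to_corners(cx, cy, cz, dx, dy, dz, yaw):
    """Return the 8 corners (x, y, z) of a box rotated by `yaw` about the z axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                # Offset of this corner in the box's local frame.
                lx, ly, lz = sx * dx, sy * dy, sz * dz
                # Rotate in the x-y plane, then translate to the box center.
                corners.append((cx + c * lx - s * ly,
                                cy + s * lx + c * ly,
                                cz + lz))
    return corners


def object_uncertainty(corner_var):
    """Hypothetical score: mean predicted variance over all corner coordinates.

    `corner_var` is a list of 8 (var_x, var_y, var_z) tuples. Lower is better.
    """
    flat = [v for corner in corner_var for v in corner]
    return sum(flat) / len(flat)


def select_pseudo_labels(objects, max_uncertainty):
    """Object-level sampling (assumed rule): keep boxes under the threshold."""
    return [o for o in objects
            if object_uncertainty(o["corner_var"]) <= max_uncertainty]


def ema_update(teacher, student, momentum=0.999):
    """Mean-teacher step: teacher = m * teacher + (1 - m) * student, per weight."""
    return {name: momentum * teacher[name] + (1.0 - momentum) * student[name]
            for name in teacher}


if __name__ == "__main__":
    corners = box_to_corners(0.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0)
    print(len(corners), "corners")

    predictions = [
        {"id": "confident", "corner_var": [(0.01, 0.01, 0.01)] * 8},
        {"id": "noisy", "corner_var": [(0.5, 0.5, 0.5)] * 8},
    ]
    kept = select_pseudo_labels(predictions, max_uncertainty=0.1)
    print([o["id"] for o in kept])

    print(ema_update({"w": 1.0}, {"w": 0.0}, momentum=0.9))
```

A frame-level strategy could be sketched the same way by aggregating `object_uncertainty` over all boxes in a frame and sampling frames by that aggregate, although the exact aggregation used in CTS is not stated here.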
