CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection

Meiying Zhang, Weiyuan Peng, Guangyao Ding, Chenyang Lei, Chunlin Ji, Qi Hao

TL;DR

The paper tackles the challenge of sim-to-real transfer in 3D object detection by proposing CTS, a two-stage, mean-teacher framework that complements a fixed-size anchor head and RoI augmentation with a corner-format aleatoric uncertainty representation. This combination enables high-quality pseudo-labels and robust, uncertainty-guided data sampling in the target domain, improving performance over real-to-real-optimized baselines on sim-to-real tasks. Key contributions include fixed-size anchors to prevent size-bias propagation, RoI-based augmentation to diversify feature representations, a uniform corner-based AU formulation, and two AU-driven sampling strategies within a noise-aware mean-teacher setup. Experiments across CARLA3D, KITTI, Lyft, and TinySUScape demonstrate notable gains in AP_BEV and AP_3D compared to baselines, approaching but not yet matching Oracle supervision, and highlight the method's potential for broader sim-to-real and multi-category domain adaptation.

Abstract

Simulation data can be accurately labeled and have been expected to improve the performance of data-driven algorithms, including object detection. However, due to the various domain inconsistencies from simulation to reality (sim-to-real), cross-domain object detection algorithms usually suffer from dramatic performance drops. While numerous unsupervised domain adaptation (UDA) methods have been developed to address cross-domain tasks between real-world datasets, progress in sim-to-real remains limited. This paper presents a novel Complex-to-Simple (CTS) framework to transfer models from labeled simulation (source) to unlabeled reality (target) domains. Based on a two-stage detector, the novelty of this work is threefold: 1) developing fixed-size anchor heads and RoI augmentation to address size bias and feature diversity between the two domains, thereby improving the quality of pseudo-labels; 2) developing a novel corner-format representation of aleatoric uncertainty (AU) for the bounding box, to uniformly quantify pseudo-label quality; 3) developing a noise-aware mean teacher domain adaptation method based on AU, as well as object-level and frame-level sampling strategies, to mitigate the impact of noisy labels. Experimental results demonstrate that our proposed approach significantly enhances the sim-to-real domain adaptation capability of 3D object detection models, outperforming state-of-the-art cross-domain algorithms, which are usually developed for real-to-real UDA tasks.
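
The third contribution (a mean teacher combined with AU-based object-level and frame-level sampling) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the function names, the exponential mapping from AU to a loss weight, the mean-AU frame threshold, and the default values are hypothetical and are not taken from the paper.

```python
import numpy as np

def ema_update(teacher, student, decay=0.999):
    """Move each teacher weight toward the student weight by an
    exponential moving average (the standard mean-teacher update)."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}

def object_weights(au, scale=1.0):
    """Object-level soft sampling (assumed form): map each pseudo-label's
    aleatoric uncertainty to a loss weight in (0, 1]; higher AU, lower weight."""
    return np.exp(-scale * np.asarray(au, dtype=float))

def keep_frame(au, max_mean_au=0.5):
    """Frame-level sampling (assumed form): keep a frame only if it has at
    least one pseudo-label and their mean AU is below a threshold."""
    au = np.asarray(au, dtype=float)
    return au.size > 0 and float(au.mean()) <= max_mean_au

# Toy usage with dictionaries of NumPy arrays standing in for model weights.
teacher = {"w": np.array([1.0, 1.0])}
student = {"w": np.array([0.0, 2.0])}
teacher = ema_update(teacher, student, decay=0.9)
```

In this sketch the teacher changes slowly, which is what makes its pseudo-labels more stable than the student's predictions, while the two sampling helpers reduce the influence of pseudo-labels that the uncertainty estimate flags as unreliable.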

Paper Structure

This paper contains 28 sections, 6 equations, 6 figures, 4 tables, and 1 algorithm.

Figures (6)

  • Figure 1: An illustration of unsupervised sim-to-real domain adaptation guided by pseudo-labels, which aims to minimize domain shifts from the simulator (e.g., CARLA (Dosovitskiy et al., 2017)) to real-world datasets (e.g., KITTI (Geiger et al., 2012), Lyft (Kesten et al., 2019), and TinySUScape (Ding et al., 2022)).
  • Figure 2: An illustration of the CTS framework. In the first stage, the model is trained on the source domain with the Anchor Head (Sec. \ref{sec:anchor_head}), RoI Augmentation (Sec. \ref{sec:roi_augmentation}), and corner-format AU modeling (Sec. \ref{sec:detection_au}). In the second stage, the noise-aware mean teacher approach is applied: the student model is alternately supervised with pseudo-labels on the target domain and ground-truth labels on the source domain, and the teacher model's weights are updated using an exponential moving average (EMA). Meanwhile, two noise-aware sampling strategies (Sec. \ref{sec:noise-aware sampling}) are implemented using the aleatoric uncertainty indicator: frame-level sampling removes noisy frames, while object-level soft-sampling handles noisy labels.
  • Figure 3: An illustration of two coding schemes of bounding boxes with uncertainties. (a) BF: box format; (b) CF: corner format, where the red areas stand for the potential ranges, that is, the aleatoric uncertainty.
  • Figure 4: An illustration of the car size distributions of the Lyft (Kesten et al., 2019) and CARLA3D datasets under different processing methods, i.e., SN (Wang et al., 2020) and Random Scaling.
  • Figure 5: An illustration of the correlation between AU value and IoU/ego-to-object distance for the target dataset. Blue points denote the AU values of detected objects; the red line represents the means of the AU values.
  • ...and 1 more figure
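
The difference between the two coding schemes in Figure 3 can be made concrete with a short Python sketch that converts a box-format description into corner format. This is a simplified bird's-eye-view (BEV) illustration under stated assumptions: the function names, the 2D simplification, and the idea of summarizing per-corner uncertainty by its mean are hypothetical and are not the paper's formulation.

```python
import numpy as np

def box_to_bev_corners(cx, cy, length, width, yaw):
    """Convert a box-format BEV box (center, size, heading) into
    corner format: an array of shape (4, 2) with one (x, y) per corner."""
    # Corner offsets in the box's own frame, counter-clockwise from front-left.
    local = 0.5 * np.array([
        [length, width],
        [-length, width],
        [-length, -width],
        [length, -width],
    ])
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s], [s, c]])
    # Rotate each corner by the heading, then translate to the box center.
    return local @ rot.T + np.array([cx, cy])

def mean_corner_uncertainty(corner_std):
    """Assumed scalar summary: average the per-corner, per-axis standard
    deviations so that every box is scored in the same unit (meters)."""
    return float(np.mean(np.asarray(corner_std, dtype=float)))

corners = box_to_bev_corners(0.0, 0.0, 4.0, 2.0, 0.0)
```

One way to read the appeal of the corner format is that every predicted quantity is a position, so its uncertainty shares one unit, whereas the box format mixes positions, sizes, and an angle.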