CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection

Meiying Zhang; Weiyuan Peng; Guangyao Ding; Chenyang Lei; Chunlin Ji; Qi Hao

TL;DR

The paper tackles sim-to-real transfer in 3D object detection by proposing CTS, a two-stage mean-teacher framework that combines a fixed-size anchor head and RoI augmentation with a corner-format representation of aleatoric uncertainty (AU). Together these produce higher-quality pseudo-labels and support uncertainty-guided data sampling in the target domain, improving on baselines that were designed for real-to-real adaptation. Key contributions are fixed-size anchors that prevent size bias from propagating, RoI-based augmentation that diversifies feature representations, a uniform corner-based AU formulation, and two AU-driven sampling strategies within a noise-aware mean-teacher setup. Experiments on CARLA3D, KITTI, Lyft, and TinySUScape show gains in AP_BEV and AP_3D over the baselines, approaching but not yet matching Oracle supervision, and point to the method's potential for broader sim-to-real and multi-category domain adaptation.
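
To make the idea of AU-driven sampling concrete, the following is a minimal sketch, not the paper's formulation. It assumes each pseudo-label carries one scalar AU value; the exponential weighting, the `temperature` parameter, the `max_mean_au` threshold, and the function names are all hypothetical choices made for illustration.

```python
import math

def object_weights(au_values, temperature=1.0):
    """Object-level soft sampling (assumed form): map each AU to a weight in (0, 1].

    Lower uncertainty gives a weight closer to 1, so noisy pseudo-labels
    contribute less to the loss instead of being dropped outright.
    """
    return [math.exp(-au / temperature) for au in au_values]

def keep_frame(au_values, max_mean_au=0.5):
    """Frame-level sampling (assumed form): keep a frame if its mean AU is low.

    Frames without any pseudo-label are discarded in this sketch.
    """
    if not au_values:
        return False
    return sum(au_values) / len(au_values) <= max_mean_au

# Hypothetical per-object AU values for two target-domain frames.
frames = {
    "frame_a": [0.1, 0.2, 0.15],
    "frame_b": [0.9, 1.2],
}

# Drop noisy frames first, then weight the surviving pseudo-labels.
kept = {name: object_weights(aus) for name, aus in frames.items() if keep_frame(aus)}
```

In this sketch the frame-level filter is a hard decision while the object-level weights are soft, which mirrors the distinction the summary draws between removing noisy frames and down-weighting noisy labels.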

Abstract

Simulation data can be accurately labeled and have been expected to improve the performance of data-driven algorithms, including object detection. However, due to the various domain inconsistencies from simulation to reality (sim-to-real), cross-domain object detection algorithms usually suffer from dramatic performance drops. While numerous unsupervised domain adaptation (UDA) methods have been developed to address cross-domain tasks between real-world datasets, progress in sim-to-real remains limited. This paper presents a novel Complex-to-Simple (CTS) framework to transfer models from labeled simulation (source) to unlabeled reality (target) domains. Based on a two-stage detector, the novelty of this work is threefold: 1) developing fixed-size anchor heads and RoI augmentation to address size bias and feature diversity between the two domains, thereby improving the quality of pseudo-labels; 2) developing a novel corner-format representation of aleatoric uncertainty (AU) for the bounding box, to uniformly quantify pseudo-label quality; 3) developing a noise-aware mean teacher domain adaptation method based on AU, as well as object-level and frame-level sampling strategies, to mitigate the impact of noisy labels. Experimental results demonstrate that our proposed approach significantly enhances the sim-to-real domain adaptation capability of 3D object detection models, outperforming state-of-the-art cross-domain algorithms, which are usually developed for real-to-real UDA tasks.
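
The mean teacher scheme mentioned above keeps the teacher as an exponential moving average (EMA) of the student. The sketch below shows that generic update on a toy parameter dictionary; the `decay` value and the dictionary representation are assumptions for illustration, and the paper's actual hyperparameters are not given in this summary.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher weights: decay * teacher + (1 - decay) * student."""
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }

# Toy scalar "weights" standing in for network parameters.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}

# One update step with a deliberately small decay so the change is visible.
teacher = ema_update(teacher, student, decay=0.9)
```

A decay close to 1 makes the teacher change slowly, which is what lets it produce more stable pseudo-labels than the student it tracks.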

Paper Structure (28 sections, 6 equations, 6 figures, 4 tables, 1 algorithm)

Introduction
Related Work
UDA for 3D object detection
Uncertainty Estimation in 3D Object Detection
System Setup
Student Model
Teacher Model
Proposed Methods
Enhancement of Pseudo-Label Quality
Anchor Head (AH)
RoI Random Scaling (RRS) and Augmentation
3D Detection with Aleatoric Uncertainty
Noise-aware Mean Teacher
Object-Level Soft Sampling
Frame-Level Sampling
...and 13 more sections

Figures (6)

Figure 1: An illustration of unsupervised sim-to-real domain adaptation guided by pseudo-labels, which aims to minimize domain shifts from the simulator (e.g., CARLA [Dosovitskiy et al., 2017]) to real-world datasets (e.g., KITTI [Geiger et al., 2012], Lyft [Kesten et al., 2019], and TinySUScape [Ding et al., 2022]).
Figure 2: An illustration of the CTS framework. In the first stage, the model is trained on the source domain with the Anchor Head (Sec. Anchor Head), RoI Augmentation (Sec. RoI Random Scaling and Augmentation), and corner-format AU modeling (Sec. 3D Detection with Aleatoric Uncertainty). In the second stage, the noise-aware mean teacher approach is applied: the student model is alternately supervised with pseudo-labels on the target domain and ground-truth labels on the source domain, and the teacher model's weights are updated using an exponential moving average (EMA). Meanwhile, two noise-aware sampling strategies (Secs. Object-Level Soft Sampling and Frame-Level Sampling) are implemented using the aleatoric uncertainty indicator: frame-level sampling removes noisy frames, while object-level soft sampling handles noisy labels.
Figure 3: An illustration of two coding schemes of bounding boxes with uncertainties. (a) BF: box format; (b) CF: corner format, where the red areas stand for the potential ranges, that is, the aleatoric uncertainty.
Figure 4: An illustration of the car-size distributions of the Lyft [Kesten et al., 2019] and CARLA3D datasets under different processing methods, i.e., SN [Wang et al., 2020] and Random Scaling.
Figure 5: An illustration of the correlation between AU value and IoU/ego-to-object distance for the target dataset. Blue points denote the AU values of detected objects; the red line represents the means of the AU values.
...and 1 more figures
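
Figure 3 contrasts a box-format encoding with a corner-format encoding. As a rough illustration of what moving between the two involves, the sketch below converts a bird's-eye-view box given as center, size, and yaw into its four corner points. This is standard geometry rather than the paper's code; the 2D simplification, the corner ordering, and the function name are assumptions, and the uncertainty terms themselves are not modeled here.

```python
import math

def bev_box_to_corners(cx, cy, length, width, yaw):
    """Convert a BEV box (center, size, heading) to four (x, y) corners.

    Corners are ordered front-left, front-right, rear-right, rear-left,
    where "front" is the +x direction of the box before rotation.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    half_l, half_w = length / 2.0, width / 2.0
    offsets = [
        (half_l, half_w),
        (half_l, -half_w),
        (-half_l, -half_w),
        (-half_l, half_w),
    ]
    # Rotate each local offset by yaw, then translate by the box center.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]

# An axis-aligned 4 m x 2 m box centered at the origin.
corners = bev_box_to_corners(0.0, 0.0, 4.0, 2.0, 0.0)
```

In a corner-format encoding, each corner coordinate can be given its own uncertainty term, whereas a box-format encoding attaches uncertainty to the center, size, and heading parameters.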
