Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds

Zhimin Yuan, Wankang Zeng, Yanfei Su, Weiquan Liu, Ming Cheng, Yulan Guo, Cheng Wang

TL;DR

This work tackles 3D synthetic-to-real unsupervised domain adaptive (UDA) segmentation by addressing two core gaps: input-level density differences and poor initialization for self-training. It introduces a non-learnable density-guided translator (DGT) to align point density across domains, and a two-stage pipeline (DGT-ST) that first uses a prototype-guided category-level adversarial network (PCAN) for a strong initialization, then applies source-aware consistency LaserMix (SAC-LM) within a mean-teacher framework to refine domain-invariant features. On two synthetic-to-real benchmarks, the approach improves mean IoU over state-of-the-art baselines by 9.4% on SynLiDAR → SemanticKITTI and 4.3% on SynLiDAR → SemanticPOSS. The combination of input-level density alignment, prototype-guided adversarial alignment, and consistency-based self-training provides a practical pathway for robust 3D UDA in real-world LiDAR applications.

Abstract

3D synthetic-to-real unsupervised domain adaptive segmentation is crucial to annotating new domains. Self-training is a competitive approach for this task, but its performance is limited by different sensor sampling patterns (i.e., variations in point density) and incomplete training strategies. In this work, we propose a density-guided translator (DGT), which translates point density between domains, and integrate it into a two-stage self-training pipeline named DGT-ST. First, in contrast to existing works that simultaneously conduct data generation and feature/output alignment within unstable adversarial training, we employ the non-learnable DGT to bridge the domain gap at the input level. Second, to provide a well-initialized model for self-training, we propose a category-level adversarial network in stage one that utilizes prototypes to prevent negative transfer. Finally, building on these designs, we propose a domain-mixed self-training method with a source-aware consistency loss in stage two to narrow the domain gap further. Experiments on two synthetic-to-real segmentation tasks (SynLiDAR $\rightarrow$ SemanticKITTI and SynLiDAR $\rightarrow$ SemanticPOSS) demonstrate that DGT-ST outperforms state-of-the-art methods, achieving 9.4\% and 4.3\% mIoU improvements, respectively. Code is available at \url{https://github.com/yuan-zm/DGT-ST}.
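
To make the idea of translating point density concrete, the following is a minimal sketch and not the authors' DGT. The abstract only states that DGT is non-learnable and operates at the input level; the distance binning, the random point dropping, and all names here (`bin_index`, `distance_histogram`, `match_density`, `bin_width`, `num_bins`) are assumptions made for illustration.

```python
import math
import random


def bin_index(point, bin_width, num_bins):
    """Index of the distance bin (from the sensor origin) containing a point."""
    x, y, z = point[:3]
    dist = math.sqrt(x * x + y * y + z * z)
    return min(int(dist / bin_width), num_bins - 1)


def distance_histogram(points, bin_width, num_bins):
    """Number of points falling in each distance bin."""
    counts = [0] * num_bins
    for p in points:
        counts[bin_index(p, bin_width, num_bins)] += 1
    return counts


def match_density(source, target, bin_width=10.0, num_bins=5, seed=0):
    """Randomly thin the source scan so that, per distance bin, its expected
    point count does not exceed the target scan's count (hypothetical rule)."""
    rng = random.Random(seed)
    src = distance_histogram(source, bin_width, num_bins)
    tgt = distance_histogram(target, bin_width, num_bins)
    # Keep probability per bin; bins where the source is already sparser stay untouched.
    keep = [min(1.0, t / s) if s else 1.0 for s, t in zip(src, tgt)]
    return [p for p in source
            if rng.random() < keep[bin_index(p, bin_width, num_bins)]]
```

Because the rule has no trainable parameters, it can be applied as a preprocessing step before any network sees the data, which is the property the abstract contrasts with adversarial data generation.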

Paper Structure

This paper contains 17 sections, 15 equations, 7 figures, and 9 tables.

Figures (7)

  • Figure 1: Distinct sampling patterns between synthetic and real-world scans. The synthetic scan (upper left) is complete and clean, whereas the real-world data (upper and middle right) contains unexpected and irregular noise. DGT enhances the realism of the synthetic scan (middle left). Point densities of three datasets at various distances from the LiDAR center are shown at the bottom.
  • Figure 1: Comparison of synthetic and real-world scans. (a) and (d) show one scan of SynLiDAR and SemanticKITTI, respectively. (b) and (e) are zoomed-in visualizations of the road in the black box shown in (a) and (d). (c) and (f) are side-view visualizations of part of (a) and (d). The red boxes in (c) and (f) highlight that the points of synthetic and real-world scans do not exhibit significant shifts along the Z-axis.
  • Figure 2: Overview of our two-stage DGT-ST. DGT bridges the domain gap at the input level and is integrated into both stages. In stage one, we propose PCAN, which consists of a segmentor $G$ and a discriminator $D$ and takes the target-like source $x^{s \rightarrow t}$ and raw target data $x^{t}$ as input to perform category-level adversarial alignment. In stage two, SAC-LM employs a teacher-student architecture initialized from the model trained in stage one. We use LaserMix (Kong et al., 2023) to mix the two domain scans $x^{s \rightarrow t}$ and $x^{t}$ into the mixed scan $x^{st}$. The student model is then trained on $x^{s \rightarrow t}$ and $x^{st}$. Moreover, we enforce consistent student predictions on $x^{t}$ and $x^{t \rightarrow s}$.
  • Figure 2: t-SNE visualization of the embedded features on Syn $\rightarrow$ Sk.
  • Figure 3: Illustration of the density-guided translator (DGT).
  • ...and 2 more figures
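
The stage-two components named in the Figure 2 caption (a teacher-student architecture and LaserMix-style scan mixing) can be sketched roughly as follows. This is an illustration under stated assumptions, not the released implementation: the caption does not say how the two scans are mixed, so swapping alternating inclination-angle bands is assumed here, and the exponential-moving-average teacher update is the standard mean-teacher rule. The function names, the default angle range, and the dictionary-of-floats weight format are all hypothetical.

```python
import math


def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher step: move each teacher weight toward the student weight."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}


def area_index(point, num_areas, phi_min, phi_max):
    """Index of the inclination band containing a point (x, y, z, ...)."""
    x, y, z = point[:3]
    phi = math.atan2(z, math.hypot(x, y))
    frac = (phi - phi_min) / (phi_max - phi_min)
    # Clamp so points outside [phi_min, phi_max] fall into the edge bands.
    return max(0, min(int(frac * num_areas), num_areas - 1))


def mix_by_inclination(scan_a, scan_b, num_areas=4,
                       phi_min=math.radians(-25.0),  # assumed lower bound
                       phi_max=math.radians(3.0)):   # assumed upper bound
    """Take even-numbered inclination bands from scan_a and odd ones from scan_b."""
    from_a = [p for p in scan_a
              if area_index(p, num_areas, phi_min, phi_max) % 2 == 0]
    from_b = [p for p in scan_b
              if area_index(p, num_areas, phi_min, phi_max) % 2 == 1]
    return from_a + from_b
```

In this sketch a point may carry extra fields after its coordinates, such as a label, so a mixed scan built from $x^{s \rightarrow t}$ and $x^{t}$ would keep ground-truth labels on the source bands and teacher pseudo-labels on the target bands.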