Improve Cross-domain Mixed Sampling with Guidance Training for Adaptive Segmentation

Wenlve Zhou, Zhiheng Zhou, Tianlei Wang, Delu Zeng

TL;DR

This paper addresses distribution shifts that cross-domain mixed sampling introduces in unsupervised domain adaptation (UDA) for semantic segmentation. It introduces Guidance Training, built around a lightweight Guider module placed between the encoder and decoder. The Guider reconstructs target-domain feature distributions from mixed data so that the decoder can predict pseudo-labels for the original target image. It is optimized with a target-guidance loss and an adaptive uncertainty mechanism. The overall objective combines the DACS mix loss with the Guidance Loss, which recovers pseudo-labels while preserving real-world semantics. Because the Guider is used only during training, it adds no inference cost. Empirically, Guidance Training yields consistent improvements on the GTA→Cityscapes and SYNTHIA→Cityscapes benchmarks when integrated with multiple UDA baselines. Ablations support the robustness of the design and its low hardware overhead.
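The overall objective described above (DACS mix loss plus Guidance Loss) can be sketched as a sum of pixel-wise cross-entropy terms. This is a minimal sketch under stated assumptions, not the authors' code. Predictions are toy per-pixel probability lists, and the pseudo-label weights are supplied by the caller. The balancing factor `lam` and the function names are hypothetical. The paper's adaptive uncertainty mechanism is not modeled; fixed per-pixel weights stand in for it.

```python
import math
from typing import List

Probs = List[List[float]]  # [num_pixels][num_classes] softmax outputs (toy layout)


def weighted_ce(probs: Probs, labels: List[int], weights: List[float]) -> float:
    """Mean over pixels of -w * log p[label]."""
    return sum(-w * math.log(p[y]) for p, y, w in zip(probs, labels, weights)) / len(labels)


def total_loss(src_probs: Probs, y_src: List[int],
               mix_probs: Probs, y_mix: List[int], w_mix: List[float],
               guid_probs: Probs, y_pseudo: List[int], w_guid: List[float],
               lam: float = 1.0) -> float:
    # DACS part: supervised source loss + weighted loss on the mixed image.
    l_dacs = (weighted_ce(src_probs, y_src, [1.0] * len(y_src))
              + weighted_ce(mix_probs, y_mix, w_mix))
    # Guidance part: D(G(E(x^m), M)) against pseudo-labels of the original target image.
    # The adaptive uncertainty mechanism is NOT modeled; w_guid is a fixed stand-in.
    l_guid = weighted_ce(guid_probs, y_pseudo, w_guid)
    return l_dacs + lam * l_guid  # lam is a hypothetical balancing weight


# Usage: with uniform two-class predictions every pixel costs ln 2.
p = [[0.5, 0.5], [0.5, 0.5]]
loss = total_loss(p, [0, 1], p, [0, 1], [1.0, 1.0], p, [1, 0], [1.0, 1.0])
print(round(loss, 4))  # 2.0794
```

With all three terms equal to ln 2 and `lam = 1.0`, the total is 3 ln 2. This shows how the Guidance Loss simply adds a third supervised term on top of the two DACS terms.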

Abstract

Unsupervised Domain Adaptation (UDA) endeavors to adapt models trained on a source domain so that they perform well on a target domain without requiring additional annotations. In domain adaptive semantic segmentation, which tackles UDA for dense prediction, the goal is to avoid costly pixel-level annotations. Many prevailing baseline methods construct intermediate domains via cross-domain mixed sampling to mitigate the performance decline caused by domain gaps. However, such approaches generate synthetic data that diverge from real-world distributions, potentially leading the model astray from the true target distribution. To address this challenge, we propose a novel auxiliary task called Guidance Training. This task makes effective use of cross-domain mixed sampling while mitigating the shift away from the real-world distribution. Specifically, Guidance Training guides the model to extract and reconstruct the target-domain feature distribution from mixed data, and then to decode the reconstructed target-domain features into pseudo-label predictions. Importantly, integrating Guidance Training incurs minimal training overhead and imposes no additional inference burden. We demonstrate the efficacy of our approach by integrating it with existing methods, consistently improving their performance. The implementation will be available at https://github.com/Wenlve-Zhou/Guidance-Training.
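Cross-domain mixed sampling, in the ClassMix-style copy-paste form that DACS uses (the mechanism shown in Figure 1(a)), can be sketched as below. This is a minimal pure-Python illustration on toy 2-D grids, not code from the Guidance Training repository. The function names `classmix_mask` and `copy_paste` are hypothetical, and real pipelines operate on batched image tensors with channels.

```python
import random
from typing import List

Grid = List[List[int]]  # toy 2-D array: a label map or a single-channel image


def classmix_mask(src_label: Grid, rng: random.Random) -> Grid:
    """Binary mask M that is 1 on pixels whose source class was randomly selected."""
    classes = sorted({c for row in src_label for c in row})
    # Select ceil(n / 2) of the classes present in the source label.
    chosen = set(rng.sample(classes, (len(classes) + 1) // 2))
    return [[1 if c in chosen else 0 for c in row] for row in src_label]


def copy_paste(mask: Grid, src: Grid, tgt: Grid) -> Grid:
    """Take the source value where M == 1 and the target value elsewhere."""
    return [[s if m else t for m, s, t in zip(m_row, s_row, t_row)]
            for m_row, s_row, t_row in zip(mask, src, tgt)]


# Usage on toy 2x3 inputs.
rng = random.Random(0)
y_src = [[0, 0, 1], [1, 2, 2]]            # source ground truth
x_src = [[10, 11, 12], [13, 14, 15]]      # source image (grayscale)
x_tgt = [[20, 21, 22], [23, 24, 25]]      # target image
y_tgt_pseudo = [[5, 5, 5], [6, 6, 6]]     # target pseudo-labels from a teacher model

M = classmix_mask(y_src, rng)
x_mix = copy_paste(M, x_src, x_tgt)           # mixed image x^m
y_mix = copy_paste(M, y_src, y_tgt_pseudo)    # mixed label
```

The same mask is applied to both the images and the labels. The mixed label therefore combines source ground truth with target pseudo-labels. Pasted source regions can still break the scene context of the target image, which is the effect Figure 1(b) highlights.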

Paper Structure (17 sections, 17 equations, 4 figures, 10 tables)

Figures (4)

  • Figure 1: (a) The ground truth is randomly sampled to generate a binary mask, which serves as the basis for generating a hybrid image and its corresponding annotations through a copy-paste mechanism. (b) DACS-generated samples stray from physical norms; contextually deficient areas are outlined in red. (c) Building on DACS, we introduce the Guider module to guide the model in predicting the pseudo-labels of the original target image.
  • Figure 2: Guidance Training implemented with DACS [9]. (a) Building on the DACS pipeline, we insert the Guider between the encoder and decoder and steer model training via the Guidance Loss; together these constitute Guidance Training. (b) The primary role of the Guider is to help the model predict pseudo-labels from the hybrid features $E(x^{m})$, prompting the model to decouple the real-world feature distribution from the hybrid feature distribution. The Guider is used only during training and therefore adds no inference overhead.
  • Figure 3: Qualitative comparison of Guidance Training with MIC [55] on GTA$\rightarrow$Cityscapes. Visualizations of MIC [55] combined with Guidance Training show that Guidance Training prevents the model from deviating from the physical-world distribution. For details of the analysis, refer to Section IV.E.
  • Figure 4: Visual examples of the mixed-data prediction $D(E(x^m))$ and the pseudo-label prediction from mixed features $D(G(E(x^m),M))$.
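The data flow in Figure 2 and the two predictions shown in Figure 4 can be sketched as follows. This is a toy illustration under explicit assumptions. Encoder, Guider, and decoder are plain callables operating on one scalar feature per pixel. The `toy_guider` below is a hypothetical placeholder that merely suppresses source-pasted positions; the actual Guider is a learned module whose architecture is not reproduced here. The sketch is meant to show the control flow: both branches share $E(x^m)$ during training, and inference uses only $D(E(x))$.

```python
from typing import Callable, Dict, List, Tuple

Features = List[float]  # toy: one scalar feature per pixel
Mask = List[int]        # M: 1 = pixel pasted from the source image, 0 = target pixel
Labels = List[int]


def train_forward(encoder: Callable[[Features], Features],
                  guider: Callable[[Features, Mask], Features],
                  decoder: Callable[[Features], Labels],
                  x_mix: Features, mask: Mask) -> Tuple[Labels, Labels]:
    feats = encoder(x_mix)                    # E(x^m), shared by both branches
    mix_pred = decoder(feats)                 # D(E(x^m)): supervised by the mixed label (DACS)
    guid_pred = decoder(guider(feats, mask))  # D(G(E(x^m), M)): supervised by target pseudo-labels
    return mix_pred, guid_pred


def infer(encoder: Callable[[Features], Features],
          decoder: Callable[[Features], Labels], x: Features) -> Labels:
    # The Guider is training-only, so test-time cost is that of the plain E -> D path.
    return decoder(encoder(x))


# Toy modules for illustration only.
calls: Dict[str, int] = {"guider": 0}


def toy_encoder(x: Features) -> Features:
    return [2.0 * v for v in x]


def toy_guider(f: Features, m: Mask) -> Features:
    # Hypothetical placeholder, NOT the learned Guider: zero out source-pasted positions.
    calls["guider"] += 1
    return [0.0 if mi else fi for fi, mi in zip(f, m)]


def toy_decoder(f: Features) -> Labels:
    return [1 if v > 1.0 else 0 for v in f]


mix_pred, guid_pred = train_forward(toy_encoder, toy_guider, toy_decoder,
                                    [0.9, 0.2, 0.8], [1, 0, 0])
test_pred = infer(toy_encoder, toy_decoder, [0.9, 0.2, 0.8])
print(mix_pred, guid_pred, test_pred, calls["guider"])  # [1, 0, 1] [0, 0, 1] [1, 0, 1] 1
```

The Guider is called once during the training step and never during `infer`. This mirrors the claim that Guidance Training adds training-time computation but no inference overhead.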