Improve Cross-domain Mixed Sampling with Guidance Training for Adaptive Segmentation
Wenlve Zhou, Zhiheng Zhou, Tianlei Wang, Delu Zeng
TL;DR
This paper addresses the distribution shift that cross-domain mixed sampling introduces in unsupervised domain adaptation (UDA) for semantic segmentation. It proposes Guidance Training, in which a lightweight Guider module placed between the encoder and decoder reconstructs target-domain feature distributions from mixed data; the reconstructed features are then decoded to predict pseudo-labels. The module is optimized with a target-guidance loss and an adaptive uncertainty mechanism. The overall objective combines the DACS mix loss with the Guidance Loss, so the model learns to recover pseudo-labels while staying close to real-world target semantics, at no additional inference cost. Integrated with multiple UDA baselines, Guidance Training yields consistent improvements on the GTA→Cityscapes and SYNTHIA→Cityscapes benchmarks, and ablation studies support the design choices, including its low training overhead.
Abstract
Unsupervised Domain Adaptation (UDA) aims to adapt models trained on a source domain so that they perform well on a target domain without requiring additional annotations. Domain adaptive semantic segmentation applies UDA to dense prediction, with the goal of avoiding costly pixel-level annotation. Prevailing baseline methods typically construct intermediate domains via cross-domain mixed sampling to mitigate the performance drop caused by the domain gap. However, such approaches generate synthetic data that diverge from real-world distributions, which can lead the model away from the true target distribution. To address this challenge, we propose a novel auxiliary task called Guidance Training. This task enables effective use of cross-domain mixed sampling while mitigating the shift away from the real-world distribution. Specifically, Guidance Training guides the model to extract and reconstruct the target-domain feature distribution from mixed data, and then decodes the reconstructed target-domain features to predict pseudo-labels. Importantly, integrating Guidance Training incurs minimal training overhead and imposes no additional inference burden. We demonstrate the efficacy of our approach by integrating it with existing methods, consistently improving their performance. The implementation will be available at https://github.com/Wenlve-Zhou/Guidance-Training.
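
To make the cross-domain mixed sampling step concrete, the following is a minimal sketch of a class-based mixing operation in the spirit of DACS, not the authors' implementation. It assumes that a random half of the classes in the source label map are selected, that their pixels are pasted onto the target image, and that the mixed label is built the same way from the source labels and the target pseudo-labels. All function and variable names are hypothetical, and the Guider module itself is not shown.

```python
import random


def class_mix_mask(source_label, rng):
    """Binary mask marking pixels of a random half of the source classes.

    Assumption: half of the classes present in the source label map are
    chosen (at least one), following the class-based mixing used by DACS.
    """
    classes = sorted({c for row in source_label for c in row})
    chosen = set(rng.sample(classes, max(1, len(classes) // 2)))
    return [[1 if c in chosen else 0 for c in row] for row in source_label]


def mix(mask, source, target):
    """Take source values where mask == 1 and target values elsewhere."""
    return [
        [s if m else t for m, s, t in zip(mask_row, src_row, tgt_row)]
        for mask_row, src_row, tgt_row in zip(mask, source, target)
    ]


# Toy 2x4 single-channel "images" and label maps (hypothetical values).
source_image = [[10, 10, 20, 20], [10, 10, 20, 20]]
source_label = [[0, 0, 1, 1], [0, 0, 1, 1]]
target_image = [[90, 91, 92, 93], [94, 95, 96, 97]]
target_pseudo_label = [[2, 2, 3, 3], [2, 2, 3, 3]]

rng = random.Random(0)
mask = class_mix_mask(source_label, rng)

# The mixed image and mixed label share one mask, so they stay aligned.
mixed_image = mix(mask, source_image, target_image)
mixed_label = mix(mask, source_label, target_pseudo_label)
```

The mixed image produced this way contains regions from both domains and therefore matches neither real distribution exactly, which is the shift that Guidance Training is designed to counteract.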
