Hard-aware Instance Adaptive Self-training for Unsupervised Cross-domain Semantic Segmentation

Chuang Zhu; Kebin Liu; Wenqi Tang; Ke Mei; Jiaqi Zou; Tiejun Huang

TL;DR

This work tackles unsupervised domain adaptation for semantic segmentation under domain shift by focusing on high-quality, diverse pseudo-labels for hard classes. It introduces HIAST, a framework consisting of an Instance Adaptive Selector (IAS) for adaptive per-class thresholds, Hard-aware Pseudo-label Augmentation (HPLA) to enrich hard-class pseudo-labels via inter-image copying, and region-adaptive regularization with a Mean-Teacher-style consistency constraint to stabilize training. Ablations and experiments on GTA5→Cityscapes, SYNTHIA→Cityscapes, and Cityscapes→Oxford RobotCar show that HIAST delivers state-of-the-art performance, particularly for hard classes and small objects, and can be plugged into other UDA methods. The approach also extends to semi-supervised segmentation and emphasizes parameter selection without ground-truth labels, making it practical and broadly applicable.

Abstract

The divergence between labeled training data and unlabeled testing data is a significant challenge for recent deep learning models. Unsupervised domain adaptation (UDA) attempts to solve this problem. Recent works show that self-training is a powerful approach to UDA. However, existing methods have difficulty balancing scalability and performance. In this paper, we propose a hard-aware instance adaptive self-training framework for UDA on the task of semantic segmentation. To effectively improve the quality and diversity of pseudo-labels, we develop a novel pseudo-label generation strategy with an instance adaptive selector. We further enrich the hard-class pseudo-labels with inter-image information through a carefully designed hard-aware pseudo-label augmentation. In addition, we propose region-adaptive regularization to smooth the pseudo-label region and sharpen the non-pseudo-label region. For the non-pseudo-label region, a consistency constraint is also constructed to introduce stronger supervision signals during model optimization. Our method is concise and efficient, so it can easily be generalized to other UDA methods. Experiments on GTA5 to Cityscapes, SYNTHIA to Cityscapes, and Cityscapes to Oxford RobotCar demonstrate the superior performance of our approach compared with state-of-the-art methods. Our code is available at https://github.com/bupt-ai-cz/HIAST.
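
The inter-image augmentation mentioned above can be illustrated with a minimal sketch. It assumes that both target-domain images have the same size, that the hard classes are given as a list of class ids, and that pixels are pasted at their original positions; the function name `copy_hard_classes` and these simplifications are illustrative assumptions, not the released HIAST implementation.

```python
import numpy as np

def copy_hard_classes(img, label, src_img, src_label, hard_classes):
    """Paste hard-class pixels from a source target-domain image into img.

    img, src_img:     (H, W, 3) arrays.
    label, src_label: (H, W) pseudo-label maps of class ids.
    hard_classes:     iterable of class ids treated as hard (assumed given).
    """
    # Boolean mask of source pixels whose pseudo-label is a hard class.
    mask = np.isin(src_label, list(hard_classes))
    out_img = img.copy()
    out_label = label.copy()
    # Copy both the pixels and their pseudo-labels so the two stay aligned.
    out_img[mask] = src_img[mask]
    out_label[mask] = src_label[mask]
    return out_img, out_label
```

Copying the pseudo-label together with the pixels is what raises the share of hard classes in the supervision signal; how the hard classes and source images are chosen is left outside this sketch.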

Paper Structure (28 sections, 15 equations, 8 figures, 15 tables, 3 algorithms)

Introduction
Related Works
UDA for Semantic Segmentation
Copy-and-Paste in UDA
Model Optimization for UDA
Preliminary
UDA for Semantic Segmentation
Self-training for UDA
Adversarial Training for UDA
Proposed Method
Overview of Our Method
Pseudo-label Generation Strategy with an Instance Adaptive Selector
Exponential moving average (EMA) threshold
Hard classes weight decay (HWD)
Hard-aware Pseudo-label Augmentation
...and 13 more sections

Figures (8)

Figure 1: Illustration of pseudo-label results. (a): Ground truth. (b): CBST is biased toward predominant classes such as road and vegetation, while other classes are almost ignored. (c): IAST improves the diversity of categories and produces more valid regions, especially for hard classes such as rider and bike. (d): HIAST further increases the proportion of hard classes through augmentation, which transfers pixels (regions surrounded by red dashed lines) from other target domain images. For a fair comparison, all pseudo-labels are generated by the same model, with pseudo-labels covering about 20% of the target domain dataset.
Figure 3: The core flow of HIAST. (a): Before self-training, pseudo-labels for the target domain are produced by IAS with $\mathbf{G}$, which is initialized by the warm-up phase. IAS combines global and local information during pseudo-label generation, thus providing adaptive selection thresholds for different classes. (b): During self-training, to increase the proportion of hard classes, the target domain image and its corresponding pseudo-label are first processed by HPLA. Then, the target image is augmented with strong and weak augmentation; following Mean Teacher (Tarvainen and Valpola, 2017), the strongly perturbed version is fed into $\mathbf{M}$ to compute the segmentation loss, and the weakly perturbed version is fed into $\overline{\mathbf{M}}$ for consistency training. Furthermore, regularization is imposed to keep the model from becoming overconfident in the pseudo-labels and to sharpen the predictions on ignored regions.
Figure 4: Illustration of three different threshold methods. $\mathbf{x}_{t-1}$ and $\mathbf{x}_{t}$ represent two consecutive instances, and the bars approximately represent the probabilities of each class. (a): A constant threshold is used for all instances. (b): Class-balanced thresholds are used for all instances. (c): Our method adaptively adjusts the threshold of each class based on the instance.
Figure 5: Proposed hard-aware pseudo-label augmentation. (a): The randomly selected target domain images from which pixels of hard classes are copied. (b): The original target image in the training batch. (c): The result of HPLA, which has copied the data of traffic light, rider, and motorcycle from (a), further enriching the diversity of hard classes in the pseudo-labels.
Figure 6: The voting process. For a given target image $I_k$, the models $[M_{1}, ..., M_{n}]$ trained with different parameter values generate the pseudo-labels $[\hat{y}_k^{1},...,\hat{y}_k^{n}]$ for $I_k$. Then, at each pixel position, majority voting is used to obtain the corresponding class, yielding the fused pseudo-label $\hat{y}_k^v$. For convenience, the red numbers mark the values that differ from $\hat{y}_k^v$.
...and 3 more figures
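
The per-pixel majority voting described in the Figure 6 caption can be sketched as follows. This is a minimal illustration, assuming the pseudo-labels are integer class maps of equal size, that unlabeled pixels carry a hypothetical `ignore_index` of 255, and that ties go to the lowest class id; none of these details are stated in the caption.

```python
import numpy as np

def majority_vote(pseudo_labels, num_classes, ignore_index=255):
    """Fuse n pseudo-label maps of shape (H, W) into one (H, W) map."""
    stacked = np.stack(pseudo_labels, axis=0)  # (n, H, W)
    _, h, w = stacked.shape
    # votes[c, i, j] counts how many models predict class c at pixel (i, j).
    votes = np.zeros((num_classes, h, w), dtype=np.int64)
    for c in range(num_classes):
        votes[c] = (stacked == c).sum(axis=0)
    # argmax returns the first maximum, so ties go to the lowest class id.
    fused = votes.argmax(axis=0)
    # Pixels that no model labeled with a valid class remain ignored.
    fused[votes.sum(axis=0) == 0] = ignore_index
    return fused
```

With three models predicting classes 0, 0, and 1 at a pixel, the fused label at that pixel is 0; a pixel ignored by every model stays ignored.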
