Rethinking Unsupervised Domain Adaptation for Semantic Segmentation

Zhijie Wang; Masanori Suganuma; Takayuki Okatani

TL;DR

The paper reconsiders unsupervised domain adaptation (UDA) for semantic segmentation, arguing that labeled target-domain data are needed in practice for validation and hyper-parameter tuning, which challenges the standard no-label assumption of UDA. Taking a data-centric view, it assumes a small labeled target set is available and compares two uses of it: tuning the hyper-parameters of existing UDA methods, or finetuning the model directly. The comparison is run on GTA5 → Cityscapes and SYNTHIA → Cityscapes. The key findings are that hyper-parameter sensitivity varies markedly across UDA methods, and that simple finetuning outperforms many UDA methods once several dozen labeled target images are available, making it a strong baseline under an equal labeling budget. The practical implication is that future UDA research should benchmark against finetuning with an equivalent labeled-data budget and should adopt validation procedures and data splits that reflect realistic deployment settings.

Abstract

Unsupervised domain adaptation (UDA) adapts a model trained on one domain (called source) to a novel domain (called target) using only unlabeled data. Because annotating target-domain data is costly, researchers have developed many UDA methods for semantic segmentation, which assume that no labeled samples are available in the target domain. We question the practicality of this assumption for two reasons. First, after training a model with a UDA method, we must somehow verify the model before deployment. Second, UDA methods have at least a few hyper-parameters that need to be determined. The surest solution to both is to evaluate the model using validation data, i.e., a certain amount of labeled target-domain samples. This question about the basic assumption of UDA leads us to rethink UDA from a data-centric point of view. Specifically, we assume we have access to a minimum level of labeled data. Then, we ask how much is necessary to find good hyper-parameters of existing UDA methods. We then consider what happens if we use the same data for supervised training of the same model, e.g., finetuning. We conducted experiments to answer these questions on popular scenarios, {GTA5, SYNTHIA}$\rightarrow$Cityscapes. We found that i) choosing good hyper-parameters requires only a few labeled images for some UDA methods but many more for others; and ii) simple finetuning works surprisingly well; it outperforms many UDA methods if only several dozen labeled images are available.
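To make the validation idea concrete, the following is a minimal sketch, not the authors' code, of how a small labeled target set could be used to pick among hyper-parameter settings by mean IoU. The function names `mean_iou` and `select_best_config`, the setting names such as `lambda=0.001`, the toy label maps, and the ignore label 255 are all illustrative assumptions; in practice the predictions would come from models trained with each candidate setting.

```python
from typing import Dict, List, Sequence, Tuple

Image = Sequence[int]  # a flattened label map, one class id per pixel


def mean_iou(preds: Sequence[Image], labels: Sequence[Image],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over the classes that occur in predictions or labels.

    Assumes every predicted id lies in [0, num_classes); pixels whose
    label equals `ignore_index` (an assumed convention) are skipped.
    """
    inter = [0] * num_classes  # true positives per class
    union = [0] * num_classes  # TP + FP + FN per class
    for pred, label in zip(preds, labels):
        for p, t in zip(pred, label):
            if t == ignore_index:
                continue
            if p == t:
                inter[t] += 1
                union[t] += 1
            else:
                union[p] += 1  # false positive for the predicted class
                union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


def select_best_config(candidates: Dict[str, List[Image]],
                       labels: Sequence[Image],
                       num_classes: int) -> Tuple[str, Dict[str, float]]:
    """Score each candidate setting on the labeled validation set."""
    scores = {name: mean_iou(preds, labels, num_classes)
              for name, preds in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy validation set: two 4-pixel "images" with 3 classes (hypothetical data).
val_labels = [[0, 0, 1, 1], [1, 2, 2, 255]]
candidates = {
    "lambda=0.001": [[0, 0, 1, 1], [1, 2, 2, 0]],  # hypothetical setting A
    "lambda=0.01": [[0, 1, 1, 1], [1, 1, 2, 2]],   # hypothetical setting B
}
best, scores = select_best_config(candidates, val_labels, num_classes=3)
print(best, scores)
```

The same scoring function can be rerun on validation subsets of different sizes to see how many labeled images are needed before the selected setting stops changing, which is the kind of question the abstract raises.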

Paper Structure (21 sections, 1 equation, 5 figures, 4 tables)

Introduction
Related Work
Unsupervised Domain Adaptation
Selecting Hyper-parameter for UDA Methods
Finetuning for Few-shot Learning
Semi-supervised Domain Adaptation
Rethinking UDA from Data-centric Perspective
Realism of the Assumption of UDA
Choosing Hyper-parameters of UDA Methods
An Alternative Approach: Finetuning
Issues with the Experimental Design of UDA
Experiments
Experimental Settings
Results
Sensitivity of UDA Methods to Hyper-parameters
...and 6 more sections

Figures (5)

Figure 1: Performance (mean IoU) of UDA methods and finetuning vs. the number of labeled images (i.e., $|\mathcal{S}_\mathrm{val}|$). Upper: GTA5 $\rightarrow$ Cityscapes. Lower: SYNTHIA $\rightarrow$ Cityscapes.
Figure 2: Hyper-parameter sensitivity of IntraDA (upper) and FADA (lower).
Figure 3: Hyper-parameter sensitivity of AdaptSegNet (upper) and IAST (lower).
Figure 4: Hyper-parameter sensitivity of AdvEnt (upper) and CBST (lower).
Figure 5: Performance (mean IoU) of UDA methods and finetuning vs. the number of labeled images (i.e., ($|\mathcal{S}^\mathrm{FT}_\mathrm{train}|$, $|\mathcal{S}^\mathrm{FT}_\mathrm{val}|$)) on GTA5 $\rightarrow$ Cityscapes.
