Rethinking Unsupervised Domain Adaptation for Semantic Segmentation

Zhijie Wang, Masanori Suganuma, Takayuki Okatani

TL;DR

The paper reconsiders unsupervised domain adaptation (UDA) for semantic segmentation. It argues that labeled target-domain data are needed in practice for validation and hyper-parameter tuning, which challenges the standard UDA assumption that no target labels are available. The paper takes a data-centric approach. Given a small labeled target set, it either uses that set to tune the hyper-parameters of existing UDA methods or uses it to finetune the model directly, and it compares the two strategies on GTA5 → Cityscapes and SYNTHIA → Cityscapes. The experiments show that UDA methods differ markedly in hyper-parameter sensitivity: some need only a few labeled images for tuning, while others need many more. They also show that simple finetuning outperforms many UDA methods once a modest amount of labeled target data is available, which makes it a strong baseline under an equal labeling budget. In practice, this means future UDA research should benchmark against finetuning with an equivalent labeled-data budget and should use validation splits that reflect realistic deployment settings.

Abstract

Unsupervised domain adaptation (UDA) adapts a model trained on one domain (called source) to a novel domain (called target) using only unlabeled data. Because annotation for semantic segmentation is costly, researchers have developed many UDA methods for this task, and these methods assume that no labeled sample is available in the target domain. We question the practicality of this assumption for two reasons. First, after training a model with a UDA method, we must somehow verify the model before deployment. Second, UDA methods have at least a few hyper-parameters that need to be determined. The surest solution to both problems is to evaluate the model on validation data, i.e., a certain amount of labeled target-domain samples. Questioning this basic assumption leads us to rethink UDA from a data-centric point of view. Specifically, we assume access to a minimum amount of labeled data. We then ask how much of it is necessary to find good hyper-parameters for existing UDA methods. We also ask what happens if the same data are instead used for supervised training of the same model, e.g., finetuning. We conducted experiments to answer these questions on two popular scenarios, GTA5 → Cityscapes and SYNTHIA → Cityscapes. We found that i) choosing good hyper-parameters needs only a few labeled images for some UDA methods but many more for others; and ii) simple finetuning works surprisingly well, outperforming many UDA methods with as few as several dozen labeled images.
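The protocol described above spends a small labeled target set (S_val) on choosing hyper-parameters. It can be sketched as follows, assuming mean IoU is the selection criterion. This is a minimal illustration, not the authors' code. `predict` is a hypothetical callable that stands in for "adapt with this configuration, then run inference on S_val". `loss_weight` is a hypothetical hyper-parameter name. Pixels labeled 255 are treated as ignored, following the Cityscapes train-ID convention. The sketch accumulates IoU over the whole validation set before averaging over classes.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

import numpy as np

IGNORE_INDEX = 255  # assumption: Cityscapes-style "ignore" label in train-ID maps


def confusion_matrix(pred: np.ndarray, label: np.ndarray, num_classes: int) -> np.ndarray:
    """Build a num_classes x num_classes confusion matrix (rows = ground truth)."""
    valid = label != IGNORE_INDEX  # drop ignored pixels from both maps
    idx = num_classes * label[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(conf: np.ndarray) -> float:
    """Average IoU over classes present in prediction or ground truth (nan if none)."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0
    return float(np.mean(tp[present] / union[present]))


def select_hyperparameters(
    candidates: Iterable[Dict[str, float]],
    predict: Callable[[Dict[str, float]], Iterable[Tuple[np.ndarray, np.ndarray]]],
    num_classes: int,
) -> Tuple[Optional[Dict[str, float]], float]:
    """Return the candidate configuration with the highest mIoU on the labeled S_val.

    `predict(cfg)` (hypothetical) yields one (prediction, label) pair per S_val image.
    """
    best_cfg, best_score = None, -1.0
    for cfg in candidates:
        conf = np.zeros((num_classes, num_classes), dtype=np.int64)
        for pred, label in predict(cfg):
            conf += confusion_matrix(pred, label, num_classes)  # dataset-level counts
        score = mean_iou(conf)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


# Toy usage: one 2x2 "image" with 3 classes and one ignored pixel.
toy_label = np.array([[0, 1], [2, 255]])


def toy_predict(cfg: Dict[str, float]):
    if cfg["loss_weight"] == 0.001:
        yield toy_label.copy(), toy_label  # perfect prediction on the valid pixels
    else:
        yield np.zeros_like(toy_label), toy_label  # predicts class 0 everywhere


best, score = select_hyperparameters(
    [{"loss_weight": 0.01}, {"loss_weight": 0.001}], toy_predict, num_classes=3
)
```

The per-configuration cost here is one full adaptation run. The abstract's question of how many labeled images are "enough" therefore comes down to how stable this argmax is as S_val shrinks.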

Paper Structure (21 sections, 1 equation, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Performance (mean IoU) of UDA methods and finetuning vs. the number of labeled images (i.e., $|\mathcal{S}_\mathrm{val}|$). Upper: GTA5 $\rightarrow$ Cityscapes. Lower: SYNTHIA $\rightarrow$ Cityscapes.
  • Figure 2: Hyper-parameter sensitivity of IntraDA (upper) and FADA (lower).
  • Figure 3: Hyper-parameter sensitivity of AdaptSegNet (upper) and IAST (lower).
  • Figure 4: Hyper-parameter sensitivity of AdvEnt (upper) and CBST (lower).
  • Figure 5: Performance (mean IoU) of UDA methods and finetuning vs. the number of labeled images, i.e., the pair ($|\mathcal{S}^\mathrm{FT}_\mathrm{train}|$, $|\mathcal{S}^\mathrm{FT}_\mathrm{val}|$), on GTA5 $\rightarrow$ Cityscapes.
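Figure 5 indexes each finetuning result by a pair of set sizes. These are presumably the labeled images used to finetune the model and those held out to validate it, both drawn from the same labeling budget. The sketch below assumes the two subsets are sampled at random and kept disjoint; this summary does not describe the actual sampling procedure. `split_finetune_budget` and the `frame_*` IDs are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_finetune_budget(
    labeled_ids: Sequence[str], n_train: int, n_val: int, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split a labeled target-domain budget into disjoint S_train^FT and S_val^FT subsets."""
    if n_train < 0 or n_val < 0:
        raise ValueError("subset sizes must be non-negative")
    if n_train + n_val > len(labeled_ids):
        raise ValueError("requested subsets exceed the labeled pool")
    rng = random.Random(seed)  # fixed seed so the same split can be reproduced
    chosen = rng.sample(list(labeled_ids), n_train + n_val)  # sampling without replacement
    return chosen[:n_train], chosen[n_train:]


# Toy usage: a pool of 100 hypothetical labeled frames, budget of 40 + 10 images.
pool = [f"frame_{i:03d}" for i in range(100)]
s_train, s_val = split_finetune_budget(pool, n_train=40, n_val=10, seed=0)
```

Sampling without replacement keeps the two subsets disjoint. With this split, the finetuning baseline consumes exactly n_train + n_val labeled images. The same total could instead be handed to a UDA method as its hyper-parameter validation set, which is the equal-budget comparison the paper argues for.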