Pulling Target to Source: A New Perspective on Domain Adaptive Semantic Segmentation

Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Liwei Wu, Yuxi Wang, Zhaoxiang Zhang

TL;DR

This work proposes T2S-DA, which encourages the model to learn similar cross-domain features. It also shows that T2S-DA performs well on the domain generalization task, supporting its domain-invariant property.

Abstract

Domain adaptive semantic segmentation aims to transfer knowledge from a labeled source domain to an unlabeled target domain. However, existing methods primarily focus on directly learning qualified target features, which makes it hard to guarantee that those features are discriminative in the absence of target labels. This work provides a new perspective. We observe that the features learned with source data remain categorically discriminative during training, which enables us to implicitly learn adequate target representations by simply pulling target features close to source features for each category. To this end, we propose T2S-DA, which we interpret as a form of pulling Target to Source for Domain Adaptation, encouraging the model to learn similar cross-domain features. In addition, because pixel categories are heavily imbalanced in segmentation datasets, we introduce a dynamic re-weighting strategy that helps the model concentrate on underperforming classes. Extensive experiments confirm that T2S-DA learns a more discriminative and generalizable representation, significantly surpassing the state-of-the-art. We further show that our method performs well on the domain generalization task, verifying its domain-invariant property.

Paper Structure (43 sections, 20 equations, 11 figures, 17 tables)

Figures (11)

  • Figure 1: Category-wise cross-domain feature similarity and evaluation results on the target domain. When a model trained only with source data (i.e., "source only") is tested directly on target data, the categories whose source and target features are largely dissimilar suffer from low IoU.
  • Figure 2: Concept comparison between (a) conventional methods and (b) our T2S-DA. To obtain discriminative features from target images, existing approaches directly supervise the model with target pseudo-labels regardless of the similarity between cross-domain features. In contrast, T2S-DA addresses this issue from a new perspective: we argue that "discriminative source features" plus "making target features close to source features" implicitly yields capable target features.
  • Figure 3: Illustration of contrastive pairs. We regard features from pseudo-targets and source prototypes of the same category as the positive pairs defined in Eq. (\ref{eq:pos}). Negative keys include 1) source features from a different category (as in Eq. (\ref{eq:neg_s})), and 2) unreliable target features from a different category (as in Eq. (\ref{eq:neg_t})). In this way, the model is encouraged to learn similar features between the source and target domains for every category. The improved similarity indeed boosts segmentation results (see Fig. \ref{fig:stats}).
  • Figure 4: The pipeline of FDA \cite{yang2020fda}. Given a source image $\mathbf{x}^s$ and a randomly sampled target image $\mathbf{x}^t$, FDA transfers the source image into the target style, resulting in $\mathcal{T}(\mathbf{x}^s)$, by pasting the low-frequency part of the amplitude from the target sample onto the source image.
  • Figure 5: Illustration of our sampling strategies introduced in Sec. \ref{sec:sample}. (a) Class-balanced query sampling means we first estimate the current label distribution and then sample queries that follow the estimated distribution. (b) Domain-equalized negative key sampling means we sample $m/2$ negative keys for each query from the source domain and $m/2$ from the target domain.
  • ...and 6 more figures
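
The style transfer described in the Figure 4 caption (pasting the low-frequency part of the target amplitude onto the source image) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `fda_source_to_target`, the window-size parameter `beta`, and its default value are hypothetical, and both images are assumed to be float arrays of identical shape (H, W, C).

```python
import numpy as np


def fda_source_to_target(src, tgt, beta=0.01):
    """Return a source image restyled with the target's low-frequency amplitude.

    src, tgt: float arrays of shape (H, W, C) with identical shapes.
    beta: hypothetical knob; the swapped window has half-width
          floor(min(H, W) * beta) around the zero frequency.
    """
    # 2D FFT over the spatial axes, independently for each channel.
    fft_src = np.fft.fft2(src, axes=(0, 1))
    fft_tgt = np.fft.fft2(tgt, axes=(0, 1))

    # Split the source into amplitude and phase; only the amplitude is edited.
    amp_src, pha_src = np.abs(fft_src), np.angle(fft_src)
    amp_tgt = np.abs(fft_tgt)

    # Move the zero frequency to the center so low frequencies form one block.
    amp_src = np.fft.fftshift(amp_src, axes=(0, 1))
    amp_tgt = np.fft.fftshift(amp_tgt, axes=(0, 1))

    h, w = src.shape[:2]
    b = int(np.floor(min(h, w) * beta))
    ch, cw = h // 2, w // 2  # index of the zero frequency after fftshift

    # Paste the target's low-frequency amplitude into the source amplitude.
    amp_src[ch - b:ch + b + 1, cw - b:cw + b + 1] = \
        amp_tgt[ch - b:ch + b + 1, cw - b:cw + b + 1]

    # Undo the shift, recombine with the untouched source phase, and invert.
    amp_src = np.fft.ifftshift(amp_src, axes=(0, 1))
    mixed = amp_src * np.exp(1j * pha_src)
    return np.real(np.fft.ifft2(mixed, axes=(0, 1)))
```

Because the source phase is kept, the spatial layout of the source image (and therefore its labels) is preserved, while global appearance statistics move toward the target. With `beta=0` only the zero-frequency term is swapped, so for non-negative images the output simply takes on the target's per-channel mean intensity.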