Table of Contents
Fetching ...

Online Continual Domain Adaptation for Semantic Image Segmentation Using Internal Representations

Serban Stan, Mohammad Rostami

TL;DR

This work tackles long-term domain shifts in semantic segmentation under a source-free constraint by learning a shared embedding via an internal surrogate distribution. After source-domain pretraining, it builds a $K$-class, $T$-component Gaussian Mixture Model to approximate the source latent distribution and uses Sliced Wasserstein Distance to align target embeddings to this surrogate, while fine-tuning the classifier on GMM samples. Theoretical bounds extend joint-UDAs guarantees to include the intermediate distribution, and experiments on SYNTHIA/$Cityscapes$ and GTA5/$Cityscapes$ show MAS$^3$ achieves competitive performance against state-of-the-art methods, particularly when source data access is restricted. The approach offers a practical, privacy-conscious pathway for continual domain adaptation in real-world segmentation systems, with robust sensitivity properties to key hyperparameters and clear avenues for future work on partial-domain settings.

Abstract

Semantic segmentation models trained on annotated data fail to generalize well when the input data distribution changes over extended time period, leading to requiring re-training to maintain performance. Classic Unsupervised domain adaptation (UDA) attempts to address a similar problem when there is target domain with no annotated data points through transferring knowledge from a source domain with annotated data. We develop an online UDA algorithm for semantic segmentation of images that improves model generalization on unannotated domains in scenarios where source data access is restricted during adaptation. We perform model adaptation is by minimizing the distributional distance between the source latent features and the target features in a shared embedding space. Our solution promotes a shared domain-agnostic latent feature space between the two domains, which allows for classifier generalization on the target dataset. To alleviate the need of access to source samples during adaptation, we approximate the source latent feature distribution via an appropriate surrogate distribution, in this case a Gassian mixture model (GMM). We evaluate our approach on well established semantic segmentation datasets and demonstrate it compares favorably against state-of-the-art (SOTA) UDA semantic segmentation methods.

Online Continual Domain Adaptation for Semantic Image Segmentation Using Internal Representations

TL;DR

This work tackles long-term domain shifts in semantic segmentation under a source-free constraint by learning a shared embedding via an internal surrogate distribution. After source-domain pretraining, it builds a -class, -component Gaussian Mixture Model to approximate the source latent distribution and uses Sliced Wasserstein Distance to align target embeddings to this surrogate, while fine-tuning the classifier on GMM samples. Theoretical bounds extend joint-UDAs guarantees to include the intermediate distribution, and experiments on SYNTHIA/ and GTA5/ show MAS achieves competitive performance against state-of-the-art methods, particularly when source data access is restricted. The approach offers a practical, privacy-conscious pathway for continual domain adaptation in real-world segmentation systems, with robust sensitivity properties to key hyperparameters and clear avenues for future work on partial-domain settings.

Abstract

Semantic segmentation models trained on annotated data fail to generalize well when the input data distribution changes over extended time period, leading to requiring re-training to maintain performance. Classic Unsupervised domain adaptation (UDA) attempts to address a similar problem when there is target domain with no annotated data points through transferring knowledge from a source domain with annotated data. We develop an online UDA algorithm for semantic segmentation of images that improves model generalization on unannotated domains in scenarios where source data access is restricted during adaptation. We perform model adaptation is by minimizing the distributional distance between the source latent features and the target features in a shared embedding space. Our solution promotes a shared domain-agnostic latent feature space between the two domains, which allows for classifier generalization on the target dataset. To alleviate the need of access to source samples during adaptation, we approximate the source latent feature distribution via an appropriate surrogate distribution, in this case a Gassian mixture model (GMM). We evaluate our approach on well established semantic segmentation datasets and demonstrate it compares favorably against state-of-the-art (SOTA) UDA semantic segmentation methods.
Paper Structure (20 sections, 2 theorems, 9 equations, 6 figures, 2 tables, 1 algorithm)

This paper contains 20 sections, 2 theorems, 9 equations, 6 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

(redko2017theoretical) For the variables defined under Theorem theorem1, the following distribution alignment inequality loss holds:

Figures (6)

  • Figure 1: Diagram of the proposed model adaptation approach (best seen in color): (a) initial model training using the source domain labeled data, (b) estimating the internal distribution as a GMM distribution in the embedding space, (c) domain alignment is enforced by minimizing the distance between the internal distribution samples and the target unlabeled samples, (d) domain adaptation is enforced for the classifier module to fit correspondingly to the GMM distribution.
  • Figure 2: Qualitative performance: examples of the segmented frames for SYNTHIA$\rightarrow$Cityscapes using the MAS$^3$ method. Left to right: real images, manually annotated images, source-trained model predictions, predictions based on our method.
  • Figure 3: Qualitative performance: examples of the segmented frames for SYNTHIA$\rightarrow$Cityscapes and GTA5$\rightarrow$Cityscapes using the MAS$^3$ method. From left to right column: real images, manually annotated images, source-trained model predictions, predictions based on our method.
  • Figure 4: Indirect distribution matching in the embedding space: (a) drawn samples from the GMM trained on the SYNTHIA distribution, (b) representations of the Cityscapes validation samples prior to model adaptation (c) representation of the Cityscapes validation samples after domain alignment.
  • Figure 5: Ablation experiment to study effect of $\tau$ on the GMM learnt in the embedding space: (a) all samples are used; adaptation mIoU=41.8, (b) a portion of samples is used; adaptation mIoU=42.7, (c) samples with high model-confidence are used; adaptation mIoU=44.7
  • ...and 1 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2