ProtoGMM: Multi-prototype Gaussian-Mixture-based Domain Adaptation Model for Semantic Segmentation

Nazanin Moradinasab, Laura S. Shankman, Rebecca A. Deaton, Gary K. Owens, Donald E. Brown

TL;DR

This paper addresses domain shift in semantic segmentation by moving beyond discriminative-only self-training to a generative approach that models the source feature distribution with a multi-prototype Gaussian mixture model (GMM). ProtoGMM models the class-conditional source feature distribution $p(f_s|c)$ with a GMM whose components serve as prototypes; these prototypes guide a multi-prototype contrastive loss that improves intra-class similarity and inter-class separation while aligning the source and target domains. The framework integrates a Sinkhorn EM-based GMM branch with a discriminative classifier, updates priors and target prototypes via an exponential moving average (EMA), and computes pseudo-labels and alignments from posterior probabilities and prototype similarities. Empirical results on GTA5→Cityscapes, Synthia→Cityscapes, and a cell-type adaptation dataset show consistent improvements over state-of-the-art methods, supporting the approach’s ability to capture within-class variation and mitigate pseudo-label noise and source bias. Overall, ProtoGMM offers a principled strategy for improving dense semantic predictions under domain shift by fusing generative and discriminative learning.
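As a rough illustration of how a class-conditional GMM can yield pseudo-labels, the sketch below applies Bayes' rule to a toy mixture and updates class priors with an EMA. It is not the authors' implementation: the diagonal covariances, the mixture weights, the momentum value, and all function names (`class_posterior`, `ema_update`, and the helpers) are assumptions made for this example, and the Sinkhorn EM fitting step is omitted entirely.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def class_posterior(x, gmms, priors):
    """Return p(c | x) for every class c via Bayes' rule.

    gmms[c] is a list of (weight, mean, var) components for class c
    (hypothetical layout); priors[c] is the class prior p(c).
    """
    log_joint = []
    for comps, prior in zip(gmms, priors):
        # log p(x | c) = log sum_k w_ck * N(x; mu_ck, var_ck)
        log_lik = logsumexp(
            [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in comps]
        )
        log_joint.append(math.log(prior) + log_lik)
    norm = logsumexp(log_joint)
    return [math.exp(lj - norm) for lj in log_joint]


def ema_update(old, new, momentum=0.99):
    """Exponential moving average: momentum * old + (1 - momentum) * new."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]


# Toy setup (assumed values): two classes, two prototypes each, 2-D features.
gmms = [
    [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [4.0, 0.0], [1.0, 1.0])],
    [(0.5, [0.0, 4.0], [1.0, 1.0]), (0.5, [4.0, 4.0], [1.0, 1.0])],
]
priors = [0.5, 0.5]

post = class_posterior([3.8, 0.3], gmms, priors)
# The pseudo-label is the class with the highest posterior probability.
pseudo_label = max(range(len(post)), key=post.__getitem__)
# Nudge the priors toward a one-hot vote for the predicted class.
vote = [1.0 if c == pseudo_label else 0.0 for c in range(len(priors))]
new_priors = ema_update(priors, vote, momentum=0.9)
```

In this toy setup the feature lies close to the second prototype of class 0, so class 0 receives almost all of the posterior mass even though the feature is far from that class's other prototype.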

Abstract

Domain adaptive semantic segmentation aims to generate accurate and dense predictions for an unlabeled target domain by leveraging a supervised model trained on a labeled source domain. The prevalent self-training approach involves retraining the dense discriminative classifier of $p(\text{class} \mid \text{pixel feature})$ using the pseudo-labels from the target domain. While many methods focus on mitigating the issue of noisy pseudo-labels, they often overlook the underlying data distribution $p(\text{pixel feature} \mid \text{class})$ in both the source and target domains. To address this limitation, we propose the multi-prototype Gaussian-Mixture-based (ProtoGMM) model, which incorporates the GMM into contrastive losses to perform guided contrastive learning. Contrastive losses are commonly executed in the literature using memory banks, which can lead to class biases due to underrepresented classes. Furthermore, memory banks often have fixed capacities, potentially restricting the model's ability to capture diverse representations of the target/source domains. An alternative approach is to use global class prototypes (i.e., averaged features per category). However, the global prototypes are based on the unimodal distribution assumption per class, disregarding within-class variation. To address these challenges, we propose the ProtoGMM model. This novel approach involves estimating the underlying multi-prototype source distribution by utilizing the GMM on the feature space of the source samples. The components of the GMM model act as representative prototypes. To achieve increased intra-class semantic similarity, decreased inter-class similarity, and domain alignment between the source and target domains, we employ multi-prototype contrastive learning between source distribution and target samples. The experiments show the effectiveness of our method on UDA benchmarks.
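To make the idea of contrasting a feature against several prototypes per class more concrete, here is a minimal sketch of an InfoNCE-style loss over per-class prototype sets. The exact loss used in the paper is not given in this summary, so the formulation below (cosine similarity, all same-class prototypes treated as positives, a temperature of 0.1) and the function names `cosine` and `multi_proto_contrastive_loss` are assumptions for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def multi_proto_contrastive_loss(feature, label, prototypes, temperature=0.1):
    """InfoNCE-style loss against per-class prototype sets.

    prototypes[c] is the list of prototype vectors for class c.
    Prototypes of class `label` act as positives; all others as negatives.
    """
    positive_mass = 0.0
    total_mass = 0.0
    for c, class_protos in enumerate(prototypes):
        for proto in class_protos:
            score = math.exp(cosine(feature, proto) / temperature)
            total_mass += score
            if c == label:
                positive_mass += score
    return -math.log(positive_mass / total_mass)


# Toy prototypes (assumed values): each class has two distinct modes.
prototypes = [
    [[1.0, 0.0], [0.0, 1.0]],    # class 0
    [[-1.0, 0.0], [0.0, -1.0]],  # class 1
]
feature = [0.9, 0.1]

loss_correct = multi_proto_contrastive_loss(feature, 0, prototypes)
loss_wrong = multi_proto_contrastive_loss(feature, 1, prototypes)
```

Because the feature only needs to be close to one of its class's prototypes for the positive mass to dominate, a loss of this shape does not penalize a class for having several well-separated modes, which is the limitation of a single averaged prototype noted above.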

Paper Structure

This paper contains 17 sections, 14 equations, 3 figures, 4 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Diagram of Proposed Approach
  • Figure 2: Blue-colored nuclei accompanied by: a) the red Lineage tracing marker, b) the purple LGALS3 marker
  • Figure 3: Qualitative analysis on GTA $\rightarrow$ Cityscapes (first row) and Synthia $\rightarrow$ Cityscapes (second row).
