Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning

Woo-Jin Ahn, Geun-Yeong Yang, Hyun-Duck Choi, Myo-Taeg Lim

TL;DR

BlindNet alleviates the effect of style in the encoder while facilitating robust segmentation in the decoder, yielding robust semantic segmentation that outperforms existing DGSS methods on unseen target domains.

Abstract

Deep learning models for semantic segmentation often suffer performance degradation when deployed to target domains that were not seen during training. This is mainly due to variations in image texture (i.e., style) across data sources. To tackle this challenge, existing domain generalized semantic segmentation (DGSS) methods attempt to remove style variations from the features. However, these approaches struggle with the entanglement of style and content, which may lead to the unintentional removal of crucial content information and, in turn, degrade performance. This study addresses this limitation by proposing BlindNet, a novel DGSS approach that blinds the style without external modules or datasets. The main idea behind our approach is to alleviate the effect of style in the encoder whilst facilitating robust segmentation in the decoder. To achieve this, BlindNet comprises two key components: covariance alignment and semantic consistency contrastive learning. Specifically, covariance alignment trains the encoder to recognize various styles uniformly and to preserve the content information of the features, rather than removing style-sensitive factors. Meanwhile, semantic consistency contrastive learning enables the decoder to construct a discriminative class embedding space and to disentangle features that are vulnerable to misclassification. Extensive experiments show that our approach outperforms existing DGSS methods, exhibiting robustness and superior performance for semantic segmentation on unseen target domains.
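
To make the idea of covariance alignment more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that alignment means penalizing the difference between the channel covariance matrices of the encoder features for an image and for its style-augmented counterpart. The function names, the mean-absolute-difference form of the loss, and the per-channel scale and shift used as a stand-in for a style change are all assumptions made for illustration; the abstract does not specify the exact loss.

```python
import numpy as np


def channel_covariance(feat):
    """Covariance between the channels of a (C, H, W) feature map."""
    c = feat.shape[0]
    flat = feat.reshape(c, -1)                          # (C, H*W)
    centered = flat - flat.mean(axis=1, keepdims=True)  # remove per-channel mean
    return centered @ centered.T / flat.shape[1]        # (C, C)


def covariance_alignment_loss(feat, feat_aug):
    """Assumed loss form: mean absolute difference of the two covariances."""
    diff = channel_covariance(feat) - channel_covariance(feat_aug)
    return float(np.abs(diff).mean())


rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 16, 16))          # stand-in for encoder features of x

# Hypothetical style change: per-channel rescaling plus a constant shift.
scale = rng.uniform(0.5, 1.5, size=(8, 1, 1))
feat_aug = feat * scale + 0.3                # stand-in for encoder features of x_a

loss_same = covariance_alignment_loss(feat, feat)      # identical features
loss_aug = covariance_alignment_loss(feat, feat_aug)   # style-shifted features
print(loss_same, loss_aug)
```

In this sketch the loss is zero when both feature maps share the same channel statistics and grows as the augmented features drift away, so minimizing it would push an encoder toward producing similar second-order statistics for both styles without explicitly removing any channels.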

Paper Structure (18 sections, 9 equations, 11 figures, 7 tables)

Figures (11)

  • Figure 1: Comparison of semantic segmentation results between the baseline (DeepLabV3+ with ResNet50 backbone) and our BlindNet. Both models are trained on the source domain (GTAV [richter2016playing]) and tested on the target domain (Cityscapes [cordts2016cityscapes]).
  • Figure 2: Overview of the proposed BlindNet. The network processes a pair of images: the original image $x$ and its augmented counterpart $x_a$. It applies covariance alignment to the encoder features and semantic consistency contrastive learning to the decoder features.
  • Figure 3: Illustration of semantic consistency contrastive learning. The mask $M$ is the error mask derived from the augmented segmentation map. CWCL conducts contrastive learning by sampling per segmentation class, and SDCL conducts contrastive learning based on $M$. Both methods share a projection head $\pi$ for the semantic representation.
  • Figure 4: t-SNE [van2008visualizing] visualization comparing scenarios with and without $\mathcal{L}_{SCCL}$. In (b), the application of SCCL results in a clear separation between the sidewalk (pink), the road (purple), and the building (gray).
  • Figure 5: Qualitative comparison between DGSS methods trained on GTAV (G) and tested on the unseen target domain Cityscapes (C), using DeepLabV3+ with ResNet50 backbone.
  • ...and 6 more figures
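
The per-class sampling that the Figure 3 caption attributes to CWCL can be illustrated with a generic supervised contrastive (InfoNCE-style) loss. The sketch below is an assumption-laden stand-in rather than the paper's formulation: the function name, the temperature value, and the toy two-class embeddings are hypothetical, the shared projection head $\pi$ is omitted, and the error mask $M$ used by SDCL is not modeled.

```python
import numpy as np


def classwise_contrastive_loss(emb, labels, temperature=0.1):
    """Generic supervised contrastive loss.

    emb:    (N, D) embeddings, e.g. pixel features sampled per class.
    labels: (N,) integer class labels.
    """
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize
    sim = emb @ emb.T / temperature                         # (N, N) similarities
    n = emb.shape[0]
    not_self = ~np.eye(n, dtype=bool)                       # exclude the anchor itself
    pos = (labels[:, None] == labels[None, :]) & not_self   # same-class pairs

    sim = sim - sim.max(axis=1, keepdims=True)              # numerical stability
    exp = np.exp(sim) * not_self
    log_prob = sim - np.log(exp.sum(axis=1, keepdims=True))

    pos_count = pos.sum(axis=1)
    valid = pos_count > 0                                   # anchors with a positive
    per_anchor = -(log_prob * pos).sum(axis=1)[valid] / pos_count[valid]
    return float(per_anchor.mean())


rng = np.random.default_rng(0)
# Toy embeddings: three samples near [1, 0] and three near [0, 1].
base = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
emb = base + 0.01 * rng.normal(size=base.shape)

labels_separated = np.array([0, 0, 0, 1, 1, 1])  # labels match the clusters
labels_mixed = np.array([0, 1, 0, 1, 0, 1])      # labels cut across the clusters

loss_separated = classwise_contrastive_loss(emb, labels_separated)
loss_mixed = classwise_contrastive_loss(emb, labels_mixed)
print(loss_separated, loss_mixed)
```

The loss is lower when same-class embeddings are clustered together and different classes are apart, which is the kind of discriminative class embedding space the abstract describes for the decoder.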