Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic Segmentation

Seogkyu Jeon; Kibeom Hong; Hyeran Byun

TL;DR

Domain generalized semantic segmentation (DGSS) models lose accuracy under domain shift. DPMFormer addresses this with two components built on a Mask2Former/VLM backbone: domain-aware prompt learning, which injects input-domain properties into textual prompts, and domain-robust consistency learning, which enforces stable predictions under texture-based domain perturbations. Key contributions include a domain-aware contrastive loss that aligns textual and visual domain cues, texture perturbations that diversify the observable domains, and multi-layer consistency losses intended to prevent error propagation. The paper reports state-of-the-art performance on synthetic-to-real and real-to-real DGSS benchmarks, with improvements across multiple domains and robust qualitative behavior under diverse styles.

Abstract

Recent domain generalized semantic segmentation (DGSS) studies have achieved notable improvements by distilling semantic knowledge from Vision-Language Models (VLMs). However, they overlook the semantic misalignment between visual and textual contexts, which arises from the rigidity of a fixed context prompt learned on a single source domain. To this end, we present a novel domain generalization framework for semantic segmentation, namely Domain-aware Prompt-driven Masked Transformer (DPMFormer). First, we introduce domain-aware prompt learning to facilitate semantic alignment between visual and textual cues. Second, to capture various domain-specific properties with a single source dataset, we propose domain-aware contrastive learning along with texture perturbation that diversifies the observable domains. Lastly, to establish a framework resilient against diverse environmental changes, we propose domain-robust consistency learning, which guides the model to minimize discrepancies between predictions on the original and augmented images. Through experiments and analyses, we demonstrate the superiority of the proposed framework, which establishes a new state-of-the-art on various DGSS benchmarks. The code is available at https://github.com/jone1222/DPMFormer.
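
To make the consistency idea above concrete, the following is a minimal sketch of one way to penalize disagreement between predictions on an original image and on its texture-perturbed copy. It is not taken from the DPMFormer code: the function names (`softmax`, `kl_divergence`, `consistency_loss`) are hypothetical, and the choice of a symmetric KL divergence averaged over pixels is an assumption made for illustration, since the abstract does not state which discrepancy measure is used.

```python
import math


def softmax(logits):
    """Convert one pixel's class logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(logits_orig, logits_aug):
    """Mean symmetric KL between per-pixel predictions.

    Both arguments are lists of per-pixel class-logit lists, one from the
    original image and one from its augmented copy. The symmetric KL is an
    assumed discrepancy measure, not necessarily the one used in the paper.
    """
    if not logits_orig or len(logits_orig) != len(logits_aug):
        raise ValueError("inputs must be non-empty and have the same number of pixels")
    total = 0.0
    for lo, la in zip(logits_orig, logits_aug):
        p, q = softmax(lo), softmax(la)
        total += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return total / len(logits_orig)


if __name__ == "__main__":
    original = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]   # two pixels, three classes
    augmented = [[1.5, 0.7, -0.8], [0.0, 0.4, 2.5]]  # same pixels after perturbation
    print(consistency_loss(original, augmented))
```

The loss is zero when both views produce identical predictions and grows as they diverge, which is the behavior a consistency objective of this kind needs; applying such a term at several decoder layers would be one reading of the "multi-layer" consistency mentioned in the TL;DR.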
