DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation

Han Sun; Rui Gong; Ismail Nejjar; Olga Fink

TL;DR

DynAlign tackles cross-domain semantic segmentation when the source and target taxonomies differ by combining domain-specific unsupervised domain adaptation (UDA) with foundation-model priors. Its pipeline has three stages: domain-specific knowledge from UDA bridges the image-level gap, semantic taxonomy mapping with language models bridges the label-level gap, and visual priors (SAM and CLIP), combined through a fusion mechanism, reassign labels and generate pseudo-labels. The approach achieves state-of-the-art results on GTA→Mapillary and GTA→IDD, including improved handling of unseen classes, and adapts to evolving taxonomies without supervision. The work illustrates the practical potential of combining domain knowledge with open-world priors for robust, annotation-free cross-domain semantic segmentation.

Abstract

Current unsupervised domain adaptation (UDA) methods for semantic segmentation typically assume identical class labels between the source and target domains. This assumption ignores the label-level domain gap, which is common in real-world scenarios, and limits the ability of these methods to identify finer-grained or novel categories without extensive manual annotation. A promising direction for addressing this limitation lies in recent advances in foundation models, which generalize well thanks to their rich prior knowledge. However, these models often struggle with domain-specific nuances and underrepresented fine-grained categories. To address these challenges, we introduce DynAlign, a framework that integrates UDA with foundation models to bridge both the image-level and label-level domain gaps. Our approach leverages prior semantic knowledge to align source categories with target categories that can be novel, more fine-grained, or named differently (e.g., vehicle to {car, truck, bus}). Foundation models are then employed for precise segmentation and category reassignment. To further enhance accuracy, we propose a knowledge fusion approach that dynamically adapts to varying scene contexts. DynAlign generates accurate predictions in a new target label space without requiring any manual annotations, allowing seamless adaptation to new taxonomies through either model retraining or direct inference. Experiments on the street scene semantic segmentation benchmarks GTA to Mapillary Vistas and GTA to IDD validate the effectiveness of our approach, which achieves a significant improvement over existing methods. Our code will be publicly available.
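To make the taxonomy alignment idea concrete, the following is a minimal sketch of mapping one source category to several candidate target categories and then reassigning a region to the best-scoring candidate. It is an illustration under stated assumptions, not the authors' implementation: the names `TAXONOMY_MAP` and `reassign_label` are hypothetical, the mapping is written by hand rather than produced by a language model, and the similarity scores are plain numbers standing in for whatever a vision-language model would return.

```python
# Hypothetical hand-written mapping from source classes to candidate
# target classes. In the paper this alignment comes from prior semantic
# knowledge; here it is fixed for illustration only.
TAXONOMY_MAP = {
    "vehicle": ["car", "truck", "bus"],
    "road": ["road"],
}


def reassign_label(source_label, candidate_scores, taxonomy_map=TAXONOMY_MAP):
    """Return the best-scoring target class mapped from source_label.

    candidate_scores maps target class names to similarity scores
    (assumed to come from a vision-language model; plain floats here).
    A source label without a mapping is returned unchanged.
    """
    candidates = taxonomy_map.get(source_label)
    if not candidates:
        return source_label
    # A candidate with no score is treated as -inf, so it loses to any
    # scored candidate. Classes outside the mapping are never considered.
    return max(candidates, key=lambda c: candidate_scores.get(c, float("-inf")))


if __name__ == "__main__":
    scores = {"car": 0.2, "truck": 0.7, "bus": 0.1}
    print(reassign_label("vehicle", scores))
```

Restricting the choice to the candidates mapped from the source prediction is the design point of this sketch: the source-domain prediction narrows the search, and the score only decides among the finer-grained options.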
