Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs
Jeongkee Lim, Yusung Kim
TL;DR
This work tackles unsupervised domain adaptation for semantic segmentation when the source and target taxonomies are inconsistent, covering both open and coarse-to-fine scenarios. It introduces CSI, a framework that uses Vision Language Models (OWL-ViT and CLIP) for zero-shot relabeling of target-domain classes absent from the source: it builds a From-To map, extracts and filters patches, and pastes the relabeled regions into pseudo labels. The approach integrates with existing UDA methods (e.g., MIC, DAFormer) and substantially improves mIoU on benchmarks such as Synthia→Cityscapes, including on target-only and newly split classes. CSI is compatible with different domain configurations and shows the practical value of combining the segmentation reasoning of UDA methods with open-vocabulary semantic knowledge for cross-domain adaptation. The authors provide code and discuss limitations and future directions in the supplemental material.
Abstract
The challenge of semantic segmentation in Unsupervised Domain Adaptation (UDA) arises not only from domain shifts between source and target images but also from discrepancies in class taxonomies across domains. Traditional UDA research assumes a consistent taxonomy between the source and target domains, which limits the ability of such methods to recognize and adapt to the taxonomy of the target domain. This paper introduces a novel approach, Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using Vision Language Models (CSI), which effectively performs domain-adaptive semantic segmentation even when source and target classes do not match. CSI leverages the semantic generalization potential of Vision Language Models (VLMs) to create synergy with previous UDA methods. It combines the segment reasoning obtained through traditional UDA methods with the rich semantic knowledge embedded in VLMs to relabel new classes in the target domain. This approach allows for effective adaptation to extended taxonomies without requiring any ground-truth labels for the target domain. Our method has been shown to be effective across various benchmarks under inconsistent taxonomy settings (coarse-to-fine taxonomy and open taxonomy) and demonstrates consistent synergy when integrated with previous state-of-the-art UDA methods. The implementation is available at http://github.com/jkee58/CSI.
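The relabeling idea summarized above (a From-To map, patch extraction and filtering, and pasting relabeled regions into pseudo labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the class ids, the `FROM_TO` dictionary, the `relabel` and `bounding_box` helpers, the `min_conf` threshold, and the `score_fn` callback (a stand-in for a VLM such as CLIP) are all hypothetical names chosen for illustration, and region masks are assumed to be given rather than produced by a detector.

```python
import numpy as np

# Hypothetical From-To map: source-taxonomy class id -> candidate target class ids.
# Ids are illustrative only, e.g. coarse "vehicle" (1) -> "car" (10) or "bus" (11).
FROM_TO = {1: [10, 11]}


def bounding_box(mask):
    """Return (top, bottom, left, right) enclosing the True pixels of a boolean mask."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def relabel(image, pseudo_label, region_masks, score_fn, from_to, min_conf=0.5):
    """Relabel regions with a zero-shot classifier and paste them into a pseudo label.

    image:        (H, W, 3) array.
    pseudo_label: (H, W) non-negative int array predicted in the source taxonomy.
    region_masks: list of (H, W) boolean masks, one per candidate region.
    score_fn:     callable(patch, class_ids) -> probabilities, one per class id.
                  It stands in for a VLM and is an assumption of this sketch.
    """
    out = pseudo_label.copy()
    for mask in region_masks:
        if not mask.any():
            continue
        # The majority source label inside the region selects the From-To entry.
        source_id = int(np.bincount(pseudo_label[mask]).argmax())
        candidates = from_to.get(source_id)
        if not candidates:
            continue
        # Extract the patch that the classifier will see.
        top, bottom, left, right = bounding_box(mask)
        patch = image[top:bottom, left:right]
        probs = np.asarray(score_fn(patch, candidates), dtype=float)
        best = int(probs.argmax())
        # Filter: keep the original label when the classifier is not confident.
        if probs[best] < min_conf:
            continue
        # Paste the new label into the pseudo label, restricted to the region.
        out[mask] = candidates[best]
    return out
```

In this sketch, low-confidence regions keep their source-taxonomy label, so a weak classifier degrades gracefully to the original pseudo label. A real pipeline would replace `score_fn` with an actual VLM call and obtain `region_masks` from a detector or from the segmentation output itself.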
