TMT: Cross-domain Semantic Segmentation with Region-adaptive Transferability Estimation
Enming Zhang, Zhengyu Li, Yanru Wu, Jingge Wang, Yang Tan, Guan Wang, Yang Li, Xiaoping Zhang
TL;DR
This paper tackles cross-domain semantic segmentation with Vision Transformers by addressing region-wise transferability. It introduces ACTE, an Adaptive Cluster-based Transferability Estimator, which segments images into coherent regions and estimates the transferability of each region, and Transferable Masked Attention (TMA), which gates the self-attention mechanism with these region-transferability cues. Together, the proposed ACTE and TMA yield consistent improvements across 20 source-target pairs on five benchmarks, demonstrating robust handling of domain shifts and better region-level segmentation boundaries. The results suggest that region-adaptive transferability guidance in Transformer-based segmentation offers substantial practical benefits for real-world cross-domain applications.
Abstract
Recent advances in Vision Transformers (ViTs) have significantly improved semantic segmentation performance. However, adapting ViTs to new target domains remains challenging because distribution shifts often disrupt their global attention mechanisms. While existing global and patch-level adaptation methods offer some improvements, they overlook the spatially varying transferability inherent in different image regions. To address this, we propose the Transferable Mask Transformer (TMT), a region-adaptive framework designed to enhance cross-domain representation learning through transferability guidance. First, we dynamically partition the image into coherent regions, grouped by structural and semantic similarity, and estimate their domain transferability at a localized level. Then, we incorporate region-level transferability maps directly into the self-attention mechanism of ViTs, allowing the model to adaptively focus attention on areas with lower transferability and higher semantic uncertainty. Extensive experiments across 20 diverse cross-domain settings demonstrate that TMT not only mitigates the performance degradation typically associated with domain shift but also consistently outperforms existing approaches.
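To make the second step concrete, the following is a minimal sketch of one way a region-level transferability map could bias self-attention toward low-transferability regions. It is an illustration under stated assumptions, not the paper's formulation of ACTE or TMA, which this section does not specify: the function name `transferability_biased_attention`, the additive bias `alpha * (1 - score)`, the identity query/key/value projections, and the assumption that scores lie in [0, 1] are all hypothetical.

```python
import numpy as np


def transferability_biased_attention(x, region_ids, region_scores, alpha=1.0):
    """Single-head self-attention with an additive per-key bias.

    x             : (n, d) token features
    region_ids    : (n,) integer region index of each token (hypothetical input)
    region_scores : (r,) transferability score per region, assumed in [0, 1]
    alpha         : strength of the bias (hypothetical hyperparameter)
    """
    n, d = x.shape
    # Assumption: identity projections keep the sketch short; a real model
    # would use learned query, key, and value projections.
    q, k, v = x, x, x

    # Standard scaled dot-product attention logits, shape (n, n).
    logits = q @ k.T / np.sqrt(d)

    # Look up each token's region score, then raise the logits of keys that
    # lie in low-transferability regions (assumed form of the guidance).
    token_scores = region_scores[region_ids]
    logits = logits + alpha * (1.0 - token_scores)[None, :]

    # Numerically stable row-wise softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


# Toy usage: four tokens, two regions; region 1 is assumed to transfer poorly.
tokens = np.zeros((4, 8))
ids = np.array([0, 0, 1, 1])
scores = np.array([0.9, 0.2])
output, attn = transferability_biased_attention(tokens, ids, scores)
```

In this toy setup the content-based logits are all equal, so the attention weights are driven only by the bias and each query places more weight on the two tokens in the low-scoring region. An additive bias before the softmax is only one possible design; a multiplicative gate or a hard mask would be other ways to inject the same region-level signal.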
