SiamSeg: Self-Training with Contrastive Learning for Unsupervised Domain Adaptation Semantic Segmentation in Remote Sensing
Bin Wang, Fei Deng, Shuang Wang, Wen Luo, Zhixuan Zhang, Peifan Jiang
TL;DR
SiamSeg addresses cross-domain semantic segmentation in remote sensing (RS) by integrating contrastive learning with self-training, enabling stronger feature learning on unlabeled target-domain data. The method combines a standard segmentation network with an exponential moving average (EMA) teacher for pseudo-labeling and a Siamese contrastive branch that generates two augmented views of each target image and aligns their representations by minimizing a negative cosine similarity objective. Empirical results on Potsdam, Vaihingen, and LoveDA demonstrate state-of-the-art performance across multiple cross-domain tasks, with ablations confirming the value of contrastive supervision and informative augmentations. The approach offers practical benefits for RS applications with limited labeled data, and the code is publicly available for reproducibility; future work aims to reduce reliance on source data via source-free domain adaptation (SFDA) and to enhance generalization across tasks.
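The two mechanisms named above, the negative cosine similarity objective and the EMA teacher, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the flat-list representation of embeddings and weights, and the momentum value of 0.99 are assumptions made for illustration only.

```python
import math


def negative_cosine_similarity(view_a, view_b, eps=1e-8):
    """Return -cos(view_a, view_b) for two embedding vectors.

    The value is -1 when the two views are perfectly aligned, so
    minimizing this loss pulls the two augmented views together.
    """
    dot = sum(x * y for x, y in zip(view_a, view_b))
    norm_a = math.sqrt(sum(x * x for x in view_a))
    norm_b = math.sqrt(sum(y * y for y in view_b))
    return -dot / (norm_a * norm_b + eps)


def ema_update(teacher_weights, student_weights, momentum=0.99):
    """Move each teacher weight a small step toward the student weight.

    momentum=0.99 is an illustrative value, not one taken from the paper.
    """
    return [
        momentum * t + (1.0 - momentum) * s
        for t, s in zip(teacher_weights, student_weights)
    ]


# Two views pointing in the same direction give a loss close to -1.
loss = negative_cosine_similarity([1.0, 2.0], [2.0, 4.0])
# The teacher drifts slowly toward the student.
teacher = ema_update([1.0, 1.0], [0.0, 2.0], momentum=0.9)
```

In a real training loop the embeddings would come from the segmentation backbone and the update would run over all network parameters after each optimizer step; the sketch only shows the arithmetic.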
Abstract
Semantic segmentation of remote sensing (RS) images is a challenging yet essential task with broad applications. While deep learning, particularly supervised learning with large-scale labeled datasets, has significantly advanced this field, the acquisition of high-quality labeled data remains costly and time-intensive. Unsupervised domain adaptation (UDA) provides a promising alternative by enabling models to learn from unlabeled target domain data while leveraging labeled source domain data. Recent self-training (ST) approaches employing pseudo-label generation have shown potential in mitigating domain discrepancies. However, the application of ST to RS image segmentation remains underexplored. Factors such as variations in ground sampling distance, imaging equipment, and geographic diversity exacerbate domain shifts, limiting model performance across domains. As a result, existing ST methods often underperform on cross-domain RS images. To address these challenges, we propose integrating contrastive learning into UDA, enhancing the model's ability to capture semantic information in the target domain by maximizing the similarity between augmented views of the same image. This additional supervision improves the model's representational capacity and segmentation performance in the target domain. Extensive experiments conducted on RS datasets, including Potsdam, Vaihingen, and LoveDA, demonstrate that our method, SiamSeg, outperforms existing approaches, achieving state-of-the-art results. Visualization and quantitative analyses further validate SiamSeg's superior ability to learn from the target domain. The code is publicly available at https://github.com/woldier/SiamSeg.
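As a rough illustration of the pseudo-label generation step that self-training relies on, the sketch below turns per-pixel class probabilities from a teacher model into hard labels. The confidence threshold of 0.9, the ignore index of 255, and the function name are assumptions for illustration; the abstract does not specify how SiamSeg filters or weights its pseudo-labels.

```python
IGNORE_INDEX = 255  # assumed marker for pixels excluded from the loss


def pseudo_labels(pixel_probs, threshold=0.9):
    """Assign each pixel its most likely class, or IGNORE_INDEX if unsure.

    pixel_probs: list of per-pixel class probability lists.
    threshold:   illustrative confidence cutoff, not taken from the paper.
    """
    labels = []
    for probs in pixel_probs:
        # Index of the highest-probability class for this pixel.
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best if probs[best] >= threshold else IGNORE_INDEX)
    return labels


# Three pixels, two classes: confident, uncertain, confident.
teacher_output = [[0.95, 0.05], [0.60, 0.40], [0.02, 0.98]]
labels = pseudo_labels(teacher_output)
```

Under a large domain shift, many target pixels fall below such a cutoff or receive wrong labels, which is one way to read the abstract's motivation for adding a contrastive signal that does not depend on pseudo-label quality.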
