S2C: Learning Noise-Resistant Differences for Unsupervised Change Detection in Multimodal Remote Sensing Images
Lei Ding, Xibing Zuo, Danfeng Hong, Haitao Guo, Jun Lu, Zhihui Gong, Lorenzo Bruzzone
TL;DR
The paper introduces S2C, a noise-resistant framework for unsupervised change detection (UCD) in multimodal remote sensing images that combines Visual Foundation Models (VFMs) with contrastive learning (CL). It proposes two CL paradigms, Consistency-regularized Temporal Contrast (CTC) and Consistency-regularized Spatial Contrast (CSC), and augments them with a grid sparsity loss and an IoU-based refinement step to map semantic changes robustly across temporal and modality gaps. The key contributions are a triplet-based model of temporal differences for UCD and a grid-level sparsity regularizer that promotes compact change maps. The framework extends to unsupervised Multimodal Change Detection (MMCD) and reports substantial improvements over state-of-the-art methods on four benchmark datasets, along with sample efficiency and cross-modality applicability.
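To make the idea of a grid-level sparsity regularizer concrete, here is a minimal sketch. It is not the paper's formulation, which this summary does not give. It assumes the change map is a 2D list of probabilities in [0, 1], splits it into square cells, and averages the per-cell maximum. The function name `grid_sparsity_penalty` and the parameter `cell` are hypothetical.

```python
def grid_sparsity_penalty(change_map, cell=2):
    """Illustrative penalty: mean over grid cells of the per-cell maximum.

    Assumption: this is one plausible way to favor compact change maps,
    not the loss defined in the S2C paper.
    """
    rows, cols = len(change_map), len(change_map[0])
    cell_scores = []
    for r0 in range(0, rows, cell):
        for c0 in range(0, cols, cell):
            # Collect the values in this cell, clipping at the map borders.
            block = [
                change_map[r][c]
                for r in range(r0, min(r0 + cell, rows))
                for c in range(c0, min(c0 + cell, cols))
            ]
            cell_scores.append(max(block))
    return sum(cell_scores) / len(cell_scores)


# Four changed pixels packed into a single cell.
compact = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Four changed pixels spread over all four cells.
scattered = [
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
]
compact_penalty = grid_sparsity_penalty(compact)
scattered_penalty = grid_sparsity_penalty(scattered)
```

Both maps contain the same number of changed pixels, but under this assumed penalty the scattered map activates every cell and scores higher, so minimizing it would push predictions toward fewer, more compact change regions.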
Abstract
Unsupervised Change Detection (UCD) in multimodal Remote Sensing (RS) images remains challenging because of the inherent spatio-temporal complexity of the data and the heterogeneity arising from different imaging sensors. Inspired by recent advances in Visual Foundation Models (VFMs) and Contrastive Learning (CL), this research aims to develop CL methodologies that translate the implicit knowledge in VFMs into change representations, thus eliminating the need for explicit supervision. To this end, we introduce a Semantic-to-Change (S2C) learning framework for UCD in both homogeneous and multimodal RS images. Unlike existing CL methodologies, which typically focus on learning multi-temporal similarities, we introduce a novel triplet learning strategy that explicitly models temporal differences, which are crucial to the CD task. Furthermore, random spatial and spectral perturbations are introduced during training to enhance robustness to temporal noise. In addition, a grid sparsity regularization is defined to suppress insignificant changes, and an IoU-matching algorithm is developed to refine the CD results. Experiments on four benchmark CD datasets demonstrate that the proposed S2C learning framework achieves significant improvements in accuracy, surpassing the current state of the art by over 31%, 9%, 23%, and 15% on the four datasets, respectively. It also demonstrates robustness and sample efficiency, making it suitable for training and adapting various VFMs or backbone neural networks. The relevant code will be available at: github.com/DingLei14/S2C.
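The triplet idea mentioned above can be illustrated with a generic triplet margin loss. This is a sketch under stated assumptions, not the S2C objective: the abstract does not specify how triplets are built, so the choice of cosine distance, the margin value, and the names `cosine_distance` and `triplet_margin_loss` are all hypothetical. The anchor is read here as a feature from one date, the positive as a feature that should match it (for example, an unchanged location under perturbation), and the negative as a feature that should differ.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Generic triplet margin loss (an assumption, not the paper's exact loss).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Well-separated triplet: positive equals the anchor, negative is orthogonal.
loss_separated = triplet_margin_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
# Violating triplet: roles of positive and negative are swapped.
loss_violating = triplet_margin_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

In this toy setting the well-separated triplet incurs no loss, while the violating one is penalized, which is the mechanism by which a triplet objective pushes changed and unchanged features apart rather than only pulling similar ones together.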
