SChanger: Change Detection from a Semantic Change and Spatial Consistency Perspective
Ziyu Zhou, Keyan Hu, Yutian Fang, Xiaoping Rui
TL;DR
SChanger tackles data scarcity in remote sensing change detection (RSCD) by pretraining a Semantic Prior Network on single-temporal segmentation and then fine-tuning it as a Semantic Change Network (SCN) within a shared-weight Siamese dual-temporal framework. Its core components are the Spatial Consistency Attention Module, the Temporal Fusion Module, Lightweight Feature Enhancement, and the Multi-Scale Fusion Segmentation Head, all guided by SCN’s SAF and SFA mechanisms. The approach achieves state-of-the-art F1 scores on six benchmarks with substantially fewer parameters and FLOPs than prior methods, and it transfers well in few-shot and cross-domain settings. These results suggest that combining single-temporal priors with semantically aligned dual-temporal fusion can improve RSCD accuracy while keeping the model efficient and broadly applicable.
Abstract
Change detection is a key task in Earth observation applications. Recently, deep learning methods have demonstrated strong performance and seen widespread use. However, change detection suffers from data scarcity because accurately aligning remote sensing images of the same area is labor-intensive, and this scarcity limits the performance of deep learning algorithms. To address it, we develop a fine-tuning strategy called the Semantic Change Network (SCN). We first pre-train the model on single-temporal supervised tasks to acquire prior knowledge of instance feature extraction. The model then adopts a shared-weight Siamese architecture and an extended Temporal Fusion Module (TFM) to preserve this prior knowledge while it is fine-tuned on change detection tasks. During fine-tuning, the learned semantics shift from identifying all instances to identifying only the changed ones. We also observe that changes occur at the same spatial locations in both images, a property we refer to as spatial consistency. We introduce this inductive bias through an attention map that is generated by large-kernel convolutions and applied to the features from both time points. This enhances the modeling of multi-scale changes and helps capture underlying relationships in change detection semantics. We develop a binary change detection model that uses these two strategies. The model is validated against state-of-the-art methods on six datasets, surpassing all benchmark methods and achieving F1 scores of 92.87%, 86.43%, 68.95%, 97.62%, 84.58%, and 93.20% on the LEVIR-CD, LEVIR-CD+, S2Looking, CDD, SYSU-CD, and WHU-CD datasets, respectively.
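The spatial consistency idea described above (one attention map, applied to the features of both time points) can be illustrated with a toy sketch in plain Python. This is not the authors' module. The names `large_kernel_mean` and `shared_attention` are hypothetical, the fixed mean filter stands in for learned large-kernel convolutions, and deriving the map from the absolute feature difference is an assumption made only for illustration.

```python
from math import exp
from typing import List, Tuple

Grid = List[List[float]]


def large_kernel_mean(x: Grid, k: int) -> Grid:
    """Zero-padded k x k mean filter (assumed stand-in for a learned large-kernel conv)."""
    h, w = len(x), len(x[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += x[ii][jj]
            out[i][j] = total / (k * k)
    return out


def shared_attention(f_t1: Grid, f_t2: Grid, k: int = 3) -> Tuple[Grid, Grid, Grid]:
    """Build ONE spatial attention map and apply it to both temporal feature maps."""
    # Assumption: the map is driven by the absolute difference of the two features.
    diff = [[abs(a - b) for a, b in zip(r1, r2)] for r1, r2 in zip(f_t1, f_t2)]
    context = large_kernel_mean(diff, k)
    # Sigmoid squashes the context into (0, 1) attention weights.
    attn = [[1.0 / (1.0 + exp(-v)) for v in row] for row in context]
    # The same map reweights both time points, so emphasized locations coincide.
    out_t1 = [[a * m for a, m in zip(fr, ar)] for fr, ar in zip(f_t1, attn)]
    out_t2 = [[a * m for a, m in zip(fr, ar)] for fr, ar in zip(f_t2, attn)]
    return attn, out_t1, out_t2


# Toy single-channel features: identical except for a 2 x 2 "changed" block at t2.
f_t1 = [[1.0] * 6 for _ in range(6)]
f_t2 = [row[:] for row in f_t1]
for i in (2, 3):
    for j in (2, 3):
        f_t2[i][j] = 3.0

attn, out_t1, out_t2 = shared_attention(f_t1, f_t2, k=3)
```

In this toy case the attention weight is higher inside the changed block than in the unchanged corner, and because both outputs are scaled by the same map, the emphasized locations are identical at the two time points. A real model would learn the kernels and operate on multi-channel features.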
