Treat Stillness with Movement: Remote Sensing Change Detection via Coarse-grained Temporal Foregrounds Mining
Xixi Wang, Zitian Wang, Jingtao Jiang, Lan Chen, Xiao Wang, Bo Jiang
TL;DR
The paper tackles remote sensing change detection (RSCD) by arguing that motion cues between bi-temporal images are underused in conventional pipelines. It introduces the Coarse-grained Temporal Mining Augmented (CTMA) framework, which first converts each image pair into a dense pseudo-video and learns temporal features with a Temporal Encoder to produce a coarse change map. It then applies a Coarse-grained Foregrounds Augmented Spatial Encoder (CFA-SE) that fuses global and local information, and refines the prediction with motion-augmented and mask-augmented strategies. A weighted BCE loss supervises both the temporal and spatial branches. Experiments on SVCD, LEVIR-CD, and WHU-CD report state-of-the-art performance, and ablations validate each component. By integrating motion cues with coarse-to-fine fusion, the work improves RSCD accuracy and robustness, and its code is made publicly available for reproducibility and further research.
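The TL;DR says that a weighted BCE loss supervises both the temporal (coarse) and spatial (fine) branches, but it does not give the weighting scheme. The sketch below assumes the weight is a single `pos_weight` factor on changed (positive) pixels, which are usually rare, and that the two branch losses are summed with hypothetical coefficients `lam_temporal` and `lam_spatial`. The function names are illustrative and are not taken from the authors' code. Predictions are flattened per-pixel probabilities.

```python
import math
from typing import Sequence


def weighted_bce(probs: Sequence[float], targets: Sequence[int],
                 pos_weight: float = 1.0, eps: float = 1e-7) -> float:
    """Mean weighted binary cross-entropy over flattened pixels.

    Assumption: `pos_weight` up-weights the changed (y = 1) class only.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def two_branch_loss(coarse_probs: Sequence[float], fine_probs: Sequence[float],
                    targets: Sequence[int], pos_weight: float = 1.0,
                    lam_temporal: float = 1.0, lam_spatial: float = 1.0) -> float:
    """Hypothetical total loss: weighted sum of the temporal (coarse map)
    and spatial (final map) branch losses against the same ground truth."""
    return (lam_temporal * weighted_bce(coarse_probs, targets, pos_weight)
            + lam_spatial * weighted_bce(fine_probs, targets, pos_weight))
```

With `pos_weight > 1`, a missed changed pixel costs more than a false alarm of the same confidence, which counteracts the dominance of unchanged background in change-detection masks.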
Abstract
Current works address the remote sensing change detection task using bi-temporal images. Although they achieve good performance, few of them consider motion cues, which may also be vital. In this work, we revisit the widely adopted bi-temporal image-based framework and propose a novel Coarse-grained Temporal Mining Augmented (CTMA) framework. Specifically, given the bi-temporal images, we first transform them into a video using interpolation operations. Then, a set of temporal encoders extracts motion features from the resulting video to predict coarse-grained changed regions. Subsequently, we design a novel Coarse-grained Foregrounds Augmented Spatial Encoder module to integrate both global and local information. We also introduce a motion augmented strategy that treats the motion cues as an additional output and aggregates them with the spatial features to improve the results. Meanwhile, we feed the input image pair into a ResNet to obtain difference features, and into the spatial blocks for fine-grained feature learning. More importantly, we propose a mask augmented strategy that incorporates the coarse-grained changed regions into the decoder blocks to enhance the final change prediction. Extensive experiments on multiple benchmark datasets fully validate the effectiveness of the proposed framework for remote sensing image change detection. The source code of this paper will be released at https://github.com/Event-AHU/CTM_Remote_Sensing_Change_Detection
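Two steps in the abstract lend themselves to a small illustration: turning a bi-temporal pair into a pseudo-video by interpolation, and injecting the coarse changed-region mask into decoder features. The abstract does not specify either operator. The sketch below therefore substitutes plain linear (cross-fade) interpolation for the interpolation step, and a residual gating `f * (1 + m)` for the mask augmentation. Both are assumptions, as are the function names `make_pseudo_video` and `mask_augment`. Images are single-channel H x W nested lists to keep the example dependency-free.

```python
from typing import List

Image = List[List[float]]  # H x W single-channel map (illustrative only)


def make_pseudo_video(img_t1: Image, img_t2: Image, num_frames: int) -> List[Image]:
    """Assumed interpolation: linearly blend two co-registered images.

    Frame 0 equals img_t1 and the last frame equals img_t2; intermediate
    frames move pixel values step by step from time 1 to time 2.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be >= 2")
    frames: List[Image] = []
    for k in range(num_frames):
        alpha = k / (num_frames - 1)  # blending weight for img_t2
        frame = [[(1 - alpha) * a + alpha * b for a, b in zip(row1, row2)]
                 for row1, row2 in zip(img_t1, img_t2)]
        frames.append(frame)
    return frames


def mask_augment(features: Image, coarse_mask: Image) -> Image:
    """Hypothetical mask augmentation: amplify decoder features inside the
    coarse changed regions (mask values in [0, 1]) and leave others intact."""
    return [[f * (1.0 + m) for f, m in zip(frow, mrow)]
            for frow, mrow in zip(features, coarse_mask)]


video = make_pseudo_video([[0.0, 1.0]], [[1.0, 1.0]], num_frames=5)
boosted = mask_augment([[2.0, 3.0]], [[1.0, 0.0]])
```

In this toy case the changed pixel ramps from 0.0 to 1.0 across the five frames while the unchanged pixel stays constant, which is the kind of temporal signal a temporal encoder could pick up. The residual form of `mask_augment` keeps the original features when the mask is zero, so an imperfect coarse map cannot erase information outside the predicted regions.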
