CDMamba: Incorporating Local Clues into Mamba for Remote Sensing Image Binary Change Detection
Haotian Zhang, Keyan Chen, Chenyang Liu, Hao Chen, Zhengxia Zou, Zhenwei Shi
TL;DR
CDMamba addresses binary change detection in remote sensing by fusing global context from Mamba with local cues through the Scaled Residual ConvMamba (SRCM) block and by enabling dynamic bi-temporal interaction through the Adaptive Global Local Guided Fusion (AGLGF) block. The model achieves state-of-the-art or competitive performance on datasets including WHU-CD, LEVIR-CD, and LEVIR+-CD, with ablations confirming the importance of SRCM and AGLGF for dense prediction. The work shows that combining global and local information yields more discriminative change features, and it provides an open-source implementation to support practical CD applications. The approach offers a scalable, efficient alternative to Transformer-heavy methods for dense remote sensing change detection, with potential extensions to self-supervised learning.
Abstract
Recently, the Mamba architecture based on state space models has demonstrated remarkable performance in a series of natural language processing tasks and has been rapidly applied to remote sensing change detection (CD) tasks. However, most methods enlarge the global receptive field by directly modifying Mamba's scanning mode, neglecting the crucial role that local information plays in dense prediction tasks (e.g., binary CD). In this article, we propose CDMamba, a model that effectively combines global and local features for binary CD. Specifically, we propose the Scaled Residual ConvMamba (SRCM) block, which uses Mamba to extract global features and convolution to enhance local details. This alleviates the lack of detailed clues in current Mamba-based methods, which makes fine-grained detection difficult in dense prediction tasks. Furthermore, because CD requires interaction between bi-temporal features, we propose the Adaptive Global Local Guided Fusion (AGLGF) block, which dynamically facilitates bi-temporal interaction guided by the global/local features of the other temporal phase. Our intuition is that more discriminative change features can be acquired with the guidance of the other temporal features. Extensive experiments on five datasets demonstrate that CDMamba is comparable to or better than current methods (for example, F1/IoU scores improve by 2.10%/3.00% and 2.44%/2.91% on LEVIR+CD and CLCD, respectively). Our code is open-sourced at https://github.com/zmoka-zht/CDMamba.
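The abstract reports results as F1 and IoU scores. As a minimal sketch of how these two metrics are commonly computed for a binary change map, the function below counts true positives, false positives, and false negatives for the "changed" class. It assumes the standard definitions (F1 = 2TP / (2TP + FP + FN), IoU = TP / (TP + FP + FN)); the function name `binary_cd_scores` and the nested-list mask format are illustrative choices, not taken from the CDMamba code, and the paper's exact evaluation protocol is not stated in this section.

```python
def binary_cd_scores(pred, target):
    """Precision, recall, F1 and IoU for the 'changed' class (label 1).

    pred, target: equally sized 2D nested lists of 0/1 pixel labels.
    This mask format is an assumption made for the example.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == 1 and t == 1:
                tp += 1  # changed pixel correctly detected
            elif p == 1 and t == 0:
                fp += 1  # unchanged pixel flagged as changed
            elif p == 0 and t == 1:
                fn += 1  # changed pixel missed

    # Guard against empty denominators (e.g. no changed pixels at all).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    # Here TP = 2, FP = 1, FN = 1, so F1 = 4/6 and IoU = 2/4.
    print(binary_cd_scores(pred, target))
```

Because IoU = F1 / (2 - F1) under these definitions, an improvement in one metric always implies an improvement in the other, although the size of the gain differs, which is consistent with the paired F1/IoU figures quoted above.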
