CDMamba: Incorporating Local Clues into Mamba for Remote Sensing Image Binary Change Detection

Haotian Zhang; Keyan Chen; Chenyang Liu; Hao Chen; Zhengxia Zou; Zhenwei Shi

TL;DR

CDMamba addresses binary change detection in remote sensing by fusing global context from Mamba with local cues through the Scaled Residual ConvMamba (SRCM) block and enabling dynamic bi-temporal interaction via the Adaptive Global Local Guided Fusion (AGLGF) block. The model achieves state-of-the-art or competitive performance on WHU-CD, LEVIR-CD, and LEVIR+-CD, with ablations confirming the importance of SRCM and AGLGF for dense prediction. The work demonstrates that combining global and local information yields more discriminative change features, and provides an open-source implementation to advance practical CD applications. This approach offers a scalable, efficient alternative to Transformer-heavy methods for dense remote sensing change detection tasks, with potential extensions to self-supervised learning.

Abstract

Recently, the Mamba architecture, which is based on state space models, has demonstrated remarkable performance in a series of natural language processing tasks and has been rapidly applied to remote sensing change detection (CD). However, most methods enlarge the global receptive field by directly modifying Mamba's scanning mode and neglect the crucial role that local information plays in dense prediction tasks (e.g., binary CD). In this article, we propose CDMamba, a model that effectively combines global and local features for binary CD. Specifically, we propose the Scaled Residual ConvMamba (SRCM) block, which uses Mamba to extract global features and convolution to enhance local details. This alleviates the lack of detailed clues in current Mamba-based methods, which makes fine detection difficult in dense prediction tasks. Furthermore, because CD requires bi-temporal feature interaction, we propose the Adaptive Global Local Guided Fusion (AGLGF) block, which dynamically facilitates bi-temporal interaction guided by the global/local features of the other temporal image. Our intuition is that more discriminative change features can be acquired with the guidance of the other temporal features. Extensive experiments on five datasets demonstrate that CDMamba is competitive with current methods (for example, the F1/IoU scores are improved by 2.10%/3.00% and 2.44%/2.91% on LEVIR+-CD and CLCD, respectively). Our code is open-sourced at https://github.com/zmoka-zht/CDMamba.
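The abstract reports results as F1/IoU scores. As a rough illustration of what those numbers measure, the sketch below computes both for the "changed" class of a binary mask. It assumes the standard pixel-wise definitions; this page does not state how the authors compute their metrics, and the function name `change_metrics` is hypothetical rather than taken from the CDMamba repository.

```python
def change_metrics(pred, truth):
    """Return (precision, recall, f1, iou) for the 'changed' class.

    pred and truth are equally sized 2D lists of 0/1 labels (1 = changed).
    Standard pixel-wise definitions are assumed, not the authors' code.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1      # changed pixel correctly detected
            elif p == 1 and t == 0:
                fp += 1      # unchanged pixel flagged as changed
            elif p == 0 and t == 1:
                fn += 1      # changed pixel that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = tp + fp + fn
    f1 = 2 * tp / (2 * tp + fp + fn) if denom else 0.0
    iou = tp / denom if denom else 0.0
    return precision, recall, f1, iou


# Toy 2x3 masks: two hits, one false alarm, one miss.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
precision, recall, f1, iou = change_metrics(pred, truth)
print(f"F1={f1:.4f} IoU={iou:.4f}")
```

Under these definitions the two scores are tied by F1 = 2·IoU / (1 + IoU), so they always rank two predictions on the same image identically, while IoU penalizes errors more heavily.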

Paper Structure (30 sections, 30 equations, 7 figures, 7 tables)

Introduction
Related Work
CNN-based CD Models
Transformer-based CD Models
Mamba-based Models in Vision Tasks
CDMamba
Preliminaries
Overview
Scaled Residual ConvMamba Block
Adaptive Global Local Guided Fusion Block
Experimental Results and Analysis
Data description
Wuhan University
Learning, VIsion, and Remote sensing
LEVIR+-CD
...and 15 more sections

Figures (7)

Figure 1: Illustration of our method. (a) The architecture of the proposed CDMamba. T1 and T2 represent the bi-temporal images, and GT denotes the ground truth. (b) The encoder, built from the Scaled Residual ConvMamba (SRCM) block, together with its main component, the ConvMamba module. $F^l_{in}$ and $F^l_{out}$ represent the input and output features at each level for the bi-temporal images. (c) The decoder, formed by SRCM, where $F^l_{d}$ and $F^{l-1}_{d}$ represent the differential features of the current level and the previous level, respectively, and $\overline{F}^l_{d}$ is the feature after multi-level fusion. (d) The Adaptive Global Local Guided Fusion (AGLGF) block, where $F^l_{1}$ and $F^l_{2}$ are bi-temporal features at the same level and $F^l_{d}$ is the differential feature at level $l$. L-GF denotes the local-guided feature fusion module, G-GF denotes the global-guided feature fusion module, and $\sum$ is the weighted summation.
Figure 2: Illustration of our global-guided feature fusion (G-GF) module and local-guided feature fusion (L-GF) module. $\sigma$ denotes the gate activation function. $F_1$ and $F_2$ represent the bi-temporal features, and $F_{GGF}$ and $F_{LGF}$ are the global-guided and local-guided fused features, respectively.
Figure 3: Visualization results of different methods on the WHU-CD test set. (a)-(d) are representative samples. White represents a true positive, black a true negative, red a false positive, and green a false negative.
Figure 4: Visualization results of different methods on the LEVIR-CD test set. (a)-(d) are representative samples. White represents a true positive, black a true negative, red a false positive, and green a false negative.
Figure 5: Visualization results of different methods on the LEVIR+-CD test set. (a)-(d) are representative samples. White represents a true positive, black a true negative, red a false positive, and green a false negative.
...and 2 more figures
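
The color coding used in the visualization figures (white for true positives, black for true negatives, red for false positives, green for false negatives) can be reproduced from a predicted mask and its ground truth. The sketch below is a minimal illustration, not the authors' plotting code: the function name `error_map` and the exact RGB values are assumptions.

```python
# RGB triples for the four outcomes; the exact shades are an assumption.
WHITE, BLACK, RED, GREEN = (255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 255, 0)

# Lookup keyed by (predicted label, ground-truth label), with 1 = changed.
OUTCOME_COLORS = {
    (1, 1): WHITE,  # true positive
    (0, 0): BLACK,  # true negative
    (1, 0): RED,    # false positive
    (0, 1): GREEN,  # false negative
}


def error_map(pred, truth):
    """Turn two equally sized 2D lists of 0/1 labels into a 2D list of RGB triples."""
    return [
        [OUTCOME_COLORS[(p, t)] for p, t in zip(pred_row, truth_row)]
        for pred_row, truth_row in zip(pred, truth)
    ]


# A 2x2 example that exercises all four outcomes.
colored = error_map([[1, 0], [1, 0]], [[1, 0], [0, 1]])
for row in colored:
    print(row)
```

Read this way, red regions in the figures mark over-detection and green regions mark missed changes, which is how the qualitative comparisons between methods are usually interpreted.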
