A Mamba-based Siamese Network for Remote Sensing Change Detection

Jay N. Paranjape, Celso de Melo, Vishal M. Patel

TL;DR

The paper tackles change detection (CD) in remote sensing: comparing pre- and post-change imagery under challenging conditions such as illumination changes and misalignment. It introduces M-CD, a Mamba-based architecture with a Siamese VMamba encoder, a multi-scale Difference Module, and a Channel-Averaged VSS decoder that uses four-direction SS2D scanning to capture temporal and spatial cues. Across four public datasets, M-CD reports significant improvements over CNN-, Transformer-, and diffusion-based baselines while requiring less large-scale pretraining than diffusion methods. The work points to selective state-space modeling as a scalable route to wide receptive fields for robust CD, with practical relevance for environmental monitoring and planning.
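
To make the "four-direction SS2D scanning" idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names `four_direction_scans` and `merge_scans` are hypothetical, the four orderings (row-major, column-major, and their reverses) follow the usual description of 2D selective scanning, and averaging the four results is an illustrative choice. In an actual model each 1D sequence would pass through a selective state-space layer before merging; that step is left out here, so merging simply recovers the input.

```python
from typing import List, Sequence


def four_direction_scans(grid: Sequence[Sequence[float]]) -> List[List[float]]:
    """Flatten a 2D grid into four 1D sequences, one per scan direction."""
    rows = [list(r) for r in grid]
    # Direction 1: left-to-right within each row, rows top-to-bottom.
    row_major = [v for r in rows for v in r]
    # Direction 2: top-to-bottom within each column, columns left-to-right.
    col_major = [v for c in zip(*rows) for v in c]
    # Directions 3 and 4: the same two orderings traversed backwards.
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def merge_scans(scans: List[List[float]], height: int, width: int) -> List[List[float]]:
    """Map each sequence back to grid positions and average the four values.

    Averaging is an assumption made for this sketch.
    """
    n = height * width
    merged = []
    for i in range(height):
        row = []
        for j in range(width):
            r_idx = i * width + j   # position of (i, j) in the row-major scan
            c_idx = j * height + i  # position of (i, j) in the column-major scan
            total = (
                scans[0][r_idx]
                + scans[1][c_idx]
                + scans[2][n - 1 - r_idx]
                + scans[3][n - 1 - c_idx]
            )
            row.append(total / 4)
        merged.append(row)
    return merged


if __name__ == "__main__":
    grid = [[1, 2, 3], [4, 5, 6]]
    scans = four_direction_scans(grid)
    for s in scans:
        print(s)
    print(merge_scans(scans, 2, 3))
```

Because every pixel appears in all four sequences at a different position, a sequence model run over each ordering can relate that pixel to context from above, below, left, and right, which is the intuition behind the wide receptive field mentioned above.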

Abstract

Change detection in remote sensing images is an essential tool for analyzing a region at different times. It has varied applications in monitoring environmental and man-made changes, as well as in the corresponding decision-making and prediction of future trends. Deep learning methods like Convolutional Neural Networks (CNNs) and Transformers have achieved remarkable success in detecting significant changes, given two images taken at different times. In this paper, we propose a Mamba-based Change Detector (M-CD) that segments the regions of interest more accurately. Mamba-based architectures demonstrate linear-time training capabilities and an improved receptive field over Transformers. Our experiments on four widely used change detection datasets demonstrate significant improvements over existing state-of-the-art (SOTA) methods. Our code and pre-trained models are available at https://github.com/JayParanjape/M-CD

Paper Structure (11 sections, 3 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: (Top) Example result of our M-CD. (Bottom) Average performance of M-CD relative to existing types of CD models.
  • Figure 2: An overview of the M-CD architecture. A pre-change and a post-change image are passed through a Mamba-based encoder with shared weights (Siamese), which extracts features at multiple scales. The Difference Module combines these features and sends the result to the Mask Decoder, which uses a Mamba-based decoder to generate the predicted change map.
  • Figure 3: Architecture of Visual State Space Block and Selective Scan 2D block.
  • Figure 4: Architecture of the Difference Module and Joint Selective Scan.
  • Figure 5: Architecture of Channel Averaged VSS Block
  • ...and 2 more figures