2DMCG: 2DMamba with Change Flow Guidance for Change Detection in Remote Sensing

JunYao Kaung, HongWei Ge

TL;DR

This work tackles remote sensing change detection, targeting the limited receptive fields of CNNs, the quadratic complexity of Transformers, and the feature misalignment that arises during fusion and decoding. It introduces 2DMCG, a Vision Mamba-based framework with a 2D selective SSM encoder, 2D scan-based feature fusion, and a Change Flow guided decoder that aligns and decodes spatio-temporal changes. Key contributions include the 2D Mamba encoder, a 2D selective scan that preserves spatial coherence, Cross-Channel and Spatial Reorganization Fusion modules, and a Change Flow mechanism that warps features across time to improve change localization. The framework is validated on WHU-CD, SYSU-CD, and LEVIR-CD+. The results demonstrate state-of-the-art or competitive performance with improved efficiency, underscoring the potential of Vision Mamba architectures for scalable and accurate change detection in remote sensing applications.

Abstract

Remote sensing change detection (CD) has made significant advancements with the adoption of Convolutional Neural Networks (CNNs) and Transformers. While CNNs offer powerful feature extraction, they are constrained by receptive field limitations, and Transformers suffer from quadratic complexity when processing long sequences, restricting scalability. The Mamba architecture provides an appealing alternative, offering linear complexity and high parallelism. However, its inherent 1D processing structure causes a loss of spatial information in 2D vision tasks. This paper addresses this limitation by proposing an efficient framework based on a Vision Mamba variant that enhances its ability to capture 2D spatial information while maintaining the linear complexity characteristic of Mamba. The framework employs a 2DMamba encoder to effectively learn global spatial contextual information from multi-temporal images. For feature fusion, we introduce a 2D scan-based, channel-parallel scanning strategy combined with a spatio-temporal feature fusion method, which adeptly captures both local and global change information, alleviating spatial discontinuity issues during fusion. In the decoding stage, we present a feature change flow-based decoding method that improves the mapping of feature change information from low-resolution to high-resolution feature maps, mitigating feature shift and misalignment. Extensive experiments on benchmark datasets such as LEVIR-CD+ and WHU-CD demonstrate the superior performance of our framework compared to state-of-the-art methods, showcasing the potential of Vision Mamba for efficient and accurate remote sensing change detection.
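
The abstract describes a decoder that uses a feature change flow to map low-resolution change features onto high-resolution maps while reducing feature shift. As a rough illustration of the general idea of flow-guided upsampling (not the authors' decoder), the sketch below upsamples a small feature map by bilinear sampling and lets a per-pixel offset field move each sampling location. The function names `bilinear_sample` and `warp_upsample`, the corner-aligned coordinate mapping, and the hand-written flow values are all assumptions made for this example; in a learned model the flow would presumably be predicted by the network.

```python
import math


def bilinear_sample(feat, y, x):
    """Bilinearly sample a 2D grid (list of rows) at fractional (y, x), clamped to the border."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def warp_upsample(low, flow, out_h, out_w):
    """Upsample `low` to (out_h, out_w); flow[i][j] = (dy, dx) shifts the
    sampling location of output pixel (i, j), measured in low-res pixels."""
    in_h, in_w = len(low), len(low[0])
    # Scale factors mapping high-res indices onto the low-res grid (corners aligned).
    sy = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    sx = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            dy, dx = flow[i][j]
            row.append(bilinear_sample(low, i * sy + dy, j * sx + dx))
        out.append(row)
    return out


# Toy 2x2 low-resolution feature map, upsampled to 3x3.
low = [[0.0, 1.0], [2.0, 3.0]]

# Zero flow reduces to plain bilinear upsampling.
zero_flow = [[(0.0, 0.0)] * 3 for _ in range(3)]
plain = warp_upsample(low, zero_flow, 3, 3)

# A hand-picked flow that shifts every sampling location half a pixel to the right.
shift_flow = [[(0.0, 0.5)] * 3 for _ in range(3)]
shifted = warp_upsample(low, shift_flow, 3, 3)
```

With zero flow the output is ordinary bilinear interpolation; a non-zero flow lets each high-resolution position read from a displaced low-resolution location, which is the mechanism that can compensate for misaligned features.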

Paper Structure

This paper contains 33 sections, 9 equations, 11 figures, 4 tables, 1 algorithm.

Figures (11)

  • Figure 1: Comparison of 1D and 2D Mamba-based methods. Left: 1D methods transform an image into a 1D sequence. This leads to spatial discontinuity as adjacent patches (shown in red and orange) become separated in the sequence. Right: 2D methods process the image in a 2D manner, maintaining spatial continuity.
  • Figure 2: Illustration of the proposed change detection framework. The framework employs a Siamese architecture with shared weights for feature extraction. Multi-temporal images are fed into the encoder to generate feature representations. A change detection module then processes these features to produce the final change map.
  • Figure 3: Multi-stage encoder architecture based on 2D Mamba blocks. The encoder processes input images through multiple stages. Each stage consists of a 2D Mamba block (repeated N times, where N1 to N4 are stage-specific repetition counts), followed by a downsampling operation. This design progressively reduces the spatial dimensions while extracting hierarchical features.
  • Figure 4: 2D Mamba block structure
  • Figure 6: Illustration of two feature fusion methods for change detection. Top: Cross-Channel Fusion concatenates pre- and post-event image features along the channel dimension and applies 2D Scan. Bottom: Global & Local Change Fusion reorganizes the features into a larger map, enabling 2D Scan to capture both global changes (horizontal and vertical directions) and local changes (diagonal directions).
  • ...and 6 more figures
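
The spatial discontinuity described in the Figure 1 caption can be made concrete with a small calculation. The sketch below assumes a plain row-major (raster) flattening, which is only one possible 1D scan order and is not necessarily the one used by any particular Mamba variant; the helper names `flatten_index` and `sequence_gap` are hypothetical. It measures how far apart two neighbouring patches end up once the grid is turned into a sequence.

```python
def flatten_index(row, col, width):
    """Position of patch (row, col) in a row-major 1D sequence."""
    return row * width + col


def sequence_gap(a, b, width):
    """Distance in the 1D sequence between two patches given as (row, col)."""
    return abs(flatten_index(a[0], a[1], width) - flatten_index(b[0], b[1], width))


WIDTH = 4  # patches per row in a toy 4x4 patch grid

# Both pairs are direct neighbours on the 2D grid.
horizontal_gap = sequence_gap((1, 1), (1, 2), WIDTH)  # left-right neighbours
vertical_gap = sequence_gap((1, 1), (2, 1), WIDTH)    # up-down neighbours
```

Under this assumption, left-right neighbours stay adjacent in the sequence, while up-down neighbours are separated by a full row of patches, and the separation grows with the image width. A scan that operates directly on the 2D grid avoids this asymmetry.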