2DMCG: 2DMamba with Change Flow Guidance for Change Detection in Remote Sensing
JunYao Kaung, HongWei Ge
TL;DR
This work tackles remote sensing change detection by addressing the limited receptive fields of CNNs, the quadratic cost of Transformers, and spatial misalignment between features. It introduces 2DMCG, a Vision Mamba-based framework with a 2D selective SSM encoder, 2D scan-based feature fusion, and a Change Flow-guided decoder that aligns and decodes spatio-temporal changes. Key contributions include the 2DMamba encoder; a 2D selective scan that preserves 2D spatial coherence; Cross-Channel and Spatial Reorganization Fusion modules; and a Change Flow mechanism that warps features across time to improve change localization. The framework is validated on WHU-CD+, SYSU-CD, and LEVIR-CD+, where it achieves state-of-the-art or competitive accuracy with improved efficiency, underscoring the potential of Vision Mamba architectures for scalable and accurate change detection in remote sensing.
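The contrast between a 1D selective scan and a 2D one can be illustrated with a small NumPy sketch. It assumes a single channel, a diagonal state matrix `A`, and a Mamba-style discretization (zero-order hold for `A`, a simplified `delta * B` input term). The 2D variant, which runs the recurrence along each row and then along each column of the row states, is one simple way to keep vertical neighbours adjacent; it is an illustrative assumption, not the exact 2DMCG formulation. The function names `selective_scan_1d` and `selective_scan_2d` are hypothetical.

```python
import numpy as np


def selective_scan_1d(x, delta, A, B, C):
    """Selective SSM over a 1D sequence (single channel, diagonal A).

    x, delta: (L,)   inputs and positive, input-dependent step sizes
    A:        (N,)   diagonal state matrix (negative for stability)
    B, C:     (L, N) input-dependent projections
    Cost is O(L * N), i.e. linear in the sequence length L.
    """
    h = np.zeros(A.shape[0])
    y = np.empty(x.shape[0])
    for t in range(x.shape[0]):
        # Zero-order hold for A; simplified (Euler-style) discretization for B.
        h = np.exp(delta[t] * A) * h + delta[t] * B[t] * x[t]
        y[t] = C[t] @ h
    return y


def selective_scan_2d(x, delta, A, B, C):
    """Hypothetical 2D variant: scan each row, then scan the row states down each column.

    x, delta: (H, W); B, C: (H, W, N); A: (N,). Cost is O(H * W * N).
    """
    H, W = x.shape
    N = A.shape[0]
    h_row = np.zeros((H, W, N))
    for i in range(H):                      # horizontal pass along each row
        h = np.zeros(N)
        for j in range(W):
            h = np.exp(delta[i, j] * A) * h + delta[i, j] * B[i, j] * x[i, j]
            h_row[i, j] = h
    h_2d = np.zeros((H, W, N))
    for j in range(W):                      # vertical pass over the row states
        h = np.zeros(N)
        for i in range(H):
            h = np.exp(delta[i, j] * A) * h + h_row[i, j]
            h_2d[i, j] = h
    return np.einsum("hwn,hwn->hw", C, h_2d)


# Impulse at (row 0, col 2) on a 4x8 grid, N = 1, constant step size.
H, W, N = 4, 8, 1
x = np.zeros((H, W))
x[0, 2] = 1.0
delta = np.ones((H, W))
A = np.array([-0.5])
B = np.ones((H, W, N))
C = np.ones((H, W, N))

# 1D baseline: flatten in raster order, so (1, 2) comes 8 steps after (0, 2).
y1d = selective_scan_1d(x.ravel(), delta.ravel(), A,
                        B.reshape(-1, N), C.reshape(-1, N)).reshape(H, W)
y2d = selective_scan_2d(x, delta, A, B, C)
print(round(float(y1d[1, 2]), 4), round(float(y2d[1, 2]), 4))  # 0.0183 0.6065
```

In the raster-order 1D scan, the impulse reaches its vertical neighbour only after 8 decay steps (about exp(-4)), whereas the 2D scan reaches it after one (about exp(-0.5)). Both scans remain linear in the number of pixels.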
Abstract
Remote sensing change detection (CD) has advanced significantly with the adoption of Convolutional Neural Networks (CNNs) and Transformers. While CNNs offer powerful feature extraction, they are constrained by limited receptive fields, and Transformers incur quadratic complexity in sequence length, which restricts their scalability. The Mamba architecture provides an appealing alternative, offering linear complexity and high parallelism. However, because Mamba processes its input as a 1D sequence, flattening a 2D image separates spatially adjacent pixels and loses spatial information in vision tasks. This paper addresses this limitation by proposing an efficient framework based on a Vision Mamba variant that better captures 2D spatial information while retaining Mamba's linear complexity. The framework employs a 2DMamba encoder to learn global spatial context from multi-temporal images. For feature fusion, we introduce a channel-parallel 2D scanning strategy combined with a spatio-temporal feature fusion method, which captures both local and global change information and alleviates spatial discontinuity during fusion. In the decoding stage, we present a feature change flow-based decoding method that improves the mapping of change information from low-resolution to high-resolution feature maps, mitigating feature shift and misalignment. Extensive experiments on benchmark datasets such as LEVIR-CD+ and WHU-CD demonstrate that our framework outperforms state-of-the-art methods, showcasing the potential of Vision Mamba for efficient and accurate remote sensing change detection.
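The decoding idea, resampling low-resolution change features at flow-corrected positions rather than on a fixed upsampling grid, can be sketched in NumPy. This assumes the change flow is a per-pixel (dy, dx) offset field expressed in low-resolution pixels, as in flow-alignment decoders; in a trained model it would be predicted by the network (for example, as an assumption, from the concatenated low- and high-resolution features), whereas here it is passed in directly. The helpers `bilinear_sample` and `flow_warp_upsample` are hypothetical and are not the authors' code.

```python
import numpy as np


def bilinear_sample(feat, ys, xs):
    """Bilinearly sample feat (C, H, W) at float coordinates ys, xs (Ho, Wo).

    Coordinates outside the map are clamped to the border.
    """
    _, H, W = feat.shape
    ys = np.clip(ys, 0, H - 1)
    xs = np.clip(xs, 0, W - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy, wx = ys - y0, xs - x0                 # fractional offsets in [0, 1)
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy       # shape (C, Ho, Wo)


def flow_warp_upsample(low, flow, out_hw):
    """Upsample low-res features, shifting each sample point by a per-pixel flow.

    low:  (C, h, w) low-resolution (change) features
    flow: (2, Ho, Wo) hypothetical offsets (dy, dx) in low-resolution pixel units
    """
    _, h, w = low.shape
    Ho, Wo = out_hw
    # Map each output pixel centre to low-res coordinates (half-pixel convention).
    gy, gx = np.meshgrid((np.arange(Ho) + 0.5) * h / Ho - 0.5,
                         (np.arange(Wo) + 0.5) * w / Wo - 0.5, indexing="ij")
    return bilinear_sample(low, gy + flow[0], gx + flow[1])


low = np.arange(12, dtype=float).reshape(1, 3, 4)    # one channel, 3x4 map
shift = np.zeros((2, 3, 4))
shift[1] = 1.0                                       # sample one pixel to the right
print(flow_warp_upsample(low, shift, (3, 4))[0, 0])  # [1. 2. 3. 3.]
```

A zero flow reduces this to ordinary bilinear upsampling; a non-zero flow lets each output pixel draw features from a shifted location, which is how a learned flow can compensate for the feature shift and misalignment described above. The last column in the example is clamped at the border.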
