ChessMamba: Structure-Aware Interleaving of State Spaces for Change Detection in Remote Sensing Images
Lei Ding, Tong Liu, Xuanguang Liu, Xiangyun Liu, Haitao Guo, Jun Lu
TL;DR
ChessMamba tackles cross-source change detection (CD) in multi-temporal remote sensing (RS) imagery by embedding structural priors into selective state-space modeling. It introduces two components: Chessboard interleaving (Chessboard Shuffle), which preserves 2D topology while enabling direct per-pixel temporal comparisons, and the Mono-Context Aggregated SSM (MCA-SSM), which emphasizes mono-temporal local context before state propagation. The approach combines a SpatialMamba encoder with a cross-time decoder. It achieves state-of-the-art results on Levir-CD, BRIGHT, and SECOND across binary CD (BCD), building damage assessment (BDA), and semantic CD (SCD) tasks, with strong localization and favorable efficiency (roughly linear, $O(N)$, cost in sequence length). These findings highlight the importance of geometric coherence when fusing multi-source features for robust, high-resolution CD on heterogeneous RS data, and the framework shows promising generalization to diverse CD scenarios.
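The linear cost mentioned above comes from selective state-space modeling, in which a single recurrence runs once over the token sequence. The details of MCA-SSM are not given in this summary, so the NumPy sketch below shows only a generic Mamba-style selective scan as an illustration: the step size Δ_t and the matrices B_t and C_t are computed from each token, and one left-to-right pass over the N tokens gives the $O(N)$ cost. The function name `selective_scan`, the projections `W_delta`, `W_B`, `W_C`, the diagonal negative state matrix `A`, and the simplified input discretization B̄ = Δ·B are all hypothetical choices, not the paper's implementation.

```python
import numpy as np

def selective_scan(x, A, W_delta, W_B, W_C):
    """Generic Mamba-style selective scan (illustrative sketch, not MCA-SSM).

    x: (N, D) token sequence; A: (D, S) negative entries of a diagonal state matrix.
    W_delta (D, D), W_B (D, S), W_C (D, S): hypothetical projections that make the
    step size and the input/output matrices depend on the current token.
    """
    N, D = x.shape
    h = np.zeros(A.shape)                          # hidden state: S values per channel
    y = np.empty((N, D))
    for t in range(N):                             # one left-to-right pass: O(N) time
        delta = np.logaddexp(0.0, x[t] @ W_delta)  # softplus step size, (D,)
        B_t = x[t] @ W_B                           # token-dependent input matrix, (S,)
        C_t = x[t] @ W_C                           # token-dependent output matrix, (S,)
        A_bar = np.exp(delta[:, None] * A)         # zero-order-hold decay, (D, S)
        B_bar = delta[:, None] * B_t[None, :]      # simplified (Euler) input term, (D, S)
        h = A_bar * h + B_bar * x[t][:, None]      # state update
        y[t] = h @ C_t                             # readout, (D,)
    return y

rng = np.random.default_rng(0)
N, D, S = 16, 4, 8
x = rng.standard_normal((N, D))
A = -np.exp(rng.standard_normal((D, S)))           # negative entries, so the state decays
W_delta, W_B, W_C = (0.1 * rng.standard_normal(s) for s in [(D, D), (D, S), (D, S)])
y = selective_scan(x, A, W_delta, W_B, W_C)
print(y.shape)  # (16, 4)
```

Each step touches only the current state, so time grows linearly in N and the output at position t depends only on tokens up to t. Practical Mamba implementations replace this Python loop with a fused parallel scan on the GPU.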
Abstract
Change detection (CD) in multitemporal remote sensing imagery is challenging for fine-grained recognition because of data heterogeneity and spatiotemporal misalignment. Existing methods based on vision transformers or state-space models typically disrupt local structural consistency when serializing temporal features, which obscures discriminative cues under misalignment and hinders reliable change localization. To address this, we introduce ChessMamba, a structure-aware framework that leverages interleaved state-space modeling for robust CD with multi-temporal inputs. ChessMamba integrates a SpatialMamba encoder with a lightweight cross-source interaction module and features two key innovations: (i) Chessboard interleaving with a snake scanning order, which serializes multi-temporal features into a unified sequence within a single forward pass, thereby shortening interaction paths and enabling direct comparison for accurate change localization; and (ii) Structure-aware fusion via multi-dilated convolutions, which selectively captures center-and-corner neighborhood contexts within each mono-temporal input. Comprehensive evaluations on three CD tasks, namely binary CD, semantic CD, and multimodal building damage assessment, demonstrate that ChessMamba effectively fuses heterogeneous features and achieves substantial accuracy improvements over state-of-the-art methods. The relevant code will be available at: github.com/DingLei14/ChessMamba.
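The abstract does not specify the exact interleaving pattern, so the NumPy sketch below illustrates one layout consistent with the description, under stated assumptions. The two dates' feature maps of shape (H, W, C) are merged into an (H, 2W, C) grid whose temporal source alternates like chessboard squares, and a snake (boustrophedon) scan flattens it so that the two tokens of every pixel sit next to each other in the sequence. The sketch also shows why center-and-corner neighborhoods can stay mono-temporal: diagonal offsets (±d, ±d) preserve the parity of row plus column, so on this grid they never cross dates. The function names `chessboard_interleave`, `snake_scan`, and `corner_context`, the dilation set (1, 2), and the plain averaging used in place of learned multi-dilated convolutions are all hypothetical.

```python
import numpy as np

def chessboard_interleave(f1, f2):
    """Interleave two (H, W, C) feature maps into one (H, 2W, C) grid.

    Hypothetical layout: pixel (i, j) of both dates occupies columns 2j and
    2j + 1. Even rows put date 1 first and odd rows put date 2 first, so the
    source date alternates like the squares of a chessboard.
    """
    if f1.shape != f2.shape:
        raise ValueError("feature maps must have the same shape")
    H, W, C = f1.shape
    grid = np.empty((H, 2 * W, C), dtype=np.result_type(f1, f2))
    even = np.arange(H) % 2 == 0
    grid[even, 0::2] = f1[even]      # even rows: date 1 on even columns
    grid[even, 1::2] = f2[even]      # even rows: date 2 on odd columns
    grid[~even, 0::2] = f2[~even]    # odd rows: order swapped
    grid[~even, 1::2] = f1[~even]
    return grid

def snake_scan(grid):
    """Flatten an (H, W', C) grid to (H * W', C), reversing every odd row."""
    rows = [row if i % 2 == 0 else row[::-1] for i, row in enumerate(grid)]
    return np.concatenate(rows, axis=0)

def corner_context(grid, dilations=(1, 2)):
    """Average each cell with its four diagonal neighbors at several dilations.

    Stand-in for learned multi-dilated convolutions. Diagonal offsets keep the
    parity of row + column, so on the chessboard grid every term comes from the
    same date as the center cell. Borders are zero-padded.
    """
    H, W, _ = grid.shape
    out = np.zeros(grid.shape, dtype=float)
    for d in dilations:
        padded = np.pad(grid.astype(float), ((d, d), (d, d), (0, 0)))
        acc = padded[d:d + H, d:d + W].copy()                  # center cell
        for di, dj in ((-d, -d), (-d, d), (d, -d), (d, d)):    # four corners
            acc += padded[d + di:d + di + H, d + dj:d + dj + W]
        out += acc / 5.0
    return out / len(dilations)

# Usage: date 1 holds positive values, date 2 the negated copy.
f1 = np.arange(1, 7, dtype=float).reshape(2, 3, 1)
seq = snake_scan(chessboard_interleave(f1, -f1))
print(seq[:, 0].tolist())
# [1.0, -1.0, 2.0, -2.0, 3.0, -3.0, 6.0, -6.0, 5.0, -5.0, 4.0, -4.0]
```

Each pixel's date-1 and date-2 tokens are consecutive, which is one way a single scan can compare them directly. Reversing odd rows also keeps the last token of one row spatially adjacent to the first token of the next, so the scan has no long jumps at row boundaries.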
