ChessMamba: Structure-Aware Interleaving of State Spaces for Change Detection in Remote Sensing Images

Lei Ding, Tong Liu, Xuanguang Liu, Xiangyun Liu, Haitao Guo, Jun Lu

TL;DR

ChessMamba tackles cross-source change detection (CD) in multi-temporal remote sensing (RS) imagery by embedding structural priors into selective state-space modeling. It introduces Chessboard interleaving (Chessboard Shuffle), which preserves 2D topology while enabling direct per-pixel temporal comparisons, and a Mono-Context Aggregated SSM (MCA-SSM), which emphasizes mono-temporal local context before state propagation. The approach combines a SpatialMamba encoder with a cross-time decoder and reports state-of-the-art results on LEVIR-CD, BRIGHT, and SECOND, covering binary CD (BCD), building damage assessment (BDA), and semantic CD (SCD), with strong localization and favorable efficiency ($O(N)$-like behavior in sequence processing). These findings highlight the importance of geometric coherence in multi-source fusion for robust, high-resolution CD on heterogeneous RS data, and the framework shows promising generalization to diverse CD scenarios.

Abstract

Change detection (CD) in multi-temporal remote sensing imagery presents significant challenges for fine-grained recognition, owing to heterogeneity and spatiotemporal misalignment. However, existing methodologies based on vision transformers or state-space models typically disrupt local structural consistency during temporal serialization, obscuring discriminative cues under misalignment and hindering reliable change localization. To address this, we introduce ChessMamba, a structure-aware framework leveraging interleaved state-space modeling for robust CD with multi-temporal inputs. ChessMamba integrates a SpatialMamba encoder with a lightweight cross-source interaction module, featuring two key innovations: (i) chessboard interleaving with snake scanning order, which serializes multi-temporal features into a unified sequence within a single forward pass, thereby shortening interaction paths and enabling direct comparison for accurate change localization; and (ii) structure-aware fusion via multi-dilated convolutions, which selectively captures center-and-corner neighborhood contexts within each mono-temporal feature map. Comprehensive evaluations on three CD tasks, including binary CD, semantic CD, and multimodal building damage assessment, demonstrate that ChessMamba effectively fuses heterogeneous features and achieves substantial accuracy improvements over state-of-the-art methods. The relevant code will be available at: github.com/DingLei14/ChessMamba.
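
To make innovation (i) concrete, the following is a minimal sketch of one plausible reading of chessboard interleaving followed by a snake scan. It is not taken from the authors' code. The function names `chessboard_interleave` and `snake_scan`, the choice to lay two H x W maps out as a single H x 2W grid, and the use of plain strings in place of feature vectors are all assumptions made for illustration.

```python
def chessboard_interleave(a, b):
    """Interleave two H x W grids into one H x 2W checkerboard grid.

    Assumed layout (illustrative only): cell (i, k) of the output holds a
    value from source `a` when (i + k) is even and from `b` otherwise.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("both inputs must have the same shape")
    out = []
    for i, (row_a, row_b) in enumerate(zip(a, b)):
        row = []
        for x, y in zip(row_a, row_b):
            # Even rows place source a first, odd rows place source b first,
            # which yields the checkerboard pattern across rows.
            row.extend((x, y) if i % 2 == 0 else (y, x))
        out.append(row)
    return out


def snake_scan(grid):
    """Flatten a 2D grid row by row, reversing every odd row."""
    seq = []
    for i, row in enumerate(grid):
        seq.extend(row if i % 2 == 0 else reversed(row))
    return seq


# Toy 2 x 2 example: labels stand in for per-pixel feature vectors.
t1 = [["a00", "a01"], ["a10", "a11"]]
t2 = [["b00", "b01"], ["b10", "b11"]]
board = chessboard_interleave(t1, t2)
sequence = snake_scan(board)
```

In this toy layout the two temporal features of every pixel sit next to each other in the serialized sequence, and the snake order keeps the end of one row spatially adjacent to the start of the next, which is the kind of short interaction path the abstract describes.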

Paper Structure

This paper contains 15 sections, 8 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: Overview of ChessMamba, a spatio-temporal context-aware SSM framework for feature fusion. The Chessboard Mamba blocks facilitate cross-temporal interactions within state-space propagation, sharpening change localization under misalignment.
  • Figure 2: Calculations within a Chess-Mamba block. The chessboard interleaving enables direct per-pixel comparisons while preserving 2D neighborhood topology.
  • Figure 3: Aggregation of Mono-Context. At each position, dilated kernels aggregate local context exclusively from one source.
  • Figure 4: Variants of ChessMamba adapting to different CD tasks.
  • Figure 5: ChessMamba CD predictions with spatial shifts $\epsilon = 4, 8, 16$ pixels. Green/red: FN/FP regions.
  • ...and 7 more figures
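
The mono-context idea in the Figure 3 caption can be illustrated with a small sketch. It assumes a checkerboard layout in which cell (i, k) belongs to source (i + k) mod 2; under that assumption, the center and corner taps of a 3 x 3 kernel, at any dilation, always land on cells of the same source. The names `source_of` and `mono_context` are hypothetical, and a uniform average stands in for learned kernel weights.

```python
# Offsets of the center tap and the four corner taps of a 3 x 3 kernel.
CENTER_AND_CORNERS = [(0, 0), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def source_of(i, k):
    """Source index (0 or 1) of cell (i, k) in the assumed checkerboard."""
    return (i + k) % 2


def mono_context(grid, i, k, dilation=1):
    """Average the center-and-corner taps around (i, k).

    Each tap offset changes i + k by an even amount, so every tap shares
    the source of the center cell. Taps outside the grid are skipped.
    """
    h, w = len(grid), len(grid[0])
    taps = []
    for di, dk in CENTER_AND_CORNERS:
        r, c = i + di * dilation, k + dk * dilation
        if 0 <= r < h and 0 <= c < w:
            taps.append(grid[r][c])
    return sum(taps) / len(taps)


# Toy 4 x 4 board: source 0 cells hold 1.0, source 1 cells hold 5.0.
toy = [[1.0 if source_of(i, k) == 0 else 5.0 for k in range(4)] for i in range(4)]
ctx_source0 = mono_context(toy, 1, 1)               # center belongs to source 0
ctx_source1 = mono_context(toy, 1, 2)               # center belongs to source 1
ctx_dilated = mono_context(toy, 1, 1, dilation=2)   # wider reach, same source
```

Because the two sources hold different constants in the toy board, any leakage across sources would shift the averages away from 1.0 or 5.0, so the unchanged values show that the aggregation stays within one source.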