CDXLSTM: Boosting Remote Sensing Change Detection with Extended Long Short-Term Memory
Zhenkai Wu, Xiaowen Ma, Rongrong Lian, Kai Zheng, Wei Zhang
TL;DR
The paper tackles remote sensing change detection (RS-CD), where existing CNN-, Transformer-, and Mamba-based methods struggle to balance accuracy and efficiency. It introduces CDXLSTM, an XLSTM-based framework built from a scale-specific Feature Enhancer and a Cross-Scale Interactive Fusion (CSIF) module. The Feature Enhancer applies a Cross-Temporal Global Perceptron (CTGP) for global context in low-resolution features and a Cross-Temporal Spatial Refiner (CTSR) for spatial refinement in high-resolution features, while CSIF progressively combines global semantics with detailed spatial information. The architecture uses a Siamese backbone with Bi-mLSTM-based long-term modeling and axial Bi-mLSTM attention within CTSR, delivering linear computational complexity and improved interpretability. On LEVIR-CD, WHU-CD, and CLCD, CDXLSTM achieves state-of-the-art F1 scores with only 16.19M parameters and 3.92G FLOPs, outperforming recent methods while reducing compute. Training is supervised with a weighted sum of BCE and Dice terms, $\mathcal{L} = \lambda_{ce}\mathcal{L}_{ce} + \lambda_{dice}\mathcal{L}_{dice}$.
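To make the loss concrete, the following is a minimal sketch of a weighted BCE plus Dice objective in plain Python, operating on flattened per-pixel change probabilities. The function names, the smoothing constant in the Dice term, and the default weights `lambda_ce = lambda_dice = 1.0` are illustrative assumptions; this summary does not state the values or the exact Dice formulation the authors use.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flattened pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss; the smoothing term is an assumption of this sketch."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def combined_loss(probs, targets, lambda_ce=1.0, lambda_dice=1.0):
    """L = lambda_ce * L_ce + lambda_dice * L_dice (weights are assumed)."""
    return (lambda_ce * bce_loss(probs, targets)
            + lambda_dice * dice_loss(probs, targets))


if __name__ == "__main__":
    # Toy 4-pixel change mask: 1 = changed, 0 = unchanged.
    targets = [1, 0, 1, 0]
    print(combined_loss([0.9, 0.1, 0.8, 0.2], targets))
```

The BCE term penalizes each pixel independently, while the Dice term measures overlap between the predicted and reference change regions, which helps when changed pixels are a small fraction of the image.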
Abstract
In complex scenes and varied conditions, effectively integrating spatial-temporal context is crucial for accurately identifying changes. However, current RS-CD methods lack a balanced consideration of performance and efficiency. CNNs lack global context, Transformers are computationally expensive, and Mambas face CUDA dependence and local correlation loss. In this paper, we propose CDXLSTM, with a core component that is a powerful XLSTM-based feature enhancement layer, integrating the advantages of linear computational complexity, global context perception, and strong interpretability. Specifically, we introduce a scale-specific Feature Enhancer layer, incorporating a Cross-Temporal Global Perceptron customized for semantic-accurate deep features, and a Cross-Temporal Spatial Refiner customized for detail-rich shallow features. Additionally, we propose a Cross-Scale Interactive Fusion module to progressively interact global change representations with spatial responses. Extensive experimental results demonstrate that CDXLSTM achieves state-of-the-art performance across three benchmark datasets, offering a compelling balance between efficiency and accuracy. Code is available at https://github.com/xwmaxwma/rschange.
