Efficient Remote Sensing Change Detection with Change State Space Models

Elman Ghazaei, Erchan Aptoula

TL;DR

Remote sensing change detection (CD) is held back by CNNs' limited ability to model long-range dependencies and by ViTs' high computational cost. The authors introduce the Change State Space Model (CSSM), a two-input, $L_1$-based change selector built on the Vision Mamba lineage. It is integrated into an encoder–CSSM–decoder architecture so that the network focuses on bi-temporal changes and needs fewer parameters. Evaluations on SYSU-CD, LEVIR-CD+, and WHU-CD show that CSSM achieves state-of-the-art or competitive results while using up to $21.25\times$ fewer parameters and remaining robust to degraded inputs. The training loss is $L_{seg} = L_{CE} + L_{lov}$. The work offers a scalable, efficient solution for large-scale remote sensing CD and points to future work on domain generalization across diverse datasets.
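To make the idea of an $L_1$-based change cue concrete, the following is a minimal sketch in plain Python. It is not the authors' CSSM selection mechanism, whose details are not given in this summary. It only assumes that two feature maps of the same shape (height × width × channels, stored as nested lists) are compared per pixel with an $L_1$ distance. The function names `l1_change_map` and `select_changed`, and the hard `threshold`, are hypothetical and introduced for illustration only.

```python
def l1_change_map(feat_t1, feat_t2):
    """Per-pixel L1 distance between two H x W x C feature maps (nested lists).

    Illustrative assumption: the change cue is the sum of absolute
    channel-wise differences between the two acquisition dates.
    """
    if len(feat_t1) != len(feat_t2):
        raise ValueError("feature maps must have the same height")
    change = []
    for row1, row2 in zip(feat_t1, feat_t2):
        if len(row1) != len(row2):
            raise ValueError("feature maps must have the same width")
        change.append([
            sum(abs(a - b) for a, b in zip(pix1, pix2))
            for pix1, pix2 in zip(row1, row2)
        ])
    return change


def select_changed(change, threshold):
    """Hypothetical hard selection: keep pixels whose L1 distance exceeds threshold."""
    return [[value > threshold for value in row] for row in change]


# Toy 2 x 2 feature maps with 2 channels each (made-up values).
t1 = [[[0.0, 1.0], [2.0, 2.0]],
      [[1.0, 1.0], [0.5, 0.5]]]
t2 = [[[0.0, 1.0], [0.0, 3.0]],
      [[1.0, 1.5], [0.5, 0.5]]]

change = l1_change_map(t1, t2)
mask = select_changed(change, threshold=1.0)
print(change)  # [[0.0, 3.0], [0.5, 0.0]]
print(mask)    # [[False, True], [False, False]]
```

In this toy example only the top-right pixel differs strongly between the two dates, so it is the only one selected. A learned selector would replace the fixed threshold used here.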

Abstract

Despite their frequent use for change detection, both ConvNets and Vision Transformers (ViTs) exhibit well-known limitations: the former struggle to model long-range dependencies, while the latter are computationally inefficient, which makes them challenging to train on large-scale datasets. Vision Mamba, an architecture based on State Space Models, has emerged as an alternative that addresses these deficiencies and has already been applied to remote sensing change detection, though mostly as a feature-extracting backbone. This article introduces the Change State Space Model, which has been specifically designed for change detection: it focuses on the relevant changes between bi-temporal images and effectively filters out irrelevant information. By concentrating solely on the changed features, the number of network parameters is reduced, which significantly enhances computational efficiency while maintaining high detection performance and robustness against input degradation. The proposed model has been evaluated on three benchmark datasets, where it outperformed ConvNets, ViTs, and Mamba-based counterparts at a fraction of their computational complexity. The implementation will be made available at https://github.com/Elman295/CSSM upon acceptance.

Paper Structure

This paper contains 6 sections, 3 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Overview of the proposed framework (top half), and details of the CSSM blocks and the CSSM selection mechanism (bottom half). The CSSM blocks are integrated between a lightweight encoder and a lightweight decoder to selectively extract only the most relevant target features.
  • Figure 2: Qualitative CD results on the SYSU-CD dataset. False Positives, False Negatives, True Positives.
  • Figure 3: Robustness assessment against degraded WHU-CD data.