RSMamba: Remote Sensing Image Classification with State Space Model
Keyan Chen, Bowen Chen, Chenyang Liu, Wenyuan Li, Zhengxia Zou, Zhenwei Shi
TL;DR
This work tackles remote sensing image classification under diverse spatiotemporal resolutions by introducing RSMamba, an efficient State Space Model-based backbone augmented with a dynamic multi-path activation mechanism for modeling non-causal 2D data. The approach converts 2D images into 1D sequences, processes them through multi-path Mamba blocks, and uses mean pooling rather than a CLS token for final classification. In ablations and evaluations on UC Merced, AID, and RESISC45, RSMamba shows robust performance and data efficiency, outperforming CNN and Transformer baselines. The results suggest RSMamba is a strong candidate backbone for next-generation visual foundation models in remote sensing and related domains.
Abstract
Remote sensing image classification forms the foundation of various understanding tasks and serves a crucial function in remote sensing image interpretation. Recent advances in Convolutional Neural Networks (CNNs) and Transformers have markedly improved classification accuracy. Nonetheless, remote sensing scene classification remains a significant challenge, especially given the complexity and diversity of remote sensing scenarios and the variability of spatiotemporal resolutions. The capacity for whole-image understanding can provide more precise semantic cues for scene discrimination. In this paper, we introduce RSMamba, a novel architecture for remote sensing image classification. RSMamba is based on the State Space Model (SSM) and incorporates the efficient, hardware-aware design known as Mamba. It combines the advantages of a global receptive field and linear modeling complexity. To overcome the limitation of vanilla Mamba, which can only model causal sequences and is not adaptable to two-dimensional image data, we propose a dynamic multi-path activation mechanism that augments Mamba's capacity to model non-causal data. Notably, RSMamba maintains the inherent modeling mechanism of vanilla Mamba, yet exhibits superior performance across multiple remote sensing image classification datasets. This indicates that RSMamba holds significant potential to serve as the backbone of future visual foundation models. The code will be available at https://github.com/KyanChen/RSMamba.
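
The pipeline summarized above (2D image to 1D token sequence, several scan paths through a causal sequence model, gated mixing, mean pooling without a CLS token) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The three paths (forward, reverse, shuffled), the `causal_scan` stand-in for a Mamba block, and the activation-based gate are all assumptions made for illustration; a real model would use learned patch embeddings, a learned gate, and an actual selective SSM.

```python
import math
import random


def patchify(image, patch):
    """Split an H x W image (list of rows) into a row-major sequence of flattened patches."""
    h, w = len(image), len(image[0])
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[top + i][left + j]
                           for i in range(patch) for j in range(patch)])
    return tokens


def causal_scan(tokens, decay=0.5):
    """Hypothetical stand-in for a Mamba block: h_t = decay * h_{t-1} + (1 - decay) * x_t.
    Each output only sees the current and earlier tokens, i.e. it is causal."""
    state = [0.0] * len(tokens[0])
    out = []
    for x in tokens:
        state = [decay * s + (1 - decay) * v for s, v in zip(state, x)]
        out.append(state)
    return out


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_path_block(tokens, rng):
    """Scan the sequence along several orderings, restore the original order,
    then mix the paths with softmax gate weights (assumed gating scheme)."""
    n, dim = len(tokens), len(tokens[0])
    forward = list(range(n))
    reverse = forward[::-1]
    shuffled = forward[:]
    rng.shuffle(shuffled)

    paths = []
    for order in (forward, reverse, shuffled):
        scanned = causal_scan([tokens[i] for i in order])
        restored = [None] * n
        for pos, i in enumerate(order):
            restored[i] = scanned[pos]  # undo the reordering
        paths.append(restored)

    # Assumption: one gate score per path from its mean activation;
    # a trained model would compute this with learned parameters.
    scores = [sum(sum(tok) for tok in path) / (n * dim) for path in paths]
    gates = softmax(scores)
    mixed = [[sum(g * path[t][d] for g, path in zip(gates, paths))
              for d in range(dim)] for t in range(n)]
    return mixed, gates


def mean_pool(tokens):
    """Average over the sequence dimension; used in place of a CLS token."""
    n = len(tokens)
    return [sum(tok[d] for tok in tokens) / n for d in range(len(tokens[0]))]
```

For example, a 4 x 4 image with 2 x 2 patches gives four tokens of dimension four. Passing them through `multi_path_block(tokens, random.Random(0))` and then `mean_pool` produces a single feature vector that a linear classifier head could consume. Because the reverse and shuffled paths are scanned in different orders, each token's mixed representation can depend on tokens on either side of it, which a single forward causal scan cannot provide.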
