Pan-Mamba: Effective pan-sharpening with State Space Model

Xuanhua He, Ke Cao, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou

TL;DR

Pan-Mamba tackles pan-sharpening by fusing low-resolution multi-spectral (LRMS) images with high-resolution panchromatic (PAN) images using a Mamba backbone built on the state space model. It introduces channel swapping Mamba and cross-modal Mamba blocks to enable efficient cross-modal interaction with linear complexity. Experiments on the WorldView-II (WV2), GaoFen-2 (GF2), and WorldView-III (WV3) datasets show state-of-the-art fusion quality, with ablations confirming each component's contribution. This work is the first to integrate Mamba into pan-sharpening, and the source code is publicly available for reproducibility.

Abstract

Pan-sharpening involves integrating information from low-resolution multi-spectral and high-resolution panchromatic images to generate high-resolution multi-spectral counterparts. While recent advancements in the state space model, particularly the efficient long-range dependency modeling achieved by Mamba, have revolutionized the computer vision community, its untapped potential in pan-sharpening motivates our exploration. Our contribution, Pan-Mamba, is a novel pan-sharpening network that leverages the efficiency of the Mamba model in global information modeling. In Pan-Mamba, we customize two core components: channel swapping Mamba and cross-modal Mamba, both designed for efficient cross-modal information exchange and fusion. The former initiates a lightweight cross-modal interaction by exchanging a portion of the panchromatic and multi-spectral channels, while the latter facilitates information representation by exploiting inherent cross-modal relationships. In extensive experiments across diverse datasets, our proposed approach surpasses state-of-the-art methods, showcasing superior fusion results in pan-sharpening. To the best of our knowledge, this work is the first attempt to explore the potential of the Mamba model in pan-sharpening, establishing a new frontier for the task. The source code is available at https://github.com/alexhe101/Pan-Mamba.
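
The channel exchange described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name channel_swap, the ratio parameter, the (channels, height, width) layout, and the choice to swap the leading channels are all assumptions made for illustration. The abstract only states that part of the panchromatic and multi-spectral channels is exchanged.

```python
import numpy as np


def channel_swap(ms_feat, pan_feat, ratio=0.5):
    """Exchange a fraction of channels between two feature maps.

    Hypothetical helper: both inputs are assumed to have shape (C, H, W),
    and the first int(C * ratio) channels are the ones exchanged.
    """
    if ms_feat.shape != pan_feat.shape:
        raise ValueError("feature maps must have the same shape")
    k = int(ms_feat.shape[0] * ratio)  # number of channels to exchange
    # Each output takes the other modality's first k channels
    # and keeps its own remaining channels.
    ms_out = np.concatenate([pan_feat[:k], ms_feat[k:]], axis=0)
    pan_out = np.concatenate([ms_feat[:k], pan_feat[k:]], axis=0)
    return ms_out, pan_out


# Toy usage: 4-channel features on a 2x2 grid.
ms = np.zeros((4, 2, 2))
pan = np.ones((4, 2, 2))
ms_swapped, pan_swapped = channel_swap(ms, pan, ratio=0.5)
```

After the swap each feature map carries channels from both modalities, so a later per-modality block sees some cross-modal information at no extra parameter cost. That is one plausible reading of "lightweight" in the abstract.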

Paper Structure (24 sections, 8 equations, 5 figures, 4 tables, 3 algorithms)

Figures (5)

  • Figure 1: The network structure of our proposed Pan-Mamba, which includes three key components: the Mamba block for long-range feature extraction, and the channel swapping (CS) Mamba and cross-modal Mamba blocks for shallow and deep feature fusion.
  • Figure 2: Comparison of our results against nine other methods on the WorldView-II dataset.
  • Figure 3: Comparison of our results against nine other methods on the WorldView-III dataset.
  • Figure 4: Comparison of our results against four other methods on the full-resolution WV2 dataset.
  • Figure 5: Performance and efficiency comparisons between our model and the different operators in the ablation section.