S2WMamba: A Spectral-Spatial Wavelet Mamba for Pansharpening

Haoyu Zhang, Junhan Luo, Yugang Cao, Siran Peng, Jie Huang, Liang-Jian Deng

TL;DR

S2WMamba addresses the persistent spatial–spectral trade-off in pansharpening by disentangling frequency information in the wavelet domain. It introduces a dual-branch framework with a 2D Haar DWT-guided Spectral Branch and a channel-wise 1D Haar DWT-guided Spatial Branch, coupled through FMamba cross-modal interactions and a Multi-Scale Dynamic Gate for adaptive fusion. The approach demonstrates state-of-the-art or competitive performance on WV3, GF2, and QB with strong efficiency, supported by extensive ablations that justify the dual-branch design, Mamba backbone, and fusion strategy. The work offers a principled wavelet-based fusion paradigm that leverages long-range modeling for robust, high-fidelity HRMS pansharpening with practical implications for remote sensing applications.

Abstract

Pansharpening fuses a high-resolution PAN image with a low-resolution multispectral (LRMS) image to produce an HRMS image. A key difficulty is that jointly processing PAN and MS often entangles spatial detail with spectral fidelity. We propose S2WMamba, which explicitly disentangles frequency information and then performs lightweight cross-modal interaction. Concretely, a 2D Haar DWT is applied to PAN to localize spatial edges and textures, while a channel-wise 1D Haar DWT treats each pixel's spectrum as a 1D signal to separate low/high-frequency components and limit spectral distortion. The resulting Spectral branch injects wavelet-extracted spatial details into MS features, and the Spatial branch refines PAN features using spectra from the 1D pyramid; the two branches exchange information through Mamba-based cross-modulation that models long-range dependencies with linear complexity. A multi-scale dynamic gate (multiplicative + additive) then adaptively fuses the branch outputs. On WV3, GF2, and QB, S2WMamba matches or surpasses recent strong baselines (FusionMamba, CANNet, U2Net, ARConv), improving PSNR by up to 0.23 dB and reaching an HQNR of 0.956 on full-resolution WV3. Ablations justify the choice of 2D/1D DWT placement, parallel dual branches, and the fusion gate. Our code is available at https://github.com/KagUYa66/S2WMamba.
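The two decompositions named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `haar_dwt2d` and `haar_dwt1d_channels`, the orthonormal scaling, the sub-band naming, and the array shapes (a 64x64 PAN patch and an 8-band 16x16 MS patch) are all assumptions chosen for illustration. It shows only one level of each transform: the 2D version splits an image into four spatial sub-bands, and the channel-wise version pairs adjacent bands so that every pixel's spectrum is split into a low- and a high-frequency half.

```python
import numpy as np


def haar_dwt2d(pan):
    """One-level orthonormal 2D Haar DWT of an (H, W) array (H, W even).

    Returns (LL, LH, HL, HH), each of shape (H/2, W/2). Sub-band naming
    conventions differ between libraries; the labels here are an assumption.
    """
    a = pan[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = pan[0::2, 1::2]  # top-right
    c = pan[1::2, 0::2]  # bottom-left
    d = pan[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass along both axes
    lh = (a + b - c - d) / 2.0  # top row minus bottom row
    hl = (a - b + c - d) / 2.0  # left column minus right column
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_dwt1d_channels(ms):
    """One-level orthonormal 1D Haar DWT along the channel axis.

    `ms` has shape (C, H, W) with C even. Each pixel's spectrum is treated
    as a 1D signal; returns (low, high), each of shape (C/2, H, W).
    """
    even = ms[0::2]  # bands 0, 2, 4, ...
    odd = ms[1::2]   # bands 1, 3, 5, ...
    low = (even + odd) / np.sqrt(2.0)   # smooth part of the spectrum
    high = (even - odd) / np.sqrt(2.0)  # band-to-band variation
    return low, high


# Illustrative inputs only (random data, hypothetical sizes).
rng = np.random.default_rng(0)
pan = rng.random((64, 64))
ms = rng.random((8, 16, 16))

ll, lh, hl, hh = haar_dwt2d(pan)
low, high = haar_dwt1d_channels(ms)
```

Applying `haar_dwt2d` again to `ll`, or `haar_dwt1d_channels` again to `low`, would give the next level of a pyramid; how the paper builds and consumes those pyramids is not specified in this excerpt.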

Paper Structure

This paper contains 35 sections, 25 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: This figure illustrates our dual-branch framework. The spectral branch processes PAN images using a 2D DWT, while the spatial branch handles LRMS images with a 1D DWT. Both branches leverage SSM (Mamba) for fusion before reconstruction.
  • Figure 2: The overall workflow of our S2WMNet. Our network consists of three main components: a Spectral Branch, a Spatial Branch, and a Multi-Scale Dynamic Gate (MSDG) Block. The 2D Wavelet Pyramid Construction (2DWPC) and the 1D Wavelet Pyramid Construction (1DWPC) provide spatial and spectral details, respectively, for fusion in the Spectral and Spatial Branches.
  • Figure 3: An illustration of the core wavelet decomposition strategies. The 2D DWT (top) is applied to the spatial dimensions of an image, decomposing it into four frequency sub-bands. The 1D DWT (bottom) is applied along the channel axis, disentangling the spectral information into low- and high-frequency components.
  • Figure 4: The workflow of a representative Spectral Branch Stage (e.g., Stage-$i$). The same structure is applied to all $n_r$ stages; only the input size differs. The input to SpeBS-$i$ consists of the output of the previous stage ($M_{i-1}$), along with one set of sub-bands from the 2DWPC. The decomposition level of these sub-bands is $n_r - i + 1$, so that each sub-band has the same resolution as $M_{i-1}$. Here, $j$ denotes the corresponding 2D DWT decomposition level in the 2DWPC, with $j = n_r - i + 1$. FM denotes the FMamba Block.
  • Figure 5: The workflow of a representative Spatial Branch Stage (e.g., Stage-$i$). The same structure is applied to all $n_c$ stages; only the input size differs. The input to SpaBS-$i$ consists of the output of the previous stage ($P_{i-1}$), along with one set of sub-bands from the 1DWPC. The decomposition level of these sub-bands is $n_c - i + 1$, so that each sub-band has the same number of channels as $P_{i-1}$. Here, $j$ denotes the corresponding decomposition level in the 1DWPC, with $j = n_c - i + 1$.
  • ...and 5 more figures
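
The stage-to-level relation $j = n_r - i + 1$ in the captions for Figures 4 and 5 can be made concrete with a small sketch. The helper names `level_for_stage` and `size_at_level`, the choice $n_r = 3$, and the 64-pixel PAN side length are hypothetical values used only for illustration; the sketch also assumes that each Haar DWT level halves the side length, which holds for even sizes. It shows that early stages consume the coarsest sub-bands and later stages the finest ones.

```python
def level_for_stage(i, n_stages):
    """Decomposition level j paired with stage i, using j = n_stages - i + 1."""
    if not 1 <= i <= n_stages:
        raise ValueError("stage index must lie in 1..n_stages")
    return n_stages - i + 1


def size_at_level(full_size, j):
    """Side length after j halvings (assumes full_size is divisible by 2**j)."""
    return full_size // (2 ** j)


# Hypothetical configuration: 3 spectral-branch stages, 64x64 PAN input.
n_r, pan_side = 3, 64
schedule = [
    (i, level_for_stage(i, n_r), size_at_level(pan_side, level_for_stage(i, n_r)))
    for i in range(1, n_r + 1)
]

for stage, level, side in schedule:
    print(f"stage {stage}: level {level}, sub-band side {side}")
```

The same index arithmetic applies to the Spatial Branch with $n_c$ in place of $n_r$, where the halved quantity is the number of channels rather than the spatial side length.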