PIF-Net: Ill-Posed Prior Guided Multispectral and Hyperspectral Image Fusion via Invertible Mamba and Fusion-Aware LoRA

Baisong Li, Xingwang Wang, Haixiao Xu

TL;DR

This paper addresses the ill-posed nature of multispectral and hyperspectral image fusion (MHIF) caused by spectral-spatial misalignment by introducing PIF-Net. It combines an invertible Mamba-based spectral branch with a Fusion-Aware LoRA to enable reversible, efficient fusion under ill-posed priors, supported by an Ill-Posed Residual Prior Extraction Module and a guided feature-consistency loss. The approach achieves state-of-the-art fusion quality and efficiency on three public datasets, demonstrating strong cross-modal alignment and high fidelity in both spectral and spatial details. The results indicate practical potential for real-time MHIF applications in remote sensing and related fields.

Abstract

The goal of multispectral and hyperspectral image fusion (MHIF) is to generate high-quality images that simultaneously possess rich spectral information and fine spatial details. However, due to the inherent trade-off between spectral and spatial information and the limited availability of observations, this task is fundamentally ill-posed. Previous studies have not effectively addressed the ill-posed nature caused by data misalignment. To tackle this challenge, we propose a fusion framework named PIF-Net, which explicitly incorporates ill-posed priors to effectively fuse multispectral images and hyperspectral images. To balance global spectral modeling with computational efficiency, we design a method based on an invertible Mamba architecture that maintains information consistency during feature transformation and fusion, ensuring stable gradient flow and process reversibility. Furthermore, we introduce a novel fusion module called the Fusion-Aware Low-Rank Adaptation module, which dynamically calibrates spectral and spatial features while keeping the model lightweight. Extensive experiments on multiple benchmark datasets demonstrate that PIF-Net achieves significantly better image restoration performance than current state-of-the-art methods while maintaining model efficiency.
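
The reversibility described above can be illustrated with an affine coupling layer, which is the mechanism the Figure 3 caption names for the Invertible Mamba Block. The following is a minimal sketch, not the authors' implementation: `scale_net` and `shift_net` are hypothetical elementwise stand-ins for the learned sub-networks (SSMM in the paper), and the features are plain Python lists rather than image tensors. It shows only why such a layer can be undone exactly: the low-frequency half passes through unchanged, so the scale and shift applied to the high-frequency half can be recomputed and reversed.

```python
import math

# Hypothetical stand-ins for the learned sub-networks; any function of x_low works,
# because the inverse never needs to invert these functions themselves.
def scale_net(x_low):
    return [math.tanh(0.5 * v) for v in x_low]

def shift_net(x_low):
    return [0.3 * v for v in x_low]

def coupling_forward(x_low, x_high):
    """Affine coupling: x_low is copied, x_high is scaled and shifted using x_low."""
    s, t = scale_net(x_low), shift_net(x_low)
    y_high = [h * math.exp(si) + ti for h, si, ti in zip(x_high, s, t)]
    return list(x_low), y_high

def coupling_inverse(y_low, y_high):
    """Exact inverse: recompute scale and shift from the untouched half, then undo them."""
    s, t = scale_net(y_low), shift_net(y_low)
    x_high = [(h - ti) * math.exp(-si) for h, si, ti in zip(y_high, s, t)]
    return list(y_low), x_high

x_low = [0.2, -1.0, 0.7, 1.5]
x_high = [1.1, 0.4, -0.9, 2.0]

y_low, y_high = coupling_forward(x_low, x_high)
r_low, r_high = coupling_inverse(y_low, y_high)

# Largest reconstruction error on the transformed half (floating-point rounding only).
max_error = max(abs(a - b) for a, b in zip(r_high, x_high))
```

Because the inverse is available in closed form, no information is discarded by the transformation, which is the property the abstract refers to as information consistency.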

Paper Structure

This paper contains 26 sections, 4 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Illustration of three typical hyperspectral image fusion frameworks: (a) Single-branch framework: relies on a single pathway and is prone to alignment errors; (b) Dual-branch framework: models spectral and spatial features separately but lacks effective spatial transformation and collaborative mechanisms; (c) The proposed PIF-Net model: integrates invertible state-space modeling with fusion-aware LoRA, enabling bidirectional information flow in the frequency domain and robust cross-modal alignment.
  • Figure 2: Overview of the proposed PIF-Net. The spectral branch utilizes Invertible Mamba Blocks to enable bidirectional flow of low-frequency and high-frequency information; the spatial branch effectively extracts rich spatial texture features under the guidance of ill-posed residual priors and high-frequency spatial features.
  • Figure 3: Illustration of the Invertible Mamba Block. The block takes low-frequency features $\mathbf{X}^L_i$ and high-frequency features $\mathbf{X}^H_i$ as input, and employs an affine coupling mechanism built on lightweight Segmented Spectral Mamba Modules (SSMM) to enable efficient bidirectional interaction and fusion of spectral information. Split and Concat denote channel-wise feature division and aggregation, respectively. SS2D refers to the 2D State Space Module in VMamba [liu2024vmamba].
  • Figure 4: Illustration of the FAM-LoRA module that fuses the main spatial feature $\mathbf{Y}_i$ and auxiliary guidance $\mathbf{X}^R_i$ through channel transformation, LKA, SE attention, and multi-head LoRA, achieving efficient and accurate semantic fusion. The first part of LoRA's parameters is initialized with a standard Gaussian distribution, while the second part is initialized to zero. The operations Split and Concat refer to feature division and aggregation along the channel dimension.
  • Figure 5: The pseudo-color images, corresponding $\times$4 super-resolution results, and SAM error maps generated by all comparative models on three datasets: (1) the sixth test area of the Chikusei dataset (rows 1–2, RGB bands: R=101, G=40, B=10), (2) a test area from the PaviaU dataset (rows 3–4, RGB bands: R=20, G=30, B=40), and (3) a test area from the Houston dataset (rows 5–6, RGB bands: R=10, G=76, B=2).
  • ...and 1 more figure
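
The LoRA initialization mentioned in the Figure 4 caption (first factor drawn from a standard Gaussian, second factor set to zero) can be sketched as follows. This is an assumption-laden illustration of plain single-head LoRA, not the multi-head FAM-LoRA module itself: the names `init_lora`, `lora_forward`, `down`, and `up` are hypothetical, and the dimensions are arbitrary. It shows the two properties that make this initialization attractive: the low-rank update is exactly zero at the start, so the base mapping is untouched, and the adapter adds far fewer parameters than a full weight matrix.

```python
import random

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def init_lora(in_dim, out_dim, rank, seed=0):
    """First factor ~ N(0, 1), second factor all zeros (as stated in the caption)."""
    rng = random.Random(seed)
    down = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(rank)]
    up = [[0.0] * rank for _ in range(out_dim)]
    return down, up

def lora_forward(weight, down, up, x):
    """Base projection plus the low-rank update up @ (down @ x)."""
    base = matvec(weight, x)
    update = matvec(up, matvec(down, x))
    return [b + u for b, u in zip(base, update)]

in_dim, out_dim, rank = 16, 16, 2
rng = random.Random(1)
weight = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
x = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]

down, up = init_lora(in_dim, out_dim, rank)
base_out = matvec(weight, x)
lora_out = lora_forward(weight, down, up, x)

# Parameter counts: low-rank adapter versus a full in_dim x out_dim matrix.
lora_params = rank * (in_dim + out_dim)
full_params = in_dim * out_dim
```

With the zero-initialized second factor, `lora_out` equals `base_out` before any training step, so the adapter starts as an identity perturbation and only departs from the base mapping as the zero factor is updated.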