SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening

Yu Zhong, Xiao Wu, Liang-Jian Deng, Zihan Cao

TL;DR

SSDiff recasts remote sensing pansharpening as a spatial-spectral fusion problem by splitting a diffusion model into dedicated spatial and spectral branches. It introduces an alternating projection fusion module (APFM) that decouples and fuses features across subspaces, and a frequency modulation inter-branch module (FMIM) that balances frequency information between the branches; a LoRA-like branch-wise alternative fine-tuning method (L-BAF) further refines discriminative features without increasing parameters. On the WorldView-3, WorldView-2, GaoFen-2, and QuickBird datasets, SSDiff achieves state-of-the-art results in both reduced- and full-resolution settings, with strong spectral fidelity, good preservation of spatial detail, and competitive inference efficiency. The approach provides a principled, plug-in fusion mechanism for diffusion-based pansharpening, and the authors state that the code will be released as open source after acceptance.

Abstract

Pansharpening is an important image fusion technique that merges the spatial content and spectral characteristics of remote sensing images to generate high-resolution multispectral images. Recently, denoising diffusion probabilistic models have been increasingly applied to visual tasks, with low-rank adaptation (LoRA) enhancing controllable image generation. In this paper, we introduce a spatial-spectral integrated diffusion model for remote sensing pansharpening, called SSDiff, which treats pansharpening as the fusion of spatial and spectral components from the perspective of subspace decomposition. Specifically, SSDiff uses spatial and spectral branches to learn spatial details and spectral features separately, then employs a dedicated alternating projection fusion module (APFM) to perform the fusion. Furthermore, we propose a frequency modulation inter-branch module (FMIM) to modulate the frequency distribution between branches. The two components of SSDiff work well with the APFM when trained with a LoRA-like branch-wise alternative fine-tuning method, which refines SSDiff to capture component-discriminating features more fully. Finally, extensive experiments on four commonly used datasets, i.e., WorldView-3, WorldView-2, GaoFen-2, and QuickBird, demonstrate the superiority of SSDiff both visually and quantitatively. The code will be made open source after possible acceptance.
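
As a rough illustration of the low-rank adaptation (LoRA) idea the abstract refers to, the sketch below adds a rank-limited update $\Delta \mathbf{W} = \mathbf{B}\mathbf{A}$ to a frozen weight $\mathbf{W_0}$. It is a generic NumPy sketch, not the branch-wise fine-tuning used in SSDiff; the function name `lora_forward`, the `scale` argument, and all shapes are assumptions chosen for illustration.

```python
import numpy as np

def lora_forward(x, W0, A, B, scale=1.0):
    """Apply a frozen weight W0 plus a low-rank update B @ A to the rows of x.

    Hypothetical helper for illustration only; not taken from the SSDiff code.
    """
    delta_W = B @ A                      # shape (d_out, d_in), rank at most r
    return x @ (W0 + scale * delta_W).T

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2                 # assumed toy dimensions
W0 = rng.standard_normal((d_out, d_in))  # frozen pretrained weight
A = rng.standard_normal((r, d_in))       # trainable factor
B = np.zeros((d_out, r))                 # trainable factor, zero-initialised
x = rng.standard_normal((4, d_in))       # batch of 4 input vectors

# With B = 0 the update vanishes, so the output equals the frozen layer's output.
y = lora_forward(x, W0, A, B)
```

Only `A` and `B` would be trained in such a setup, which adds `r * (d_in + d_out)` trainable values instead of `d_in * d_out`.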

Paper Structure

This paper contains 18 sections, 2 theorems, 17 equations, 8 figures, 6 tables, 1 algorithm.

Key Result

Lemma 1

Assume two arbitrary vectors $\mathbf{a} \in \text{dom}\,\mathbf{U} \subseteq \mathbb{R}^{n}$ and $\mathbf{b} \in \text{dom}\,\mathbf{G} \subseteq \mathbb{R}^{n}$ such that $\mathbf{P}\mathbf{b}=\lambda \mathbf{a}=\mathbf{p}$. Then the following formula holds: … where $\mathbf{P}$ is a projection matrix, $\lambda$ denotes the scaling factor, and $\mathbf{p}$ is the vector in the same domain as …
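
The displayed formula of Lemma 1 does not survive in this extract. For orientation, the standard linear-algebra result for projecting $\mathbf{b}$ onto the line spanned by $\mathbf{a}$ is $\mathbf{P} = \mathbf{a}\mathbf{a}^{T} / (\mathbf{a}^{T}\mathbf{a})$ and $\lambda = \mathbf{a}^{T}\mathbf{b} / (\mathbf{a}^{T}\mathbf{a})$, which satisfies $\mathbf{P}\mathbf{b}=\lambda \mathbf{a}=\mathbf{p}$; whether the paper states it in exactly this form is an assumption. A minimal NumPy check of that relation, with the hypothetical helper `project_onto_line` and made-up example vectors:

```python
import numpy as np

def project_onto_line(a, b):
    """Return (P, lam, p) for the orthogonal projection of b onto span{a}.

    Hypothetical helper for illustration; uses the textbook formulas
    P = a a^T / (a^T a) and lam = a^T b / (a^T a).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a @ a
    if denom == 0.0:
        raise ValueError("a must be a non-zero vector")
    P = np.outer(a, a) / denom   # projection matrix onto the line through a
    lam = (a @ b) / denom        # scaling factor
    p = lam * a                  # projected vector, lies along a
    return P, lam, p

a = np.array([1.0, 2.0, 2.0])    # example vectors, chosen arbitrarily
b = np.array([3.0, 0.0, 3.0])
P, lam, p = project_onto_line(a, b)

# P @ b and lam * a agree, and the residual b - p is orthogonal to a.
residual = b - p
```

The matrix `P` is idempotent (`P @ P` equals `P`), which is the defining property of a projection.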

Figures (8)

  • Figure 1: (a) Schematic of a supervised DL-based pansharpening approach, in which the "network" can be any deep module, e.g., a denoising diffusion probabilistic model (DDPM). Comparison of (b) LoRA applied to a DDPM and (c) the proposed APFM in SSDiff. $\mathbf{G}$ and $\mathbf{U}$ represent the spectral and spatial domains, respectively. LoRA extends the learnable weights $\mathbf{W_0}$ with $\Delta \mathbf{W}$ (but has not been applied to pansharpening), whereas the APFM obtains the pansharpened HrMSI from the PAN image and the LrMSI through alternating projections.
  • Figure 2: Overall framework of the proposed SSDiff. $\epsilon_t = \sqrt{1-\bar{\alpha}_t}\epsilon$ is Gaussian noise, where $t$ is the time step. $\mathcal{F}_{spa}$ is the output of the spatial branch, and $\mathcal{F}_{spe}$ is the output of the spectral branch. The process of APFM follows Theorem 1.
  • Figure 3: Schematic diagram of the relationship between subspace decomposition and self-attention mechanism. $f(\mathbf{Q}, \mathbf{K})$ is the classic self-similarity equation in self-attention mechanism.
  • Figure 4: The denoising process. The top row consists of a series of iteratively generated images from the gradual denoising process. The subsequent two rows represent the associated low-frequency and high-frequency spatial domain information obtained through inverse Fourier transform from the denoised image in the first row of each corresponding step.
  • Figure 5: The sketch of the proposed LoRA-like branch-wise alternative fine-tuning process.
  • ...and 3 more figures
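
The noise term in the Figure 2 caption and the frequency views described for Figure 4 can be sketched together. The code below applies the standard DDPM forward step $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ to a toy image and then splits the result into low- and high-frequency parts with a circular mask in the Fourier domain. The function names, the mask radius, and the random test image are assumptions for illustration; the extract does not specify how SSDiff separates frequencies.

```python
import numpy as np

def forward_noise(x0, alpha_bar_t, rng):
    """Standard DDPM forward step: scale x0 and add scaled Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def split_frequencies(img, radius):
    """Split a 2-D image into low- and high-frequency spatial-domain parts.

    Hypothetical helper: a hard circular mask of the given radius (in
    frequency bins) is an assumed choice, not the paper's method.
    """
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))   # zero frequency at centre
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)      # distance from the centre
    low_mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

rng = np.random.default_rng(0)
x0 = rng.random((32, 32))                          # toy single-band image
x_t = forward_noise(x0, alpha_bar_t=0.5, rng=rng)  # partially noised image
low, high = split_frequencies(x_t, radius=4)

# The two parts use complementary masks, so they sum back to the input.
reconstruction = low + high
```

Running `split_frequencies` on the image at several time steps would give rows analogous to the low- and high-frequency rows the Figure 4 caption describes.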

Theorems & Definitions (6)

  • Lemma 1: strang2022introduction
  • proof
  • Definition 1: dian2019hyperspectral
  • Remark 1
  • Theorem 1
  • proof