ResPanDiff: Diffusion Model for Pansharpening by Inferring Residual Inference

Shiqi Cao, Liangjian Deng, Shangqi Deng

TL;DR

ResPanDiff tackles slow diffusion-based pansharpening by learning the residual between the LRMS and HRMS images through a dedicated diffusion process that starts from a noisy residual close to the LRMS distribution. It introduces a latent space, Shallow Cond-Injection (SC-I), and a residual-focused loss to guide the diffusion toward accurate residual generation, enabling an efficient Markov chain that preserves fusion quality. Empirical results on WV3, GF2, and QB demonstrate state-of-the-art performance with as few as 15 sampling steps, delivering substantial speedups without sacrificing accuracy. The approach highlights the practicality of residual diffusion, conditional latent features, and tailored losses for high-quality, fast pansharpening in multi-source image fusion.

Abstract

The practical use of diffusion-based pansharpening is predominantly constrained by slow inference, which results from the large number of sampling steps. Although existing techniques aim to accelerate sampling, they often compromise performance when fusing multi-source images. To ease this limitation, we introduce a novel and efficient diffusion model named Diffusion Model for Pansharpening by Inferring Residual Inference (ResPanDiff), which significantly reduces the number of diffusion steps for pansharpening without sacrificing performance. In ResPanDiff, we propose a Markov chain that transitions from noisy residuals to the residual between the LRMS and HRMS images, thereby reducing the number of sampling steps and enhancing performance. Additionally, we design a latent space that helps the model extract more features at the encoding stage, Shallow Cond-Injection (SC-I), which helps the model obtain condition-injected hidden features with higher dimensions, and loss functions that give better guidance for the residual generation task, enabling the model to achieve superior performance in residual generation. Furthermore, experimental evaluations on pansharpening datasets demonstrate that the proposed method achieves superior outcomes compared with recent state-of-the-art (SOTA) techniques while requiring only 15 sampling steps, a reduction of over 90% compared with the benchmark diffusion models. Our experiments also include thorough discussions and ablation studies that underscore the effectiveness of our approach.
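The residual formulation can be illustrated with a minimal NumPy sketch. It assumes that the residual target is HRMS minus an upsampled LRMS, uses nearest-neighbour upsampling purely for simplicity, and borrows a ResShift-style Gaussian marginal, $x_t \sim \mathcal{N}(r_0 + \eta_t(e - r_0), \kappa^2 \eta_t I)$, as a stand-in for the forward chain. The paper's exact transition kernel and starting point are not given in this section, so the endpoint `e` and every function name below are hypothetical.

```python
import numpy as np


def upsample_nearest(lrms: np.ndarray, scale: int) -> np.ndarray:
    """Nearest-neighbour upsampling of an (H, W, C) image (a simplifying assumption)."""
    return lrms.repeat(scale, axis=0).repeat(scale, axis=1)


def residual_target(hrms: np.ndarray, lrms: np.ndarray, scale: int) -> np.ndarray:
    """Residual to be generated (assumed form): HRMS minus the upsampled LRMS."""
    return hrms - upsample_nearest(lrms, scale)


def forward_marginal(r0: np.ndarray, endpoint: np.ndarray, eta_t: float,
                     kappa: float, rng: np.random.Generator) -> np.ndarray:
    """Sample x_t ~ N(r0 + eta_t * (endpoint - r0), kappa^2 * eta_t * I).

    Hypothetical ResShift-style shift: eta_t = 0 returns r0 exactly; as
    eta_t -> 1 the mean moves to `endpoint` while the noise variance grows.
    """
    noise = rng.standard_normal(r0.shape)
    return r0 + eta_t * (endpoint - r0) + kappa * np.sqrt(eta_t) * noise


def reconstruct(residual: np.ndarray, lrms: np.ndarray, scale: int) -> np.ndarray:
    """HRMS estimate: upsampled LRMS plus the (generated) residual."""
    return upsample_nearest(lrms, scale) + residual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hrms = rng.random((8, 8, 4))   # toy 4-band HRMS patch
    lrms = hrms[::4, ::4]          # toy 4x-decimated LRMS patch
    r0 = residual_target(hrms, lrms, scale=4)
    x_t = forward_marginal(r0, np.zeros_like(r0), eta_t=0.5, kappa=1.0, rng=rng)
    print(x_t.shape)                                    # (8, 8, 4)
    print(np.allclose(reconstruct(r0, lrms, 4), hrms))  # True
```

Because the network only has to generate the residual, the final fused image in this sketch is simply the upsampled LRMS plus the sampled residual, which is what `reconstruct` expresses.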

Paper Structure

This paper contains 19 sections, 24 equations, 10 figures, and 6 tables.

Figures (10)

  • Figure 1: (a) Pansharpening involves fusing the PAN and LRMS images into an HRMS image. (b) The process of the current denoising diffusion model used in pansharpening. $q(x_t|x_{t-1})$, $p_\theta(x_{t-1}|x_t,c)$, and $c$ represent the noise-adding forward process, the denoising backward process, and the condition, respectively.
  • Figure 2: Fréchet Inception Distance (FID) [heusel2018ganstrainedtimescaleupdate] scores on the CIFAR10 dataset, taken from DDIM [ddim], where $\eta$ is a directly controlled hyperparameter ($\eta = 1$ corresponds to the original DDPM generative process and $\eta = 0$ to DDIM) and $T$ is the total number of timesteps. The fewer timesteps DDIM uses, the worse its performance becomes.
  • Figure 3: Different paths along which the distribution $X_0$ is transported to $X_1$. The blue line is the original transport path, and the green line is the simulated path. Simulating a curved path requires more steps of smaller size, and can still perform worse than simulating a straight path with a single, larger step.
  • Figure 4: The effect of rectified flow. The first rectified flow makes the trajectories non-crossing, and the second makes them straight.
  • Figure 10: Illustration of the noise schedule for ResPanDiff. We dynamically adjust the parameter $\alpha$ from $8\times10^{-3}$ to $8\times10^{-1}$. The images are taken at timesteps 1, 5, 9, and 14, demonstrating the effect of different values of $p$ while keeping $\kappa = 1$ and $T = 15$ fixed.
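The caption of Figure 10 uses $p$, $\kappa$, and $T$, which are also the parameters of the ResShift noise schedule. The sketch below implements that ResShift-style schedule, $\sqrt{\eta_t} = \sqrt{\eta_1}\, b_0^{\beta_t}$ with $\beta_t = \left(\frac{t-1}{T-1}\right)^p (T-1)$ and $b_0 = \exp\!\left(\frac{\ln(\eta_T/\eta_1)}{2(T-1)}\right)$. It assumes, without confirmation from this section, that ResPanDiff uses a schedule of this form, and it treats $8\times10^{-3}$ and $8\times10^{-1}$ as hypothetical first and last values $\eta_1$ and $\eta_T$.

```python
import numpy as np


def resshift_schedule(T: int, p: float, eta_1: float, eta_T: float) -> np.ndarray:
    """Return eta_1..eta_T for a ResShift-style schedule (assumed form).

    sqrt(eta_t) = sqrt(eta_1) * b0 ** beta_t
    beta_t      = ((t - 1) / (T - 1)) ** p * (T - 1)
    b0          = exp(log(eta_T / eta_1) / (2 * (T - 1)))
    """
    t = np.arange(1, T + 1)
    beta = ((t - 1) / (T - 1)) ** p * (T - 1)
    b0 = np.exp(np.log(eta_T / eta_1) / (2 * (T - 1)))
    sqrt_eta = np.sqrt(eta_1) * b0 ** beta
    return sqrt_eta ** 2


if __name__ == "__main__":
    # Hypothetical endpoints; T = 15 as in Figure 10.
    for p in (0.3, 1.0, 3.0):
        eta = resshift_schedule(T=15, p=p, eta_1=8e-3, eta_T=8e-1)
        # Indices 0, 4, 8, 13 correspond to timesteps 1, 5, 9, 14.
        print(p, np.round(eta[[0, 4, 8, 13]], 4))
```

With $\kappa = 1$, the noise standard deviation at step $t$ in this sketch is $\sqrt{\eta_t}$. A larger $p$ keeps $\eta_t$ close to $\eta_1$ for more of the early steps before it rises toward $\eta_T$, which is the kind of effect Figure 10 visualizes.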