Fast Kernel-Space Diffusion for Remote Sensing Pansharpening
Hancong Jin, Zihan Cao, Liang-jian Deng, Jingjing Li
TL;DR
This work tackles pansharpening by combining the strengths of diffusion models with fast CNN-based regression. It introduces KSDiff, which generates diffusion-informed convolutional kernels in latent space using a kernel generator guided by a latent diffusion prior, enabling global context integration without the latency of pixel-space diffusion. A two-stage training protocol is paired with a Pyramid Latent Fusion Encoder to fuse PAN, LRMS, and HRMS priors, and a structure-aware multi-head attention mechanism governs kernel modulation via a low-rank Tucker decomposition. Empirical results on the WorldView-3 (WV3), GaoFen-2 (GF2), and QuickBird (QB) datasets show competitive or superior fusion quality, with inference orders of magnitude faster than diffusion baselines. The approach generalizes across backbones and datasets, offering a practical, scalable solution for remote-sensing image fusion with strong spectral–spatial fidelity.
Abstract
Pansharpening seeks to fuse high-resolution panchromatic (PAN) and low-resolution multispectral (LRMS) images into a single image with both fine spatial and rich spectral detail. Despite progress in deep learning-based approaches, existing methods often fail to capture the global priors inherent in remote sensing data distributions. Diffusion-based models have recently emerged as promising solutions owing to their powerful distribution mapping capabilities; however, they suffer from heavy inference latency. We introduce KSDiff, a fast kernel-space diffusion framework that generates convolutional kernels enriched with global context to enhance pansharpening quality and accelerate inference. Specifically, KSDiff constructs these kernels by integrating a low-rank core tensor generator and a unified factor generator, orchestrated by a structure-aware multi-head attention mechanism. We further introduce a two-stage training strategy tailored for pansharpening, which facilitates integration into existing pansharpening architectures. Experiments show that KSDiff outperforms recent methods while running over $500 \times$ faster at inference than diffusion-based pansharpening baselines. Ablation studies, visualizations, and further evaluations substantiate the effectiveness of our approach. Code will be released upon possible acceptance.
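
To make the kernel construction more concrete, the following is a minimal sketch of how a convolutional kernel can be assembled from a low-rank core tensor and per-mode factor matrices in a generic Tucker decomposition. It is not the KSDiff implementation: the function name `tucker_kernel`, the ranks, and the kernel shape are illustrative assumptions, and random arrays stand in for the outputs of the core tensor generator and factor generator.

```python
import numpy as np


def tucker_kernel(core, factors):
    """Rebuild a 4-D conv kernel from a Tucker core and four factor matrices.

    core:    array of shape (r_out, r_in, r_h, r_w)
    factors: matrices of shape (C_out, r_out), (C_in, r_in), (k_h, r_h), (k_w, r_w)
    Returns an array of shape (C_out, C_in, k_h, k_w).
    """
    u_out, u_in, u_h, u_w = factors
    # Contract each mode of the core with its factor matrix.
    return np.einsum("abcd,oa,ib,hc,wd->oihw", core, u_out, u_in, u_h, u_w)


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
shape = (16, 8, 3, 3)   # (C_out, C_in, k_h, k_w)
ranks = (4, 4, 2, 2)    # Tucker ranks per mode

# Random stand-ins for generated core and factors (assumption).
core = rng.standard_normal(ranks)
factors = [rng.standard_normal((n, r)) for n, r in zip(shape, ranks)]

kernel = tucker_kernel(core, factors)

# Parameter counts: dense kernel versus core plus factors.
full_params = int(np.prod(shape))
tucker_params = int(np.prod(ranks)) + sum(n * r for n, r in zip(shape, ranks))
print(kernel.shape, full_params, tucker_params)
```

With these illustrative sizes, the dense kernel has 1152 entries while the core and factors together have 172, which suggests why generating the low-rank pieces can be cheaper than generating a full kernel directly.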
