Fose: Fusion of One-Step Diffusion and End-to-End Network for Pansharpening
Kai Liu, Zeli Lin, Weibo Wang, Linghe Kong, Yulun Zhang
TL;DR
Fose tackles pansharpening by blending a one-step diffusion model with a compact end-to-end network through a four-stage training pipeline. By distilling a multi-step diffusion baseline into a single step and fusing it with an E2E model via a lightweight adaptor, it achieves a 7.42x speedup while attaining state-of-the-art accuracy on WV3, GF2, and QB. Comprehensive experiments and ablations confirm robust gains across reduced- and full-resolution metrics and highlight the value of adaptive convolution, distillation, and fusion strategies. This work demonstrates the practical viability of diffusion priors in efficient, high-fidelity remote sensing image fusion.
Abstract
Pansharpening is an important image fusion task that fuses low-resolution multispectral images (LRMSI) and high-resolution panchromatic images (PAN) to obtain high-resolution multispectral images (HRMSI). The development of diffusion models (DM) and end-to-end models (E2E models) has greatly advanced the frontier of pansharpening. A DM uses multi-step diffusion to obtain an accurate estimate of the residual between LRMSI and HRMSI. However, the multi-step process requires substantial computation and is time-consuming. E2E models, in turn, are still limited in performance by their lack of priors and their simple structure. In this paper, we propose a novel four-stage training strategy to obtain a lightweight network, Fose, which fuses a one-step DM and an E2E model. We perform one-step distillation on an enhanced SOTA DM for pansharpening to compress the inference process from 50 steps to only 1 step. We then fuse the E2E model with the one-step DM using lightweight ensemble blocks. Comprehensive experiments on three commonly used benchmarks demonstrate the significant improvement achieved by the proposed Fose. Moreover, we achieve a 7.42x speedup over the baseline DM while attaining much better performance. The code and model are released at https://github.com/Kai-Liu001/Fose.
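The residual formulation and the fusion step described above can be illustrated with a minimal sketch. It is not the released implementation: the function names, the nearest-neighbour upsampling, the scale factor of 4, and the scalar blending weight are all assumptions made for illustration. In particular, the convex blend is only a hypothetical stand-in for the lightweight ensemble blocks, whose actual design is not described in the abstract.

```python
from typing import List

# One spectral band, stored as rows of pixel values (illustrative type alias).
Band = List[List[float]]


def upsample_nearest(band: Band, scale: int) -> Band:
    """Nearest-neighbour upsampling of one band by an integer factor."""
    return [
        [value for value in row for _ in range(scale)]
        for row in band
        for _ in range(scale)
    ]


def fuse_residuals(res_dm: Band, res_e2e: Band, weight: float) -> Band:
    """Convex blend of two residual estimates.

    Hypothetical stand-in for the ensemble block: `weight` is the share
    given to the one-step DM residual, the rest goes to the E2E residual.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        [weight * a + (1.0 - weight) * b for a, b in zip(row_dm, row_e2e)]
        for row_dm, row_e2e in zip(res_dm, res_e2e)
    ]


def reconstruct(lrms_band: Band, res_dm: Band, res_e2e: Band,
                scale: int = 4, weight: float = 0.5) -> Band:
    """HRMSI estimate = upsampled LRMSI + fused residual (assumed form)."""
    base = upsample_nearest(lrms_band, scale)
    fused = fuse_residuals(res_dm, res_e2e, weight)
    return [
        [b + r for b, r in zip(row_base, row_res)]
        for row_base, row_res in zip(base, fused)
    ]
```

Calling `reconstruct` with a low-resolution band and two residual maps at the target resolution returns a band of the same size as the residuals. In a trained model the blend would presumably be learned and spatially varying rather than a fixed scalar.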
