SALAD-Pan: Sensor-Agnostic Latent Adaptive Diffusion for Pan-Sharpening
Junjie Li, Congyang Ou, Haokui Zhang, Guoting Wei, Shengqin Jiang, Ying Li, Chunhua Shen
TL;DR
SALAD-Pan addresses cross-sensor pan-sharpening by performing diffusion in a latent space learned by a band-wise single-channel VAE, which lets a single model process MS imagery with different band configurations. It couples PAN-driven spatial guidance with spectral guidance from the upsampled LRMS image through bidirectional encoder interactions and frequency-split fusion, and it adds sensor-aware text prompts and a lightweight cross-band attention module. The method achieves state-of-the-art results on the GF2, QB, and WV3 sensors of PanCollection, delivers 2–3× faster inference, and transfers robustly zero-shot to WV2. Together, these results indicate that latent-space diffusion, combined with disentangled conditioning and cross-band coherence, is a practical and scalable approach to high-fidelity pan-sharpening in multi-sensor remote sensing pipelines.
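The summary does not say how frequency-split fusion is implemented. As an illustration only, the NumPy sketch below assumes the split uses an ideal radial low-pass mask in the 2D Fourier domain, keeping the low frequencies of the upsampled-LRMS (spectral) branch and the high frequencies of the PAN (spatial) branch. The functions split_frequencies and frequency_split_fusion and the cutoff value are hypothetical and are not taken from the paper.

```python
import numpy as np

def split_frequencies(x, cutoff=0.1):
    """Split a (C, H, W) array into low- and high-frequency parts.

    Hypothetical choice: an ideal radial low-pass mask in the Fourier domain,
    with `cutoff` measured in cycles per pixel.
    """
    _, h, w = x.shape
    fy = np.fft.fftfreq(h)[:, None]          # vertical frequencies, shape (H, 1)
    fx = np.fft.fftfreq(w)[None, :]          # horizontal frequencies, shape (1, W)
    mask = (np.sqrt(fy**2 + fx**2) <= cutoff).astype(float)
    spectrum = np.fft.fft2(x, axes=(-2, -1))
    # The mask is symmetric in frequency, so the inverse is real up to rounding.
    low = np.real(np.fft.ifft2(spectrum * mask, axes=(-2, -1)))
    high = x - low                           # residual holds the high frequencies
    return low, high

def frequency_split_fusion(pan_feat, ms_feat, cutoff=0.1):
    """Assumed fusion rule: spectral content (low frequencies) from the
    upsampled-LRMS branch plus spatial detail (high frequencies) from PAN."""
    ms_low, _ = split_frequencies(ms_feat, cutoff)
    _, pan_high = split_frequencies(pan_feat, cutoff)
    return ms_low + pan_high

rng = np.random.default_rng(0)
pan_feat = rng.standard_normal((8, 32, 32))  # toy PAN-branch features
ms_feat = rng.standard_normal((8, 32, 32))   # toy upsampled-LRMS-branch features
fused = frequency_split_fusion(pan_feat, ms_feat)
print(fused.shape)  # (8, 32, 32)
```

An ideal mask causes ringing near sharp edges, so a smooth (for example Gaussian) low-pass filter or a learned split would be natural alternatives under the same idea.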
Abstract
Recently, diffusion models have brought new insights to pan-sharpening and notably improved fusion accuracy. However, most existing models perform diffusion in pixel space and train a separate model for each type of multispectral (MS) imagery, which leads to high latency and ties each model to a specific sensor. In this paper, we present SALAD-Pan, a sensor-agnostic latent-space diffusion method for efficient pan-sharpening. Specifically, SALAD-Pan trains a band-wise single-channel VAE to encode high-resolution multispectral (HRMS) images into compact latent representations; because each band is encoded separately, the VAE supports MS images with any number of channels and provides the basis for acceleration. Next, spectral physical properties are injected into the diffusion backbone through a unidirectional control structure, while PAN and MS images are injected through a bidirectional interactive control structure, achieving high-precision fusion during the diffusion process. Finally, a lightweight cross-spectral attention module is added to the central layer of the diffusion model to strengthen inter-band connections, which improves spectral consistency and further raises fusion precision. Experimental results on GaoFen-2, QuickBird, and WorldView-3 demonstrate that SALAD-Pan outperforms state-of-the-art diffusion-based methods on all three datasets, achieves a 2–3× inference speedup, and exhibits robust zero-shot (cross-sensor) capability.
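The sensor-agnostic property of a band-wise single-channel VAE can be illustrated with a reshaping step: bands are folded into the batch axis so one single-channel encoder handles 4-band and 8-band imagery alike, and a cross-band attention layer then mixes information across the resulting per-band latents. The NumPy sketch below is not the authors' implementation. make_toy_encoder is a hypothetical stand-in for the trained VAE encoder (average pooling plus a fixed random projection), and cross_band_attention is a generic single-head self-attention over the band axis, assumed here as one plausible form of the cross-spectral module.

```python
import numpy as np

def make_toy_encoder(latent_dim=4, factor=4, seed=0):
    """Hypothetical single-channel 'encoder': (N, 1, H, W) -> (N, D, H/f, W/f).
    H and W must be divisible by `factor`."""
    proj = np.random.default_rng(seed).standard_normal(latent_dim)
    def encode(x):
        n, _, h, w = x.shape
        pooled = x.reshape(n, 1, h // factor, factor, w // factor, factor).mean(axis=(3, 5))
        return pooled * proj[None, :, None, None]  # broadcast 1 channel to D channels
    return encode

def encode_bandwise(hrms, encoder):
    """Fold bands into the batch axis so any band count works:
    (B, C, H, W) -> (B*C, 1, H, W) -> encode -> (B, C, D, h, w)."""
    b, c, h, w = hrms.shape
    z = encoder(hrms.reshape(b * c, 1, h, w))
    return z.reshape(b, c, *z.shape[1:])

def cross_band_attention(z, wq, wk, wv):
    """Single-head self-attention across bands at every latent position
    (assumed form). z: (B, C, D, h, w); wq, wk, wv: (D, D)."""
    b, c, d, h, w = z.shape
    tokens = z.transpose(0, 3, 4, 1, 2).reshape(b * h * w, c, d)  # (N, C, D)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)                 # (N, C, C)
    scores -= scores.max(axis=-1, keepdims=True)                   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)                       # softmax over bands
    out = tokens + attn @ v                                        # residual connection
    return out.reshape(b, h, w, c, d).transpose(0, 3, 4, 1, 2)     # back to (B, C, D, h, w)

rng = np.random.default_rng(0)
enc = make_toy_encoder(latent_dim=4, factor=4)
d = 4
wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
for bands in (4, 8):  # e.g. 4-band QuickBird/GaoFen-2 and 8-band WorldView-3
    hrms = rng.standard_normal((2, bands, 64, 64))
    z = cross_band_attention(encode_bandwise(hrms, enc), wq, wk, wv)
    print(bands, z.shape)
# 4 (2, 4, 4, 16, 16)
# 8 (2, 8, 4, 16, 16)
```

Because the encoder never sees more than one band at a time, the same weights apply regardless of sensor; only the attention step, which is indifferent to the number of tokens, couples the bands.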
