DiffFuSR: Super-Resolution of all Sentinel-2 Multispectral Bands using Diffusion Models
Muhammad Sarmad, Arnt-Børre Salberg, Michael Kampffmeyer
TL;DR
DiffFuSR upsamples all Sentinel-2 bands to a common 2.5 m resolution with a two-stage approach: diffusion-based SR of the 10 m RGB bands, trained on harmonized cross-domain data, followed by a learned fusion network that upscales the remaining 10/20/60 m bands using the super-resolved RGB as a spatial prior. The method introduces a conditional DDPM with spatial and degradation encoders to enable blind SR, and uses a contrastive degradation model to simulate Sentinel-2-like degradations. The fusion network is trained in a self-supervised manner under the Wald protocol, which preserves spectral fidelity while injecting high-frequency spatial detail. Quantitative and qualitative evaluations on OpenSR-test and multiple datasets show that DiffFuSR outperforms baselines in reflectance fidelity, spectral consistency, and hallucination suppression, while the fusion stage delivers competitive improvements for multi-band upscaling. The work demonstrates a practical, modular path to generating a 12-band, 2.5 m Sentinel-2 product from public data, with implications for detailed land monitoring and open EO research.
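To make the Wald-protocol idea concrete, the sketch below builds one reduced-scale training sample: both the high-resolution guide and the coarse bands are degraded by the resolution ratio, and the original coarse bands serve as the target. It is only an illustration under stated assumptions: the function names (`block_mean`, `wald_training_pair`) are hypothetical, and plain block averaging stands in for whatever degradation operator the authors actually use.

```python
import numpy as np


def block_mean(img: np.ndarray, factor: int) -> np.ndarray:
    """Downsample a (C, H, W) array by averaging factor x factor blocks.

    Assumption: block averaging is a simple stand-in for a sensor-aware
    degradation; it is not claimed to be the operator used in DiffFuSR.
    """
    c, h, w = img.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")
    blocks = img.reshape(c, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(2, 4))


def wald_training_pair(guide: np.ndarray, bands: np.ndarray, ratio: int):
    """Build one self-supervised sample at reduced scale.

    guide: (3, H * ratio, W * ratio) high-resolution RGB prior.
    bands: (C, H, W) coarse bands to be sharpened.
    Returns ((degraded guide, degraded bands), target), where the target is
    the original coarse bands, so no true high-resolution bands are needed.
    """
    guide_lr = block_mean(guide, ratio)  # now on the original grid of `bands`
    bands_lr = block_mean(bands, ratio)  # one level coarser than `bands`
    return (guide_lr, bands_lr), bands


rng = np.random.default_rng(0)
guide = rng.random((3, 32, 32))
bands = rng.random((6, 16, 16))
(inputs_guide, inputs_bands), target = wald_training_pair(guide, bands, ratio=2)
```

A fusion network trained to map `(inputs_guide, inputs_bands)` to `target` can then be applied at full scale, on the assumption that the learned relationship carries over across resolutions.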
Abstract
This paper presents DiffFuSR, a modular pipeline for super-resolving all 12 spectral bands of Sentinel-2 Level-2A imagery to a unified ground sampling distance (GSD) of 2.5 meters. The pipeline comprises two stages: (i) a diffusion-based super-resolution (SR) model trained on high-resolution RGB imagery from the NAIP and WorldStrat datasets, harmonized to simulate Sentinel-2 characteristics; and (ii) a learned fusion network that upscales the remaining multispectral bands using the super-resolved RGB image as a spatial prior. We introduce a robust degradation model and a contrastive degradation encoder to support blind SR. Extensive evaluations on the OpenSR benchmark demonstrate that the proposed SR pipeline outperforms current SOTA baselines in terms of reflectance fidelity, spectral consistency, spatial alignment, and hallucination suppression. Furthermore, the fusion network significantly outperforms classical and learned pansharpening approaches, enabling accurate enhancement of Sentinel-2's 20 m and 60 m bands. This work proposes a novel modular framework for Sentinel-2 SR that combines harmonized learning with diffusion models and fusion strategies. Our code and models can be found at https://github.com/NorskRegnesentral/DiffFuSR.
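The unified 2.5 m target implies a different upscaling factor for each native resolution. The short sketch below works through that arithmetic for the 12 Level-2A bands; the band-to-GSD table follows the standard Sentinel-2 band definitions, while the constant and function names are hypothetical and not taken from the DiffFuSR code base.

```python
TARGET_GSD_M = 2.5

# Native ground sampling distance (metres) of the 12 Sentinel-2 L2A bands.
NATIVE_GSD_M = {
    "B02": 10, "B03": 10, "B04": 10, "B08": 10,
    "B05": 20, "B06": 20, "B07": 20, "B8A": 20, "B11": 20, "B12": 20,
    "B01": 60, "B09": 60,
}


def upscale_factor(band: str) -> int:
    """Linear upscaling factor needed to bring `band` to the target GSD."""
    factor = NATIVE_GSD_M[band] / TARGET_GSD_M
    if factor != int(factor):
        raise ValueError(f"non-integer scale factor for {band}: {factor}")
    return int(factor)


def output_size(band: str, height: int, width: int) -> tuple[int, int]:
    """Pixel size of a native-resolution tile after super-resolution."""
    f = upscale_factor(band)
    return height * f, width * f


factors = {band: upscale_factor(band) for band in NATIVE_GSD_M}
```

Under these definitions the 10 m bands need a factor of 4, the 20 m bands a factor of 8, and the 60 m bands a factor of 24, which is why the coarser bands depend most heavily on the spatial prior supplied by the super-resolved RGB image.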
