Multi-Sensor Diffusion-Driven Optical Image Translation for Large-Scale Applications
João Gabriel Vinholi, Marco Chini, Anis Amziane, Renato Machado, Danilo Silva, Patrick Matgen
TL;DR
The paper addresses large-scale translation of optical imagery across heterogeneous sensors with a DDIM-based diffusion framework designed for patch-wise consistency and radiometric fidelity. It proposes new forward diffusion procedures based on whitening and coloring, together with a PSNR voting scheme at inference time that selects consistent, high-quality patches, enabling super-resolution from low resolution (LR) to high resolution (HR) across hundreds of patches. Experiments on Sentinel-II to Planet Dove data report the best perceptual and distributional metrics among the compared methods (mLPIPS of 0.1884 and FID of 45.64) and show practical benefits for heterogeneous change detection over Beirut and Austin, including substantial false-alarm reductions. By combining large-scale domain adaptation with super-resolution and analyzing hyperparameters and robustness, the paper highlights the method's potential for real-world remote-sensing applications, while acknowledging its computational cost and the challenges that arise when target-domain radiometry is unavailable.
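To make the "whitening and coloring" idea concrete, here is a minimal NumPy sketch of the generic whitening-coloring transform on multi-band pixel statistics. It is not the paper's forward diffusion procedure, which this summary does not specify. The function name `whiten_and_color`, the `(bands, n_pixels)` layout, and the `eps` regularizer are all assumptions made for illustration.

```python
import numpy as np


def whiten_and_color(source, target, eps=1e-5):
    """Match the per-band mean and covariance of `source` to those of `target`.

    Both inputs are assumed to have shape (bands, n_pixels). This is the
    generic whitening-coloring transform, shown for illustration only.
    """
    def _stats(x):
        mean = x.mean(axis=1, keepdims=True)
        centered = x - mean
        # Sample covariance, regularized so that it stays positive definite.
        cov = centered @ centered.T / (x.shape[1] - 1) + eps * np.eye(x.shape[0])
        return mean, centered, cov

    _, src_centered, src_cov = _stats(source)
    tgt_mean, _, tgt_cov = _stats(target)

    # Whitening: multiply by the inverse square root of the source covariance.
    s_vals, s_vecs = np.linalg.eigh(src_cov)
    whitening = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T

    # Coloring: multiply by the square root of the target covariance.
    t_vals, t_vecs = np.linalg.eigh(tgt_cov)
    coloring = t_vecs @ np.diag(t_vals ** 0.5) @ t_vecs.T

    return coloring @ whitening @ src_centered + tgt_mean


# Synthetic 4-band example (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
source = rng.normal(size=(4, 4)) @ rng.normal(size=(4, 5000))
target = rng.normal(size=(4, 4)) @ rng.normal(size=(4, 5000)) + 3.0
translated = whiten_and_color(source, target)
```

After the transform, the band means and band covariance of `translated` agree with those of `target` up to the small `eps` regularization, which is the sense in which such a step can align radiometry between two sensors.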
Abstract
Comparing images captured by disparate sensors is a common challenge in remote sensing. This requires image translation: converting imagery from one sensor domain to another while preserving the original content. Denoising Diffusion Implicit Models (DDIM) are potential state-of-the-art solutions for such domain translation due to their proven superiority in multiple image-to-image translation tasks in computer vision. However, these models struggle to reproduce the radiometric features of large-scale multi-patch imagery, resulting in inconsistencies across the full image. This renders downstream tasks like Heterogeneous Change Detection impractical. To overcome these limitations, we propose a method that leverages denoising diffusion for effective multi-sensor optical image translation over large areas. Our approach super-resolves large-scale low spatial resolution images into high-resolution equivalents from disparate optical sensors, ensuring uniformity across hundreds of patches. Our contributions lie in new forward and reverse diffusion processes that address the challenges of large-scale image translation. Extensive experiments using paired Sentinel-II (10 m) and Planet Dove (3 m) images demonstrate that our approach provides precise domain adaptation, preserving image content while improving radiometric accuracy and feature representation. A thorough image quality assessment and comparisons with the standard DDIM framework and five other leading methods are presented. We reach a mean Learned Perceptual Image Patch Similarity (mLPIPS) of 0.1884 and a Fréchet Inception Distance (FID) of 45.64, substantially outperforming all compared methods, including DDIM, ShuffleMixer, and SwinIR. The usefulness of our approach is further demonstrated in two Heterogeneous Change Detection tasks.
