Multi-Sensor Diffusion-Driven Optical Image Translation for Large-Scale Applications

João Gabriel Vinholi; Marco Chini; Anis Amziane; Renato Machado; Danilo Silva; Patrick Matgen

TL;DR

The paper tackles the problem of translating large-scale optical imagery across heterogeneous sensors by introducing a DDIM-based diffusion framework that ensures patch-wise consistency and radiometric fidelity. It introduces novel forward diffusion procedures with whitening and coloring, plus a PSNR voting scheme during inference to select consistent high-quality patches, enabling effective super-resolution from LR to HR across hundreds of patches. Empirical results on Sentinel-II to Planet Dove data show state-of-the-art perceptual and distributional metrics (e.g., mLPIPS ≈ 0.1884 and FID ≈ 45.64) and demonstrate practical benefits in heterogeneous change detection across Beirut and Austin, including substantial false-alarm reductions. The method’s integration of large-scale domain adaptation with super-resolution, plus its analysis of hyperparameters and robustness, highlights its potential for real-world remote-sensing applications while acknowledging computational costs and challenges when target-domain radiometry is unavailable.
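The whitening and coloring mentioned above can be illustrated with a generic channel-statistics transform. The sketch below is not the paper's procedure: it is the classical whitening-coloring transform, written under the assumption that images are NumPy arrays of shape (H, W, C) and that the target radiometry is summarized by a per-channel mean and a channel covariance. The function names `channel_stats` and `whiten_and_color` are hypothetical.

```python
import numpy as np


def channel_stats(img):
    """Return the per-channel mean (C,) and channel covariance (C, C) of an (H, W, C) image."""
    pixels = img.reshape(-1, img.shape[-1]).astype(np.float64)
    return pixels.mean(axis=0), np.cov(pixels, rowvar=False)


def _sym_power(cov, power, eps=1e-8):
    """Raise a symmetric positive semi-definite matrix to a real power via eigendecomposition."""
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, eps, None)  # guard against zero or slightly negative eigenvalues
    return (vecs * vals**power) @ vecs.T


def whiten_and_color(source, target_mean, target_cov):
    """Assumed, generic transform: give `source` the channel statistics of a target domain."""
    h, w, c = source.shape
    src_mean, src_cov = channel_stats(source)
    centered = source.reshape(-1, c).astype(np.float64) - src_mean
    # Whitening: remove the source channel correlations (covariance becomes identity).
    whitened = centered @ _sym_power(src_cov, -0.5)
    # Coloring: impose the target covariance, then shift to the target mean.
    colored = whitened @ _sym_power(target_cov, 0.5) + target_mean
    return colored.reshape(h, w, c)
```

In this sketch the target statistics would come from a reference image in the target domain; the output then matches that mean and covariance while keeping the spatial layout of the source.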

Abstract

Comparing images captured by disparate sensors is a common challenge in remote sensing. This requires image translation: converting imagery from one sensor domain to another while preserving the original content. Denoising Diffusion Implicit Models (DDIM) are potential state-of-the-art solutions for such domain translation due to their proven superiority in multiple image-to-image translation tasks in computer vision. However, these models struggle to reproduce the radiometric features of large-scale multi-patch imagery, resulting in inconsistencies across the full image. This renders downstream tasks like Heterogeneous Change Detection impractical. To overcome these limitations, we propose a method that leverages denoising diffusion for effective multi-sensor optical image translation over large areas. Our approach super-resolves large-scale low spatial resolution images into high-resolution equivalents from disparate optical sensors, ensuring uniformity across hundreds of patches. Our contributions lie in new forward and reverse diffusion processes that address the challenges of large-scale image translation. Extensive experiments using paired Sentinel-II (10m) and Planet Dove (3m) images demonstrate that our approach provides precise domain adaptation, preserving image content while improving radiometric accuracy and feature representation. A thorough image quality assessment and comparisons with the standard DDIM framework and five other leading methods are presented. We reach a mean Learned Perceptual Image Patch Similarity (mLPIPS) of 0.1884 and a Fréchet Inception Distance (FID) of 45.64, substantially outperforming all compared methods, including DDIM, ShuffleMixer, and SwinIR. The usefulness of our approach is further demonstrated in two Heterogeneous Change Detection tasks.

Paper Structure (33 sections, 12 equations, 11 figures, 4 tables, 5 algorithms)

Introduction
Related Works
Denoising Diffusion Models
Image-to-Image Translation
Proposed Method
Training
Inference
Dataset
Experiments
Image Quality Metrics and Ablation Experiments
Comparison Metrics
Inference Hyperparameters
Visual Examples
Change Detection as a Use Case
Rationale for Selective Comparisons
...and 18 more sections

Figures (11)

Figure 1: Surface reflectance imagery from Beirut, Lebanon, of low and high spatial resolutions, captured at similar time intervals, from Sentinel-II (10m) and Planet Dove (3m) sensors. Figures \ref{fig:pre_s2} and \ref{fig:post_s2} are the pre- and post-event images from Sentinel-II, respectively. Figures \ref{fig:pre_planet} and \ref{fig:post_planet} are the pre- and post-event images from Planet Dove, respectively.
Figure 2: Diagram illustrating the training step procedure for the proposed DDM-based image-to-image translation method.
Figure 3: Diagram illustrating the inference procedure for the proposed DDM-based image-to-image translation method. Here, the color information variables $\bm{m}_1$, $m_2$, $m_3$ are provided from an external image, e.g., a post-event Planet Dove image.
Figure 4: Comparison among images generated by the tested I2I methods (c)-(l) and the original Sentinel-II (a) and Planet Dove (b) images. The image generated with the proposed model (c) displays higher resolution features not observable among the others, while maintaining patch-wise feature consistency and avoiding blurriness.
Figure 5: Images of the region of interest in Tlaquepaque, Mexico. The first row shows a far view of the city. The second row shows a crop of the first-row images for closer inspection. The third row shows a crop of the second-row images for the inspection of meter-scale features. The first, second, and third columns contain images from the Planet Dove, Sentinel-II, and Synthetic Planet domains, respectively.
...and 6 more figures
