SAR-to-RGB Translation with Latent Diffusion for Earth Observation

Kaan Aydin, Joelle Hanna, Damian Borth

TL;DR

This work tackles missing RGB data in Earth observation by translating SAR imagery to RGB using a latent-space diffusion framework. It employs a ViT-based diffusion transformer within a VAE latent space, exploring Standard Diffusion with and without class conditioning as well as Cold Diffusion to synthesize RGB images from S1 inputs. The generated RGB data are evaluated on land cover classification and cloud removal, revealing that class conditioning improves classification while Cold Diffusion preserves structure yet may have lower perceptual quality; cloud removal performance is competitive even though not explicitly optimized. The results demonstrate the practical potential of diffusion-based SAR-to-RGB translation to support RS tasks when RGB data are unavailable, and point to future enhancements in SAR conditioning and multi-spectral extensions.

Abstract

Earth observation satellites like Sentinel-1 (S1) and Sentinel-2 (S2) provide complementary remote sensing (RS) data, but S2 images are often unavailable due to cloud cover or data gaps. To address this, we propose a diffusion model (DM)-based approach for SAR-to-RGB translation, generating synthetic optical images from SAR inputs. We explore three different setups: two using Standard Diffusion, which reconstruct S2 images by adding and removing noise (one without and one with class conditioning), and one using Cold Diffusion, which blends S2 with S1 before removing the SAR signal. We evaluate the generated images in downstream tasks, including land cover classification and cloud removal. While generated images may not perfectly replicate real S2 data, they still provide valuable information. Our results show that class conditioning improves classification accuracy, while cloud removal performance remains competitive despite our approach not being optimized for it. Interestingly, despite exhibiting lower perceptual quality, the Cold Diffusion setup performs well in land cover classification, suggesting that traditional quantitative evaluation metrics may not fully reflect the practical utility of generated images. Our findings highlight the potential of DMs for SAR-to-RGB translation in RS applications where RGB images are missing.
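As a rough illustration of the difference between the two forward processes described in the abstract, the sketch below corrupts an S2 latent either with Gaussian noise (Standard Diffusion) or by blending it toward the S1 latent (Cold Diffusion). It is a minimal sketch under assumptions that the text above does not confirm: the cosine-shaped schedule `alpha_bar`, the linear blend, the latent shape, and all function names are hypothetical, and S1 and S2 are assumed to be encoded into latents of the same shape.

```python
import numpy as np


def alpha_bar(t, num_steps):
    """Hypothetical schedule: 1 at t=0 (clean latent), ~0 at t=num_steps."""
    return np.cos(0.5 * np.pi * t / num_steps) ** 2


def standard_forward(z_s2, t, num_steps, rng):
    """Standard Diffusion forward step: mix the S2 latent with Gaussian noise."""
    a = alpha_bar(t, num_steps)
    noise = rng.standard_normal(z_s2.shape)
    return np.sqrt(a) * z_s2 + np.sqrt(1.0 - a) * noise


def cold_forward(z_s2, z_s1, t, num_steps):
    """Cold-Diffusion-style forward step: deterministic blend of S2 toward S1.

    The linear interpolation is an assumption made for illustration only.
    """
    a = alpha_bar(t, num_steps)
    return a * z_s2 + (1.0 - a) * z_s1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_s2 = rng.standard_normal((4, 8, 8))  # assumed latent shape (channels, H, W)
    z_s1 = rng.standard_normal((4, 8, 8))
    for t in (0, 50, 100):
        noisy = standard_forward(z_s2, t, 100, rng)
        blended = cold_forward(z_s2, z_s1, t, 100)
        # Distance from the clean S2 latent grows with t in both processes.
        print(t, np.abs(noisy - z_s2).mean(), np.abs(blended - z_s2).mean())
```

In this sketch the Standard Diffusion endpoint is pure noise, so the SAR input has to enter the model as a separate condition, whereas the Cold Diffusion endpoint is the S1 latent itself, so sampling can start directly from the SAR image.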

Paper Structure

This paper contains 26 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Schematic overview of our Standard Diffusion methodology. In the first step, Diffusion Training & Sampling, we train diffusion models that take a SAR image as input and predict the corresponding RGB output. Once training is complete, we freeze the DMs and use them to generate synthetic RGB images (Generated S2). In the next step (2a and 2b), the generated samples are used to evaluate our downstream applications, specifically Land Cover Classification and Cloud Removal. The details of our setup are discussed in the following sections.
  • Figure 2: Visualisation of the generation process for Standard Diffusion (top row) and Cold Diffusion (bottom row).
  • Figure 3: Qualitative comparison of generated S2 images across our experimental setups, presented alongside the original S1 SAR images and corresponding real S2 RGB images. Each column represents a different example, illustrating variations in image translation quality across experiments.