SAR-to-RGB Translation with Latent Diffusion for Earth Observation
Kaan Aydin, Joelle Hanna, Damian Borth
TL;DR
This work tackles missing RGB data in Earth observation by translating synthetic aperture radar (SAR) imagery to RGB using a latent-space diffusion framework. It employs a ViT-based diffusion transformer within a VAE latent space, exploring Standard Diffusion with and without class conditioning as well as Cold Diffusion to synthesize RGB images from Sentinel-1 (S1) inputs. The generated RGB images are evaluated on land cover classification and cloud removal: class conditioning improves classification, Cold Diffusion preserves structure but may have lower perceptual quality, and cloud removal performance is competitive even though the approach is not explicitly optimized for it. The results demonstrate the practical potential of diffusion-based SAR-to-RGB translation to support remote sensing (RS) tasks when RGB data are unavailable, and point to future enhancements in SAR conditioning and multi-spectral extensions.
Abstract
Earth observation satellites such as Sentinel-1 (S1) and Sentinel-2 (S2) provide complementary remote sensing (RS) data, but S2 images are often unavailable due to cloud cover or data gaps. To address this, we propose a diffusion model (DM)-based approach for translating synthetic aperture radar (SAR) imagery to RGB, generating synthetic optical images from SAR inputs. We explore three setups: two using Standard Diffusion, which reconstruct S2 images by adding and removing noise (one without and one with class conditioning), and one using Cold Diffusion, which blends S2 with S1 before removing the SAR signal. We evaluate the generated images in downstream tasks, including land cover classification and cloud removal. While generated images may not perfectly replicate real S2 data, they still provide valuable information. Our results show that class conditioning improves classification accuracy, while cloud removal performance remains competitive even though our approach is not optimized for it. Interestingly, despite exhibiting lower perceptual quality, the Cold Diffusion setup performs well in land cover classification, suggesting that traditional quantitative evaluation metrics may not fully reflect the practical utility of generated images. Our findings highlight the potential of DMs for SAR-to-RGB translation in RS applications where RGB images are missing.
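To make the difference between the two corruption processes concrete, the following is a minimal sketch of the forward (degradation) step in each setup, not the authors' implementation. It assumes the S1 and S2 images have already been encoded into latents of the same shape, that Standard Diffusion uses the usual Gaussian forward process with a linear beta schedule, and that Cold Diffusion blends the two latents by linear interpolation. The function names, the schedule values, and the interpolation rule are all hypothetical choices made for illustration.

```python
import numpy as np


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule (assumed values)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def standard_forward(z_s2, t, alpha_bar, rng):
    """Standard Diffusion: corrupt the S2 latent with Gaussian noise at step t.

    Returns the noisy latent and the noise that was added, which is what a
    denoising network is typically trained to predict.
    """
    eps = rng.standard_normal(z_s2.shape)
    z_t = np.sqrt(alpha_bar[t]) * z_s2 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return z_t, eps


def cold_forward(z_s2, z_s1, t, num_steps):
    """Cold Diffusion: deterministically blend the S2 latent toward the S1 latent.

    Assumption: linear interpolation, so t = 0 gives pure S2 and
    t = num_steps gives pure S1. No random noise is involved.
    """
    w = t / num_steps
    return (1.0 - w) * z_s2 + w * z_s1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical latent shape (channels, height, width).
    z_s2 = rng.standard_normal((4, 8, 8))
    z_s1 = rng.standard_normal((4, 8, 8))

    alpha_bar = linear_alpha_bar(num_steps=1000)
    z_noisy, eps = standard_forward(z_s2, t=500, alpha_bar=alpha_bar, rng=rng)
    z_blend = cold_forward(z_s2, z_s1, t=500, num_steps=1000)
    print(z_noisy.shape, z_blend.shape)
```

Under these assumptions, the reverse process in the Standard Diffusion setups starts from noise and removes it step by step, while in the Cold Diffusion setup it starts from the S1 latent and progressively removes the SAR signal.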
