Multimodal Diffusion Bridge with Attention-Based SAR Fusion for Satellite Image Cloud Removal
Yuyang Hu, Suhas Lohit, Ulugbek S. Kamilov, Tim K. Marks
TL;DR
This work formulates cloud removal in optical satellite imagery as a diffusion-bridge problem conditioned on aligned SAR data. The proposed method, DB-CR, is a multimodal diffusion bridge with a two-branch SAR–optical backbone and cross-modal attention that fuses structural information from SAR with spectral detail from optical images, enabling stable, high-fidelity restoration. Experiments on SEN12MS-CR show state-of-the-art distortion and perceptual quality at competitive computational cost, and ablations highlight the importance of both diffusion-bridge training and the fusion components. The framework offers a practical, robust approach to cloud removal with controllable inference and potential for deployment in remote sensing workflows.
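To make the cross-modal attention idea concrete, the following is a minimal sketch of scaled dot-product cross-attention in which optical feature tokens act as queries and SAR feature tokens act as keys and values. It is an illustration under stated assumptions, not the DB-CR fusion block: the function names are hypothetical, the learned query/key/value projections are replaced by the identity, and the choice of optical-queries-SAR as the attention direction is assumed rather than taken from the text.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(optical_tokens, sar_tokens):
    """Fuse SAR features into optical features (hypothetical helper).

    Each optical token attends over all SAR tokens. Learned projections
    are omitted (identity) to keep the sketch short; this is an assumption,
    not the architecture described in the paper.
    """
    dim = len(optical_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in optical_tokens:
        # Similarity between this optical token and every SAR token.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in sar_tokens]
        weights = softmax(scores)
        # Attention-weighted average of the SAR tokens.
        attended = [sum(w * value[d] for w, value in zip(weights, sar_tokens))
                    for d in range(dim)]
        # Residual connection keeps the original optical content.
        fused.append([q + a for q, a in zip(query, attended)])
    return fused
```

For example, `cross_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])` returns one fused token with the same dimension as the optical input, weighted toward the SAR token most similar to the query.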
Abstract
Deep learning has achieved some success in removing clouds from optical satellite images by fusing them with synthetic aperture radar (SAR) images. Recently, diffusion models have emerged as powerful tools for cloud removal, delivering higher-quality estimates than earlier methods by sampling from the distribution of cloud-free images. However, diffusion models start sampling from pure Gaussian noise, which complicates the sampling trajectory and leads to suboptimal performance. In addition, current methods fall short in effectively fusing SAR and optical data. To address these limitations, we propose Diffusion Bridges for Cloud Removal (DB-CR), which directly bridges the cloudy and cloud-free image distributions. We also propose a novel multimodal diffusion bridge architecture for multimodal image restoration, whose two-branch design pairs an efficient backbone with dedicated cross-modality fusion blocks to extract and fuse features from SAR and optical images. By formulating cloud removal as a diffusion-bridge problem and leveraging this tailored architecture, DB-CR achieves high-fidelity results while remaining computationally efficient. We evaluate DB-CR on the SEN12MS-CR cloud-removal dataset and show that it achieves state-of-the-art results.
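The contrast between starting from pure Gaussian noise and bridging two image distributions can be illustrated with a small sketch. The code below samples an intermediate state on a Brownian bridge pinned at a cloud-free image at t = 0 and a cloudy image at t = 1, so the t = 1 endpoint is the cloudy observation rather than noise. The Brownian bridge is used here only as a simple, well-known example; the abstract does not specify the bridge or noise schedule used by DB-CR, and the function name, the `sigma` parameter, and the flat-list image representation are assumptions made for illustration.

```python
import math
import random


def bridge_sample(clean, cloudy, t, sigma=1.0, rng=random):
    """Sample x_t on a Brownian bridge between two images (hypothetical helper).

    clean  : pixel values at t = 0 (cloud-free endpoint)
    cloudy : pixel values at t = 1 (cloudy endpoint)
    sigma  : assumed noise scale; the marginal variance is sigma^2 * t * (1 - t)
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    # The noise vanishes at both endpoints, so x_0 = clean and x_1 = cloudy.
    std = sigma * math.sqrt(t * (1.0 - t))
    return [(1.0 - t) * c + t * y + std * rng.gauss(0.0, 1.0)
            for c, y in zip(clean, cloudy)]
```

Under this sketch, sampling at inference time would start from the cloudy image itself (t = 1) and move toward t = 0, whereas a standard diffusion model would start from an image-sized draw of Gaussian noise.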
