Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space
Yi Liu, Wengen Li, Jihong Guan, Shuigeng Zhou, Yichao Zhang
TL;DR
The paper addresses cloud removal (CR) in remote sensing by introducing EMRDM, an improved mean-reverting diffusion model that starts the diffusion process from cloudy inputs via a forward stochastic differential equation (SDE) and reconstructs cloudless images with an ODE-based backward process. It offers a modular, elucidated design space by reformulating the forward process and redefining the denoiser through a preconditioning framework, so that individual modules can be improved independently and remain compatible with generative diffusion methods. A novel multi-temporal denoising network denoises sequential cloudy images in parallel using temporal fusion attention, improving restoration across time. Experiments on mono-temporal and multi-temporal datasets show EMRDM achieving state-of-the-art performance, validating the framework’s effectiveness and practicality for high-fidelity CR across remote-sensing scenarios. The authors release their code for reproducibility.
Abstract
Cloud removal (CR) remains a challenging task in remote sensing image processing. Although diffusion models (DMs) exhibit strong generative capabilities, their direct application to CR is suboptimal, as they generate cloudless images from random noise and ignore the information inherent in cloudy inputs. To overcome this drawback, we develop a new CR model, EMRDM, based on mean-reverting diffusion models (MRDMs), which establishes a direct diffusion process between cloudy and cloudless images. Compared to current MRDMs, EMRDM offers a modular framework with updatable modules and an elucidated design space, based on a reformulated forward process and a new ordinary differential equation (ODE)-based backward process. Leveraging our framework, we redesign key MRDM modules to boost CR performance: we restructure the denoiser via a preconditioning technique, reorganize the training process, and improve the sampling process by introducing deterministic and stochastic samplers. To achieve multi-temporal CR, we further develop a denoising network that denoises sequential images simultaneously. Experiments on mono-temporal and multi-temporal datasets demonstrate the superior performance of EMRDM. Our code is available at https://github.com/Ly403/EMRDM.
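To make the idea of a "direct diffusion process between cloudy and cloudless images" concrete, the sketch below simulates a generic mean-reverting (Ornstein-Uhlenbeck) forward process of the kind used in earlier MRDMs. It is an illustration under stated assumptions, not EMRDM's reformulated forward process: the constant rate `theta`, the stationary noise level `lam`, and both function names are hypothetical. The cloudless image `x0` drifts toward the cloudy image `mu`, so the terminal state is a noisy cloudy image rather than pure noise.

```python
import numpy as np

def mean_reverting_marginal(x0, mu, t, theta=1.0, lam=0.1):
    """Mean and std of x_t for dx = theta * (mu - x) dt + sigma dw.

    Assumes a constant theta and sigma**2 = 2 * lam**2 * theta, so the
    stationary standard deviation is lam (illustrative choices only).
    """
    decay = np.exp(-theta * t)
    mean = mu + (x0 - mu) * decay          # interpolates from x0 toward mu
    std = lam * np.sqrt(1.0 - decay ** 2)  # grows from 0 toward lam
    return mean, std

def sample_forward(x0, mu, t, rng, theta=1.0, lam=0.1):
    """Draw x_t given the cloudless image x0 and the cloudy image mu."""
    mean, std = mean_reverting_marginal(x0, mu, t, theta, lam)
    return mean + std * rng.standard_normal(np.shape(x0))

# Toy 2x2 "images": x0 stands in for a cloudless patch, mu for a cloudy one.
x0 = np.zeros((2, 2))
mu = np.ones((2, 2))
rng = np.random.default_rng(0)
x_late = sample_forward(x0, mu, t=5.0, rng=rng)  # close to mu, plus small noise
```

At `t = 0` the marginal collapses onto the cloudless image, and for large `t` it concentrates around the cloudy image with standard deviation `lam`. A backward process, such as the ODE-based one described above, would then start from that noisy cloudy state and use a trained denoiser to move back toward `x0`.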
