Diffusion Enhancement for Cloud Removal in Ultra-Resolution Remote Sensing Imagery

Jialu Sui, Yiyang Ma, Wenhan Yang, Xiaokang Zhang, Man-On Pun, Jiaying Liu

TL;DR

This work tackles cloud removal in ultra-resolution remote sensing imagery by introducing Diffusion Enhancement (DE), a diffusion-based framework that uses a reference visual prior and a Weight Allocation (WA) network to fuse global structure with fine textures. The method is trained with a coarse-to-fine strategy and evaluated on the newly released CUHK-CR dataset (0.5 m, four multispectral bands) as well as the RICE dataset, showing better perceptual quality (LPIPS) and signal fidelity (PSNR/SSIM) than prior DL-based CR models. DE's dynamic fusion across diffusion steps and the dedicated WA module enable robust reconstruction under both thin and thick cloud cover, while the ultra-resolution CUHK-CR benchmark addresses a critical data gap in CR research. Overall, the approach advances cloud-free restoration at ultra-high spatial resolution with practical training efficiency and competitive computational cost, suggesting strong applicability to high-fidelity RS analysis in cloud-obscured scenes.

Abstract

The presence of cloud layers severely compromises the quality and effectiveness of optical remote sensing (RS) images. However, existing deep-learning (DL)-based Cloud Removal (CR) techniques struggle to accurately reconstruct the original visual authenticity and detailed semantic content of the images. To tackle this challenge, this work proposes enhancements on both the data and methodology fronts. On the data side, an ultra-resolution benchmark named CUHK Cloud Removal (CUHK-CR) with 0.5 m spatial resolution is established. This benchmark incorporates rich detailed textures and diverse cloud coverage, serving as a robust foundation for designing and assessing CR models. On the methodology side, a novel diffusion-based framework for CR called Diffusion Enhancement (DE) is proposed to perform progressive texture detail recovery, which eases training while improving inference accuracy. Additionally, a Weight Allocation (WA) network is developed to dynamically adjust the weights for feature fusion, further improving performance, particularly for ultra-resolution image generation. Furthermore, a coarse-to-fine training strategy is applied to speed up training convergence while reducing the computational complexity required to handle ultra-resolution images. Extensive experiments on the newly established CUHK-CR and existing datasets such as RICE confirm that the proposed DE framework outperforms existing DL-based methods in terms of both perceptual quality and signal fidelity.

Paper Structure (28 sections, 15 equations, 11 figures, 10 tables)

Figures (11)

  • Figure 1: The distribution of images across CCP values in the CUHK-CR1 training and test sets, computed with the Cloud-Net detector [clouddetection1]. The average probability of cloud coverage is 50.7%.
  • Figure 2: The distribution of images across CCP values in the CUHK-CR2 training and test sets, computed with the Cloud-Net detector [clouddetection1]. The average probability of cloud coverage is 42.5%.
  • Figure 3: The architecture of our DE for CR. The diffusion branch in (a) performs the diffusion step that removes noise progressively and is capable of restoring fine-grained textures. The weighting branch in (b) dynamically fuses the results of the reference and diffusion branches into $\mathbf{x}_{0,t}$, capturing the merits of both excellent global estimation and fine details. The reference branch in (c) generates a cloud-free image from the cloudy image $\mathbf{y}$, offering substantial global context. Ultimately, $\mathbf{x}_{0,t}$ and $\mathbf{x}_t$ are used to generate $\mathbf{x}_{t-1}$.
  • Figure 4: The style of $\mathbf{x}_{0, t}$ from denoising time-step $T$ to $0$. The first and second rows show the results of the vanilla diffusion model and our DE, respectively. The cloud-free and cloudy images are presented on the left side.
  • Figure 5: The architecture of Weight Allocation (WA). WA learns to dynamically determine the weighting matrix based on the image features and the noise strength.
  • ...and 6 more figures
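
The data flow described in the captions of Figures 3 and 5 (fuse the reference and diffusion estimates into $\mathbf{x}_{0,t}$, then combine $\mathbf{x}_{0,t}$ with $\mathbf{x}_t$ to obtain $\mathbf{x}_{t-1}$) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names `fuse_estimates` and `ddpm_posterior_step` are hypothetical, the fusion is taken to be a per-pixel convex combination, the weight map is passed in directly instead of being predicted by a WA network, and the update uses the standard DDPM posterior, which may differ from the sampler used in the paper.

```python
import numpy as np


def fuse_estimates(x0_diffusion, x0_reference, weight):
    """Blend two clean-image estimates into x_{0,t}.

    Assumption: a per-pixel convex combination. `weight` stands in for the
    weighting matrix that a WA-style network would predict from image
    features and the noise strength; here it is simply an input.
    """
    weight = np.clip(weight, 0.0, 1.0)
    return weight * x0_diffusion + (1.0 - weight) * x0_reference


def ddpm_posterior_step(x_t, x0_t, t, betas, rng):
    """Sample x_{t-1} from x_t and the fused estimate x_{0,t}.

    Assumption: the standard DDPM posterior q(x_{t-1} | x_t, x_0) with a
    variance schedule `betas` indexed from 0 to T-1.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    a_bar_t = alpha_bar[t]
    a_bar_prev = alpha_bar[t - 1] if t > 0 else 1.0

    # Posterior mean: a weighted sum of the clean estimate and the noisy state.
    coef_x0 = np.sqrt(a_bar_prev) * betas[t] / (1.0 - a_bar_t)
    coef_xt = np.sqrt(alphas[t]) * (1.0 - a_bar_prev) / (1.0 - a_bar_t)
    mean = coef_x0 * x0_t + coef_xt * x_t

    if t == 0:
        # Final step: no noise is added, and the mean reduces to x0_t.
        return mean

    variance = betas[t] * (1.0 - a_bar_prev) / (1.0 - a_bar_t)
    return mean + np.sqrt(variance) * rng.standard_normal(x_t.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    betas = np.linspace(1e-4, 0.02, 10)          # toy 10-step schedule
    x_t = rng.standard_normal((4, 8, 8))         # noisy 4-band patch
    x0_diff = rng.standard_normal((4, 8, 8))     # placeholder diffusion-branch estimate
    x0_ref = rng.standard_normal((4, 8, 8))      # placeholder reference-branch estimate
    weight = np.full((1, 8, 8), 0.3)             # placeholder weighting matrix

    x0_t = fuse_estimates(x0_diff, x0_ref, weight)
    x_prev = ddpm_posterior_step(x_t, x0_t, t=5, betas=betas, rng=rng)
    print(x_prev.shape)
```

In this sketch a weight near 0 makes $\mathbf{x}_{0,t}$ follow the reference branch and a weight near 1 makes it follow the diffusion branch, so a schedule that raises the weight as the noise level falls would shift from global structure towards fine texture. Whether the paper uses such a schedule is not stated in the captions above.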