Multi-Modal and Multi-Resolution Data Fusion for High-Resolution Cloud Removal: A Novel Baseline and Benchmark

Fang Xu, Yilei Shi, Patrick Ebel, Wen Yang, Xiao Xiang Zhu

TL;DR

A new baseline, Align-CR, performs high-resolution optical image cloud removal (CR) guided by low-resolution synthetic aperture radar (SAR) images. It gradually warps and fuses the features of the multimodal and multiresolution data during reconstruction, which mitigates the effects of misalignment.

Abstract

Cloud removal is a significant and challenging problem in remote sensing, and recent years have seen notable advances in this area. However, two major issues still hinder its development: existing datasets lack high-resolution imagery, and the semantic meaningfulness of the generated structures is not evaluated. In this paper, we introduce M3R-CR, a benchmark dataset for high-resolution Cloud Removal with Multi-Modal and Multi-Resolution data fusion. With this dataset, we consider the problem of cloud removal in high-resolution optical remote sensing imagery by integrating multi-modal and multi-resolution information. In this context, we have to take into account the alignment errors caused by the multi-resolution nature of the data, along with the more pronounced misalignment in high-resolution images that arises from inherent differences in imaging mechanisms and other factors. Existing methods based on multi-modal data fusion, which assume that the image pairs are accurately aligned at the pixel level, are therefore not appropriate for this problem. To this end, we design a new baseline named Align-CR, which performs high-resolution optical image cloud removal guided by low-resolution synthetic aperture radar (SAR) images. It gradually warps and fuses the features of the multi-modal and multi-resolution data during the reconstruction process, mitigating the effects of misalignment. In the experiments, we evaluate cloud removal performance in two ways: we assess the quality of visually pleasing textures using image reconstruction metrics, and we assess the generation of semantically meaningful structures using a well-established semantic segmentation task. The proposed Align-CR method outperforms the other baseline methods in both respects.
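
The data flow summarized in the abstract (and spelled out in the Figure 3 caption) can be illustrated with a toy sketch. This is not the authors' implementation: the learned feature extraction, AlignFuse, and image reconstruction blocks are replaced with hypothetical stand-ins (identity, a fixed horizontal shift plus averaging, and a mean over block outputs). Nearest-neighbour upsampling, chaining the blocks one after another, and every function name are assumptions made only for illustration.

```python
from typing import List

Image = List[List[float]]  # single-channel image stored as rows of floats


def upsample_nearest(img: Image, factor: int) -> Image:
    """Nearest-neighbour upsampling by an integer factor (assumed operator)."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def shift_columns(img: Image, dx: int) -> Image:
    """Toy 'warp': shift each row right by dx pixels, replicating the border."""
    w = len(img[0])
    return [[row[min(max(x - dx, 0), w - 1)] for x in range(w)] for row in img]


def align_fuse(opt_feat: Image, sar_feat: Image, dx: int) -> Image:
    """Stand-in for one AlignFuse block: align (warp) first, then fuse (average)."""
    warped = shift_columns(sar_feat, dx)
    return [[0.5 * (o + s) for o, s in zip(ro, rs)]
            for ro, rs in zip(opt_feat, warped)]


def align_cr_sketch(cloudy: Image, sar_lr: Image,
                    factor: int, offsets: List[int]) -> Image:
    # 1. Bring the SAR image to the optical resolution.
    sar_up = upsample_nearest(sar_lr, factor)
    # 2. Feature extraction blocks (identity stand-ins here).
    opt_feat, sar_feat = cloudy, sar_up
    # 3. D = len(offsets) AlignFuse blocks; chaining them is an assumption.
    outputs = []
    for dx in offsets:
        opt_feat = align_fuse(opt_feat, sar_feat, dx)
        outputs.append(opt_feat)
    # 4. Reconstruction stand-in: average the collected block outputs
    #    (the caption describes concatenation followed by a learned block).
    d = len(outputs)
    h, w = len(cloudy), len(cloudy[0])
    return [[sum(o[y][x] for o in outputs) / d for x in range(w)]
            for y in range(h)]


cloudy = [[0.0] * 4 for _ in range(4)]   # toy 4x4 "optical" image
sar_lr = [[1.0, 2.0], [3.0, 4.0]]        # toy 2x2 "SAR" image
restored = align_cr_sketch(cloudy, sar_lr, factor=2, offsets=[0, 1])
```

In the actual method the warping would be predicted from the features rather than supplied as fixed offsets; the sketch only shows the order of operations.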

Paper Structure (19 sections, 7 equations, 9 figures, 5 tables)


Figures (9)

  • Figure 1: Visualization of the M3R-CR dataset. (a) Spatial distribution of the 780 globally sampled areas of interest. (b) Cloud-free optical observations from PlanetScope. (c) Cloudy optical observations from PlanetScope. (d) SAR observations from Sentinel-1 (visualized with the VV polarization mode). (e) Land cover maps from WorldCover. All panels are scaled to the same size for easier viewing.
  • Figure 2: Statistics of the M3R-CR dataset. (a) Distribution of cloud coverage. (b) Distribution of land cover types.
  • Figure 3: Overview of the proposed Align-CR method. An upsampling operator is first employed to map the SAR image to the same resolution as the optical image. Then, the cloudy optical image and the upsampled SAR image are passed through their respective feature extraction (FE) blocks to extract modality-specific features. After that, the features are fed into $D$ AlignFuse blocks to obtain knowledgeable features with comprehensive information. The AlignFuse block performs alignment and fusion sequentially in the feature space. Finally, the outputs of all AlignFuse blocks are concatenated and fed to the image reconstruction (IR) block to restore the high-quality cloud-free image.
  • Figure 4: Qualitative results of visual recovery quality for 8 different samples. The first row shows the cloudy images, the second row shows the SAR images, the third to seventh rows show the results from the DSen2-CR, GLF-CR, w/o SAR, w/o Align and Align-CR models, and the eighth row shows the cloud-free images.
  • Figure 5: Quantitative results of visual recovery quality over different cloud cover levels in terms of the MAE, PSNR, SAM, and SSIM quality metrics.
  • ...and 4 more figures
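
Three of the quality metrics named in the Figure 5 caption (MAE, PSNR, SAM) can be sketched as follows, using common textbook definitions. The paper's exact settings are not stated in this excerpt, so the value range (`max_val=1.0`), reporting SAM in degrees, and averaging SAM over pixels are all assumptions, as are the function names. SSIM is left out because it needs a windowed computation that does not fit a short sketch.

```python
import math
from typing import List

Pixels = List[List[float]]  # one inner list of band values per pixel


def mae(pred: Pixels, target: Pixels) -> float:
    """Mean absolute error over all pixels and bands."""
    diffs = [abs(p - t) for pp, tp in zip(pred, target) for p, t in zip(pp, tp)]
    return sum(diffs) / len(diffs)


def psnr(pred: Pixels, target: Pixels, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; max_val is the assumed data range."""
    sq = [(p - t) ** 2 for pp, tp in zip(pred, target) for p, t in zip(pp, tp)]
    mse = sum(sq) / len(sq)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def sam_degrees(pred: Pixels, target: Pixels) -> float:
    """Mean spectral angle between per-pixel band vectors, in degrees."""
    angles = []
    for pp, tp in zip(pred, target):
        dot = sum(p * t for p, t in zip(pp, tp))
        norm = math.sqrt(sum(p * p for p in pp)) * math.sqrt(sum(t * t for t in tp))
        if norm == 0.0:
            angles.append(0.0)  # assumption: count zero vectors as angle 0
            continue
        cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
        angles.append(math.degrees(math.acos(cos)))
    return sum(angles) / len(angles)


# Two pixels with two bands each: the first matches, the second is orthogonal.
pred = [[0.5, 0.5], [1.0, 0.0]]
target = [[0.5, 0.5], [0.0, 1.0]]
print(mae(pred, target), psnr(pred, target), sam_degrees(pred, target))
```

Lower MAE and SAM and higher PSNR indicate a reconstruction closer to the cloud-free reference.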