3D Cloud reconstruction through geospatially-aware Masked Autoencoders

Stella Girtsou; Emiliano Diaz Salas-Porras; Lilli Freischem; Joppe Massant; Kyriaki-Margarita Bintsi; Guiseppe Castiglione; William Jones; Michael Eisinger; Emmanuel Johnson; Anna Jungbluth

TL;DR

The paper addresses the need for real-time 3D cloud data to reduce uncertainties in climate models by reconstructing 3D cloud structures from geostationary MSG/SEVIRI imagery. It proposes self-supervised pretraining on unlabelled MSG images with a Masked Autoencoder (MAE) and with SatMAE, a geospatially-aware variant that additionally encodes acquisition time and location. The pre-trained models are then fine-tuned on ~47k image–profile pairs matched to CloudSat/CPR radar reflectivity profiles to produce 3D cloud volumes of size $90 \times 256 \times 256$. SatMAE with time and coordinate encodings achieves the best RMSE, PSNR, and SSIM, with the clearest gains in the tropical convection belt, and outperforms a supervised U-Net baseline. The framework enables higher-fidelity, near-real-time 3D cloud products and can be extended to ESA EarthCARE data to build long-term, climate-relevant cloud datasets.
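SatMAE-style models condition the encoder on acquisition metadata by adding an embedding of time and location to the patch tokens. This summary does not give the exact scheme, so the sketch below assumes a standard 1D sinusoidal encoding (as used for Transformer positions) applied separately to latitude, longitude, day of year, and hour, with the four vectors concatenated. The function names, the equal split of the embedding dimension, and the use of raw degrees and hours are all hypothetical choices for illustration.

```python
import math


def sinusoidal_encoding(value: float, dim: int, max_period: float = 10000.0) -> list[float]:
    """Encode a scalar as `dim` sin/cos features at geometrically spaced frequencies."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Frequencies decay from 1 to 1/max_period, as in Transformer positional encodings.
    freqs = [1.0 / (max_period ** (i / half)) for i in range(half)]
    return [math.sin(value * f) for f in freqs] + [math.cos(value * f) for f in freqs]


def geo_time_embedding(lat: float, lon: float, day_of_year: float, hour: float,
                       dim: int = 64) -> list[float]:
    """Hypothetical metadata embedding: one equal slice per field, concatenated."""
    if dim % 8 != 0:
        raise ValueError("dim must be divisible by 8 (four fields, each with sin/cos halves)")
    part = dim // 4
    return (sinusoidal_encoding(lat, part)
            + sinusoidal_encoding(lon, part)
            + sinusoidal_encoding(day_of_year, part)
            + sinusoidal_encoding(hour, part))


if __name__ == "__main__":
    emb = geo_time_embedding(lat=5.0, lon=-10.0, day_of_year=200, hour=12)
    print(len(emb))  # 64
```

In a SatMAE-like encoder, such a vector would be added to or concatenated with every patch token of an image, so the model can learn, for example, that deep convection is more likely near the equator at certain times of day. A real implementation might normalize latitude and longitude or use cyclic encodings for longitude and hour.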

Abstract

Clouds play a key role in Earth's radiation balance, and their complex effects introduce large uncertainties into climate models. Real-time 3D cloud data is essential for improving climate predictions. This study leverages geostationary imagery from MSG/SEVIRI and radar reflectivity measurements of cloud profiles from CloudSat/CPR to reconstruct 3D cloud structures. We first pre-train on unlabelled MSG images with two self-supervised learning (SSL) methods, Masked Autoencoders (MAE) and the geospatially-aware SatMAE, and then fine-tune our models on matched image-profile pairs. Our approach outperforms state-of-the-art methods such as U-Nets, and our geospatial encoding further improves prediction results, demonstrating the potential of SSL for cloud reconstruction.
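MAE pretraining splits each image into non-overlapping patches (the paper compares 8- and 16-pixel tokenization), hides a random subset, and trains the model to reconstruct the hidden patches. The stdlib sketch below shows the patchify, masking, and masked-loss steps only. The 75% mask ratio is the default from the original MAE paper and is an assumption here, as is the 256×256 single-channel input; the model itself is omitted.

```python
import random


def patchify(image: list[list[float]], patch: int) -> list[list[list[float]]]:
    """Split an H x W image into non-overlapping patch x patch tiles in row-major order."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    tiles = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tiles.append([row[left:left + patch] for row in image[top:top + patch]])
    return tiles


def random_mask(num_tokens: int, mask_ratio: float = 0.75, seed=None):
    """Randomly split token indices into (visible, masked); only visible ones reach the encoder."""
    rng = random.Random(seed)
    ids = list(range(num_tokens))
    rng.shuffle(ids)
    num_keep = int(num_tokens * (1.0 - mask_ratio))
    return sorted(ids[:num_keep]), sorted(ids[num_keep:])


def masked_mse(pred_tiles, true_tiles, masked_ids) -> float:
    """Pixel-wise MSE over masked tiles only; visible tiles do not contribute to the loss."""
    total, count = 0.0, 0
    for i in masked_ids:
        for pred_row, true_row in zip(pred_tiles[i], true_tiles[i]):
            for p, t in zip(pred_row, true_row):
                total += (p - t) ** 2
                count += 1
    return total / count


if __name__ == "__main__":
    # Hypothetical 256 x 256 single-channel image with distinct pixel values.
    image = [[float(r * 256 + c) for c in range(256)] for r in range(256)]
    for patch in (8, 16):
        tokens = patchify(image, patch)
        visible, masked = random_mask(len(tokens), seed=0)
        print(patch, len(tokens), len(visible), len(masked))
    # prints "8 1024 256 768" then "16 256 64 192"
```

The printed counts show the trade-off behind the 8- versus 16-pixel comparison: halving the patch size quadruples the number of tokens (1024 versus 256 for a 256×256 image), giving finer reconstructions at a higher compute cost per image.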

Paper Structure

This paper contains 6 sections, 8 figures, and 4 tables.

Introduction
Data & Method
Results
Conclusions
Appendix
Further Training Details

Figures (8)

Figure 1: (a) Proposed pipeline: We use MAE to pre-train encoders on unlabelled MSG/SEVIRI images by reconstructing missing information, then fine-tune these encoders with a smaller dataset of image-profile pairs to derive 3D cloud structures. (b) The ground track of 1 day of CloudSat orbits.
Figure 2: (a) Validation Loss for the different size MAEs. (b) Visualization of the masked, reconstructed and original images for MAEs based on 8- or 16-pixel tokenization.
Figure 3: Comparison of an example MSG/SEVIRI input channel ($7.35~\mu\text{m}$) with CloudSat overpasses (red lines) and the corresponding (normalized) radar reflectivity profiles. All models were trained for 50 epochs, which yielded the highest perceptual quality.
Figure 4: Monthly and yearly root-mean-square errors (RMSE) across MSG's field-of-view (one value per image-profile pair), comparing our U-Net, MAE, and SatMAE models. SatMAE with time and coords consistently improves overall prediction performance, especially in the tropical convection belt. Note that we show the prediction errors across our entire dataset (i.e. including training, validation, and test examples).
Figure 5: Mean squared error (MSE) loss, peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM) as a function of training steps for our U-Net baseline and MAE models. The MAE was either trained from scratch, or fine-tuned using the pre-trained encoder. For the latter, we either froze or further fine-tuned the encoder weights.
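Figures 4 and 5 evaluate predictions with RMSE, PSNR, and SSIM. The sketch below computes all three for one predicted versus reference reflectivity profile flattened to a list, assuming values normalized to [0, 1] (hence `data_range=1.0`). Note that `global_ssim` is a simplified single-window SSIM over the whole profile; the usual SSIM averages the same formula over local sliding windows, and the exact variant used in the paper is not stated here.

```python
import math


def _mse(pred: list[float], target: list[float]) -> float:
    """Mean squared error between two equally sized, non-empty flat sequences."""
    if len(pred) != len(target) or not target:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def rmse(pred: list[float], target: list[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(_mse(pred, target))


def psnr(pred: list[float], target: list[float], data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; data_range is the span of valid values."""
    mse = _mse(pred, target)
    if mse == 0.0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(data_range ** 2 / mse)


def global_ssim(pred: list[float], target: list[float], data_range: float = 1.0) -> float:
    """Single-window SSIM using global means, variances, and covariance."""
    n = len(target)
    mu_x, mu_y = sum(pred) / n, sum(target) / n
    var_x = sum((p - mu_x) ** 2 for p in pred) / n
    var_y = sum((t - mu_y) ** 2 for t in target) / n
    cov = sum((p - mu_x) * (t - mu_y) for p, t in zip(pred, target)) / n
    # Standard stabilizing constants with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

Reporting one value per image-profile pair, as in Figure 4, would amount to calling these functions once per matched CloudSat overpass and mapping the results onto MSG's field of view.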
