CuMoLoS-MAE: A Masked Autoencoder for Remote Sensing Data Reconstruction
Anurup Naskar, Nathanael Zhixin Wong, Sara Shamekh
TL;DR
CuMoLoS-MAE tackles noisy remote-sensing atmospheric profiling by introducing a curriculum-guided masked autoencoder with Monte Carlo ensembling. It uses a micro-patch MAE within a ViT framework to reconstruct fine-scale vertical velocity fields and to produce per-pixel uncertainty maps. The reconstruction is the mean over $N$ random-mask reconstructions $\hat{X}^{(i)}$, and the uncertainty is their per-pixel standard deviation ($\bar{X} = \frac{1}{N}\sum_{i=1}^{N} \hat{X}^{(i)}$, $\sigma_X = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (\hat{X}^{(i)}-\bar{X})^2}$). The approach achieves state-of-the-art reconstruction quality and reliable uncertainty estimates on Doppler lidar data from ARM SGP, while revealing the trade-offs between temporal context and spectral fidelity. This enables improved convection diagnostics, real-time data assimilation, and more robust long-term climate reanalysis, with potential for generalization across lidar systems and operational deployment.
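The ensemble statistics above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the patch size, the default mask ratio, and the `toy_reconstruct` stand-in (which simply fills masked pixels with the mean of the visible ones) are all assumptions made for illustration. A trained MAE would take the place of `reconstruct`.

```python
import numpy as np


def toy_reconstruct(field, mask):
    """Hypothetical stand-in for a trained MAE: fill masked pixels with the visible mean."""
    return np.where(mask, field[~mask].mean(), field)


def mc_mask_ensemble(reconstruct, field, n_samples=32, mask_ratio=0.75, patch=4, seed=0):
    """Per-pixel mean and standard deviation over N random-mask reconstructions.

    Assumes `field` is a 2D array whose sides are divisible by `patch`.
    """
    rng = np.random.default_rng(seed)
    h, w = field.shape
    gh, gw = h // patch, w // patch          # size of the patch grid
    n_masked = int(round(mask_ratio * gh * gw))

    outputs = []
    for _ in range(n_samples):
        # Draw a random subset of patches to hide from the model.
        flat = np.zeros(gh * gw, dtype=bool)
        flat[rng.choice(gh * gw, size=n_masked, replace=False)] = True
        grid = flat.reshape(gh, gw)
        # Expand the patch-level mask to pixel resolution.
        mask = np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)
        outputs.append(reconstruct(field, mask))

    stack = np.stack(outputs)                # shape (N, h, w)
    mean = stack.mean(axis=0)                # \bar{X}
    std = stack.std(axis=0)                  # \sigma_X, using the 1/N normalisation
    return mean, std
```

Calling `mc_mask_ensemble(toy_reconstruct, field)` on a 2D array returns two arrays of the same shape as `field`. `np.std` defaults to the $1/N$ normalisation, which matches the formula for $\sigma_X$.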
Abstract
Atmospheric profiles from remote sensing instruments such as Doppler lidar, radar, and radiometers are frequently corrupted by low signal-to-noise ratio (SNR) gates, range folding, and spurious discontinuities. Traditional gap filling blurs fine-scale structures, whereas deep models lack confidence estimates. We present CuMoLoS-MAE, a Curriculum-Guided Monte Carlo Stochastic Ensemble Masked Autoencoder designed to (i) restore fine-scale features such as updraft and downdraft cores, shear lines, and small vortices, (ii) learn a data-driven prior over atmospheric fields, and (iii) quantify pixel-wise uncertainty. During training, CuMoLoS-MAE employs a mask-ratio curriculum that forces a ViT decoder to reconstruct from progressively sparser context. At inference, we approximate the posterior predictive by Monte Carlo sampling over random mask realisations: the MAE is evaluated multiple times and the outputs are aggregated to obtain the posterior predictive mean reconstruction together with a finely resolved per-pixel uncertainty map. Together with high-fidelity reconstruction, this novel deep learning-based workflow enables enhanced convection diagnostics, supports real-time data assimilation, and improves long-term climate reanalysis.
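To make the mask-ratio curriculum concrete, here is a minimal sketch of one possible schedule. The abstract does not specify the schedule's shape or endpoints, so the linear ramp, the function name `mask_ratio_schedule`, and the `start` and `end` values below are assumptions chosen only for illustration.

```python
def mask_ratio_schedule(epoch, total_epochs, start=0.3, end=0.75):
    """Linearly ramp the mask ratio from `start` to `end` over training.

    `start` and `end` are hypothetical values, not taken from the paper.
    """
    if total_epochs <= 1:
        return end
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + frac * (end - start)


# Early epochs hide few patches; later epochs leave progressively sparser context.
ratios = [mask_ratio_schedule(e, total_epochs=10) for e in range(10)]
```

In a training loop, the value returned for the current epoch would set how many patches are hidden from the encoder in each batch.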
