Seeing Through the Clouds: Cloud Gap Imputation with Prithvi Foundation Model
Denys Godwin, Hanxi Li, Michael Cecil, Hamed Alemohammad
TL;DR
The study tackles cloud-induced gaps in multispectral time series and compares a Geospatial Foundation Model (GFM), the Prithvi ViT, against a CGAN baseline for cloud-gap imputation. It reports that the pretrained ViT, fine-tuned with real cloud masks, achieves better MAE and SSIM than the CGAN across the realistic masking schemes tested, even with limited fine-tuning, and that its reconstructions remain spatially and temporally consistent. The Prithvi model reaches a low MAE (around 0.03 in the zero-shot setting under extensive masking). The findings highlight the practicality of GFM-based cloud-gap imputation for completing time-series data and supporting downstream tasks such as land-use monitoring and crop yield estimation. Future work will explore additional data modalities such as DEMs and land-cover layers.
Abstract
Filling cloudy pixels in multispectral satellite imagery is essential for accurate data analysis and downstream applications, especially for tasks that require time series data. To address this issue, we compare the performance of a foundational Vision Transformer (ViT) model with a baseline Conditional Generative Adversarial Network (CGAN) model for missing value imputation in time series of multispectral satellite imagery. We randomly mask time series of satellite images using real-world cloud masks and train each model to reconstruct the missing pixels. The ViT model is fine-tuned from a pretrained model, while the CGAN is trained from scratch. We assess imputation accuracy and contextual preservation using quantitative evaluation metrics, namely the structural similarity index and mean absolute error, together with qualitative visual analysis.
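To make the masking and scoring setup concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a time series stored as an array of shape (T, C, H, W) with reflectance scaled to [0, 1] and a boolean cloud mask of shape (T, H, W) where True marks a cloudy pixel; the function names, the fill value, and the array layout are all hypothetical. The temporal-mean fill is a naive stand-in for a model and is neither the Prithvi ViT nor the CGAN. The mean absolute error is computed only over the masked pixels, which is one plausible reading of the evaluation described above.

```python
import numpy as np


def apply_cloud_mask(series, cloud_mask, fill_value=0.0):
    """Hide cloudy pixels in a (T, C, H, W) series.

    cloud_mask has shape (T, H, W); True means "cloudy", and the same
    mask is applied to every spectral band of a time step.
    Returns the masked series and the mask broadcast to (T, C, H, W).
    """
    series = np.asarray(series, dtype=float)
    mask = np.asarray(cloud_mask, dtype=bool)[:, None, :, :]
    mask = np.broadcast_to(mask, series.shape)
    return np.where(mask, fill_value, series), mask


def temporal_mean_fill(masked_series, mask):
    """Naive baseline (assumption, not from the paper): replace each
    masked pixel with the mean of its clear observations over time."""
    clear = ~mask
    count = clear.sum(axis=0)                      # clear observations per pixel
    total = np.where(clear, masked_series, 0.0).sum(axis=0)
    mean = total / np.maximum(count, 1)            # avoid division by zero
    return np.where(mask, mean[None], masked_series)


def masked_mae(prediction, target, mask):
    """Mean absolute error restricted to the masked (imputed) pixels."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask selects no pixels")
    error = np.abs(np.asarray(prediction) - np.asarray(target))
    return float(error[mask].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = rng.random((3, 6, 8, 8))              # synthetic data, T=3, C=6
    clouds = np.zeros((3, 8, 8), dtype=bool)
    clouds[1, 2:6, 2:6] = True                     # one cloudy patch at t=1
    masked, mask = apply_cloud_mask(series, clouds)
    filled = temporal_mean_fill(masked, mask)
    print("MAE over masked pixels:", masked_mae(filled, series, mask))
```

Restricting the error to masked pixels keeps untouched clear-sky pixels from diluting the score. A structural similarity score could be computed on the same arrays with an existing image-quality library.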
