GAIA: A Foundation Model for Operational Atmospheric Dynamics

Ata Akbari Asanjan, Olivia Alexander, Tom Berg, Stephen Peng, Jad Makki, Clara Zhang, Matt Yang, Disha Shidham, Srija Chakraborty, William Bender, Cara Crawford, Arun Ravindran, Olivier Raiman, David Potere, David Bell

TL;DR

GAIA tackles the challenge of learning transferable representations from noisy, high-dimensional geostationary satellite imagery by combining MAE and DINO into a single framework trained on 15 years of globally merged geostationary infrared data. The hybrid objective yields semantically disentangled, temporally coherent embeddings that improve gap filling and downstream tasks such as atmospheric river segmentation and tropical cyclone detection, while remaining competitive on precipitation estimation. The paper shows that the joint self-supervised objective outperforms an MAE-only baseline, supported by PCA and temporal analyses that reveal distributed variance and smooth atmospheric evolution. The work offers a practical, open-source foundation for operational meteorology, with model weights and code publicly released.

Abstract

We introduce GAIA (Geospatial Artificial Intelligence for Atmospheres), a hybrid self-supervised geospatial foundation model that fuses Masked Autoencoders (MAE) with self-distillation with no labels (DINO) to generate semantically rich representations from global geostationary satellite imagery. Pre-trained on 15 years of globally merged infrared observations (2001-2015), GAIA learns disentangled representations that capture atmospheric dynamics rather than trivial diurnal patterns, as evidenced by distributed principal component structure and temporal coherence analysis. We demonstrate robust reconstruction capabilities across varying data availability (30-95% masking), achieving superior gap-filling performance on real missing data patterns. When transferred to downstream tasks, GAIA consistently outperforms an MAE-only baseline: improving atmospheric river segmentation (F1: 0.58 vs 0.52), enhancing tropical cyclone detection (storm-level recall: 81% vs 75%, early detection: 29% vs 17%), and maintaining competitive precipitation estimation performance. Analysis reveals that GAIA's hybrid objectives encourage learning of spatially coherent, object-centric features distributed across multiple principal components rather than concentrated representations focused on reconstruction. This work demonstrates that combining complementary self-supervised objectives yields more transferable representations for diverse atmospheric modeling tasks. Model weights and code are available at: https://huggingface.co/bcg-usra-nasa-gaia/GAIA-v1.
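
To make the idea of fusing the two objectives concrete, the following is a minimal NumPy sketch of how an MAE reconstruction term and a DINO self-distillation term could be combined into a single loss. It is not taken from the GAIA code: the function names, the weight `lambda_dino`, the temperatures (the commonly used DINO defaults), the batch-mean center, and the toy shapes are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, temp):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mae_loss(pred, target, mask):
    """Mean squared error averaged over masked patches only.

    pred, target: (num_patches, patch_dim); mask: (num_patches,), 1 = masked.
    """
    per_patch = ((pred - target) ** 2).mean(axis=-1)
    return float((per_patch * mask).sum() / mask.sum())

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the centered, sharpened teacher distribution
    and the student distribution (the teacher is treated as a constant)."""
    p_teacher = softmax(teacher_logits - center, teacher_temp)
    z = student_logits / student_temp
    z = z - z.max(axis=-1, keepdims=True)
    log_p_student = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

def hybrid_loss(pred, target, mask, student_logits, teacher_logits, center,
                lambda_dino=1.0):
    """Weighted sum of both terms; lambda_dino is a hypothetical weight."""
    return mae_loss(pred, target, mask) + lambda_dino * dino_loss(
        student_logits, teacher_logits, center)

# Toy data standing in for patches and projection-head outputs (assumed shapes).
rng = np.random.default_rng(0)
num_patches, patch_dim, out_dim = 16, 64, 32
target = rng.normal(size=(num_patches, patch_dim))
pred = target + 0.1 * rng.normal(size=(num_patches, patch_dim))
mask = np.zeros(num_patches)
mask[:12] = 1.0  # 75% of patches masked, i.e. 25% visible to the student
student_logits = rng.normal(size=(4, out_dim))
teacher_logits = rng.normal(size=(4, out_dim))
center = teacher_logits.mean(axis=0)  # real DINO keeps a running average

loss = hybrid_loss(pred, target, mask, student_logits, teacher_logits, center)
print(f"hybrid loss: {loss:.4f}")
```

In this sketch the reconstruction term only sees masked patches, while the distillation term compares output distributions of two views; how the released model actually weights and schedules the two terms should be checked against the published code.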

Paper Structure

This paper contains 33 sections, 4 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Data preprocessing pipelines for GAIA and the downstream tasks, including data ingestion, cleaning, normalization, and feature extraction.
  • Figure 2: Architecture overview of the GAIA framework. The model combines two powerful self-supervised learning approaches: (a) MAE, which randomly masks patches of the input image and learns to reconstruct the missing regions, and (b) DINO, which uses a teacher-student architecture for knowledge distillation. The encoder processes visible patches (25% of image for the student / MAE and 75% of image for the teacher) through a series of transformer blocks, while the lightweight decoder reconstructs the full image from visible tokens. This architecture enables efficient self-supervised pretraining on large-scale datasets without requiring manual labels, leading to robust visual representations that can be fine-tuned for downstream tasks.
  • Figure 3: PCA projections of learned patch embeddings across four sample timesteps. PCA is performed on the 912-dimensional embedding vectors produced by the encoder. The RGB channels correspond to principal components 1, 2, and 3, respectively. From left to right: GAIA (MAE-DINO), DINO-only, MAE-only, and target IR input data. Patch-level embeddings are upsampled to pixel resolution using AnyUp [wimmer2025anyup] for visualization. GAIA (first column) shows distinct pattern separation with spatially coherent features aligned to meaningful atmospheric zones (e.g., equatorial cloud belts, longitudinal structures). In contrast, DINO (second column) and MAE (third column) produce embeddings that closely resemble smoothed versions of the input (fourth column) with positional structure, indicating concentrated, reconstruction-focused representations rather than disentangled semantic features.
  • Figure 4: t-SNE projection of GAIA embeddings over 3 days (January 1-3, 2022). Each point represents a single timestep, with color indicating chronological progression. Embeddings form a smooth temporal trajectory where nearby points correspond to nearby times, demonstrating that the model learns temporally coherent representations. The continuous structure without discrete clusters indicates that embeddings reflect atmospheric evolution rather than categorical time-of-day encodings.
  • Figure 5: Quantitative evaluation of gap-filling quality across different mask ratios. GAIA achieves lower RMSE over masked patches at every mask ratio shown.
  • ...and 6 more figures
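
The PCA visualization described in the Figure 3 caption can be sketched roughly as follows: project each patch embedding onto its first three principal components and rescale those components to the RGB range. This is an illustrative sketch on random data, not the authors' plotting code. The function name `pca_rgb`, the patch grid size, and the per-channel min-max scaling are assumptions, and the AnyUp upsampling step is omitted.

```python
import numpy as np

def pca_rgb(embeddings, grid_hw):
    """Map patch embeddings (num_patches, dim) to an (H, W, 3) image.

    Channels R, G, B hold principal components 1, 2, 3, each min-max
    scaled to [0, 1]. Also returns their explained-variance ratios.
    """
    h, w = grid_hw
    x = embeddings - embeddings.mean(axis=0, keepdims=True)
    # SVD of the centered data: rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    comps = x @ vt[:3].T  # (num_patches, 3)
    lo = comps.min(axis=0, keepdims=True)
    hi = comps.max(axis=0, keepdims=True)
    rgb = (comps - lo) / np.maximum(hi - lo, 1e-12)
    explained = (s ** 2) / (s ** 2).sum()
    return rgb.reshape(h, w, 3), explained[:3]

# Random stand-in for encoder output: an assumed 8 x 16 patch grid with
# 912-dimensional embeddings (the dimension quoted in the caption).
rng = np.random.default_rng(0)
grid_hw = (8, 16)
emb = rng.normal(size=(grid_hw[0] * grid_hw[1], 912))
rgb, explained = pca_rgb(emb, grid_hw)
print(rgb.shape, explained.round(3))
```

The returned explained-variance ratios give one simple way to inspect whether variance is concentrated in a few components or spread across many, which is the kind of comparison the caption draws between the models.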