A Foundation Model for the Solar Dynamics Observatory

James Walsh, Daniel G. Gass, Raul Ramos Pollan, Paul J. Wright, Richard Galvez, Noah Kasmanoff, Jason Naradowsky, Anne Spalding, James Parr, Atılım Güneş Baydin

TL;DR

This paper discusses four key components: an ingestion pipeline to create machine learning ready datasets, the model architecture and training approach, resultant embeddings and fine-tunable models, and finally downstream fine-tuned applications.

Abstract

SDO-FM is a foundation model built on data from NASA's Solar Dynamics Observatory (SDO) spacecraft. It integrates three separate instruments to encapsulate the Sun's complex physical interactions in a multi-modal embedding space. The model can streamline scientific investigations involving SDO by making its enormous datasets more computationally accessible for heliophysics research and by enabling investigations that require instrument fusion. We discuss four key components: an ingestion pipeline to create machine learning ready datasets, the model architecture and training approach, resultant embeddings and fine-tunable models, and finally downstream fine-tuned applications. A key part of this effort has been including subject matter specialists at each stage of development to review the scientific value and guide decisions on model architecture, dataset, and training paradigm. This paper marks the release of our pretrained models and embedding datasets, available to the community on Hugging Face and sdofm.org.

Paper Structure (18 sections, 9 figures, 1 table)

Figures (9)

  • Figure 1: Two methods of using the pre-trained backbone: directly with an adaptor for fine-tuning, or with a new model consuming the generated latent representation.
  • Figure 2: Visualizations of the samae (left) and Nouveau-VAE (right), where the samae input has been transformed for full coverage, and the original Nouveau-VAE code base expanded to enable extraction of the latent representation.
  • Figure 3: Missing/corrupt data reconstruction process.
  • Figure 4: Instrument degradation prediction; head architecture reproduced with permission from 2021A&A...648A..53D.
  • Figure 5: samae reconstruction with disk transform. Note that in the circled areas, peaks of some wavelengths are not captured well.
  • ...and 4 more figures