Connecting the Dots: A Machine Learning Ready Dataset for Ionospheric Forecasting Models

Linnea M. Wolniewicz, Halil S. Kelebek, Simone Mestici, Michael D. Vergalla, Giacomo Acciarini, Bala Poduval, Olga Verkhoglyadova, Madhulika Guhathakurta, Thomas E. Berger, Atılım Güneş Baydin, Frank Soboczenski

TL;DR

This work addresses the lack of ML-ready, multi-source ionospheric datasets by introducing a curated open-access data product that fuses solar, solar wind, geomagnetic, and TEC observations. The data product aligns diverse sources in time and space into a modular structure with multiple cadences, and includes an event catalog that flags geomagnetic storms by NOAA G-level. Baseline IonCast models (LSTM, SFNO, GraphCast) run at 15-minute cadence, forecast global TEC maps out to a 12-hour horizon, and improve on a persistence baseline. By providing the dataset and processing tools, the work enables systematic ML benchmarking and advances Sun-Earth coupling studies and operational space weather forecasting.
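
For readers unfamiliar with the persistence baseline mentioned above, the following is a minimal sketch of how such a comparison is commonly set up: the most recent TEC map is repeated for every forecast step, and a model's error is compared against the error of that repeated map. The function names, the RMSE metric, and the skill-score definition are illustrative assumptions, not the evaluation code or metrics used by the authors.

```python
import math


def steps_for_horizon(horizon_hours, cadence_minutes):
    """Number of forecast steps needed to cover a horizon at a given cadence."""
    return horizon_hours * 60 // cadence_minutes


def persistence_forecast(last_map, n_steps):
    """Persistence: repeat the latest map (a list of rows) for every step."""
    return [[row[:] for row in last_map] for _ in range(n_steps)]


def rmse(pred_maps, true_maps):
    """Root-mean-square error over all steps and grid cells."""
    sq = [
        (p - t) ** 2
        for pred_map, true_map in zip(pred_maps, true_maps)
        for pred_row, true_row in zip(pred_map, true_map)
        for p, t in zip(pred_row, true_row)
    ]
    return math.sqrt(sum(sq) / len(sq))


def skill_vs_persistence(model_rmse, persistence_rmse):
    """Positive values mean the model beats persistence (assumed definition)."""
    return 1.0 - model_rmse / persistence_rmse


if __name__ == "__main__":
    n_steps = steps_for_horizon(12, 15)  # 12 hours at 15-minute cadence
    last_map = [[10.0, 12.0], [8.0, 9.0]]  # toy 2x2 grid, values in TECU
    baseline = persistence_forecast(last_map, n_steps)
    print(n_steps, len(baseline))
```

A 12-hour horizon at 15-minute cadence corresponds to 48 forecast steps, so the baseline here is simply 48 copies of the last observed map.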

Abstract

Operational forecasting of the ionosphere remains a critical space weather challenge due to sparse observations, complex coupling across geospatial layers, and a growing need for timely, accurate predictions that support Global Navigation Satellite System (GNSS), communications, aviation safety, and satellite operations. As part of the 2025 NASA Heliolab, we present a curated, open-access dataset that integrates diverse ionospheric and heliospheric measurements into a coherent, machine learning-ready structure, designed specifically to support next-generation forecasting models and address gaps in current operational frameworks. Our workflow integrates a large selection of data sources comprising Solar Dynamics Observatory data, solar irradiance indices (F10.7), solar wind parameters (velocity and interplanetary magnetic field), geomagnetic activity indices (Kp, AE, SYM-H), and NASA JPL's Global Ionospheric Maps of Total Electron Content (GIM-TEC). We also incorporate geospatially sparse data such as the TEC derived from the World-Wide GNSS Receiver Network and crowdsourced Android smartphone measurements. This novel heterogeneous dataset is temporally and spatially aligned into a single, modular data structure that supports both physical and data-driven modeling. Leveraging this dataset, we train and benchmark several spatiotemporal machine learning architectures for forecasting vertical TEC under both quiet and geomagnetically active conditions. This work presents an extensive dataset and modeling pipeline that enables exploration of not only ionospheric dynamics but also broader Sun-Earth interactions, supporting both scientific inquiry and operational forecasting efforts.
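
To make the idea of temporal alignment concrete, here is a minimal sketch of one common approach: placing a lower-cadence index (for example the 3-hourly Kp) onto a regular 15-minute grid by carrying the latest available value forward. The abstract does not state which resampling method the dataset uses, so forward-filling, the function names, and the sample values below are all assumptions for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def make_grid(start, end, cadence_minutes):
    """Regular time grid from start (inclusive) to end (exclusive)."""
    step = timedelta(minutes=cadence_minutes)
    n = int((end - start) / step)
    return [start + i * step for i in range(n)]


def forward_fill(times, values, grid):
    """For each grid time, take the latest sample at or before it.

    `times` must be sorted ascending. Grid points that precede the first
    sample get None, so no value is ever borrowed from the future.
    """
    out = []
    for t in grid:
        i = bisect_right(times, t)
        out.append(values[i - 1] if i > 0 else None)
    return out


if __name__ == "__main__":
    # Hypothetical 3-hourly Kp samples (values invented for the example).
    kp_times = [datetime(2024, 5, 10, 0, 0), datetime(2024, 5, 10, 3, 0)]
    kp_values = [2.0, 5.0]

    grid = make_grid(datetime(2024, 5, 10, 0, 0), datetime(2024, 5, 10, 6, 0), 15)
    kp_on_grid = forward_fill(kp_times, kp_values, grid)
    print(len(grid), kp_on_grid[11], kp_on_grid[12])
```

Forward-filling is chosen here because it only uses information available at or before each grid time, which matters when the aligned series later feeds a forecasting model; interpolation between samples would leak future values into the inputs.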

Paper Structure

This paper contains 4 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Visualization of dataset inputs and alignment in time and dimension. Output dataset incorporates solar and geomagnetic driver data, sparse and dense TEC maps, and orbital mechanics and quasi-dipole data calculated over a latitude-longitude grid.
  • Figure 2: Visualization of the 'Monitoring Event Space-weather TEC Ionospheric Catalog Index' (the MESTICI scale) showing temporal distribution of the Event class for the entire dataset time interval (2010-2024). The x and y axes represent the time (years) and the intensity of the event (G-level), respectively. Each class bin in the y-axis is then divided into four segments, which correspond to the event duration, as shown in the lower part of the plot.
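
As a rough illustration of how storm events can be labeled by G-level and duration, the sketch below maps Kp values to NOAA G-levels using the published NOAA scale (G1 at Kp 5, G2 at Kp 6, G3 at Kp 7, G4 at Kp 8 including 9-, G5 at Kp 9) and groups consecutive storm-level samples into events. The summary does not describe how the MESTICI catalog derives its classes or where its four duration bins fall, so the grouping logic, the function names, and the treatment of fractional Kp values are assumptions.

```python
def noaa_g_level(kp):
    """Return 0 below storm level, otherwise 1-5 for NOAA G1-G5.

    Fractional Kp values are handled with simple thresholds, so 9- (8.67)
    falls in G4, matching the NOAA table; other fractional cases are an
    assumption of this sketch.
    """
    for level, threshold in ((5, 9.0), (4, 8.0), (3, 7.0), (2, 6.0), (1, 5.0)):
        if kp >= threshold:
            return level
    return 0


def find_events(kp_series, cadence_hours=3.0):
    """Group consecutive storm-level samples into events.

    Returns a list of (start_index, duration_hours, peak_g_level) tuples.
    """
    events = []
    start = None
    peak = 0
    for i, kp in enumerate(kp_series):
        g = noaa_g_level(kp)
        if g > 0:
            if start is None:
                start, peak = i, g  # a new event begins
            else:
                peak = max(peak, g)
        elif start is not None:
            events.append((start, (i - start) * cadence_hours, peak))
            start = None
    if start is not None:  # series ended while a storm was ongoing
        events.append((start, (len(kp_series) - start) * cadence_hours, peak))
    return events


if __name__ == "__main__":
    # Invented 3-hourly Kp sequence for demonstration only.
    print(find_events([2, 5, 7, 6, 3, 9]))
```

Each event carries both a peak G-level and a duration, which are the two quantities the Figure 2 caption describes on its y-axis; binning the duration into four segments is left out because the bin edges are not given here.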