Solar Active Region Magnetogram Image Dataset for Studies of Space Weather

Laura E. Boucheron, Ty Vincent, Jeremy A. Grajeda, Ellery Wuest

TL;DR

This work provides a comprehensive, reproducible magnetogram dataset for space-weather research by integrating NOAA AR catalogs, SDO/HMI magnetograms, and GOES flare labels into a fixed-size, minimally processed dataset available in preconfigured and reduced forms. The authors detail end-to-end data preparation, including AR identification, automated magnetogram download, flare labeling within configurable prediction windows, and stratified train/validation/test splits, enabling robust ML benchmarking. They validate the dataset with baseline magnetic-complexity features using an SVM and with transfer learning via a VGG16 CNN, obtaining competitive performance and demonstrating the utility of both traditional and deep-learning approaches for flare prediction. Overall, the dataset enables reproducible, scalable experiments in solar flare forecasting, with configurable filtering by latitude/longitude, NaN handling, and downsized variants to support rapid experimentation and benchmarking in space-weather research.
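
The flare-labeling step mentioned above can be illustrated with a minimal sketch. It assumes a magnetogram is labeled positive when at least one flare attributed to the same active region begins within a configurable window after the observation time. The function name `label_flare`, the `window_hours` parameter, the exact window boundaries, and the example timestamps are all hypothetical and are not taken from the dataset's own tooling.

```python
from datetime import datetime, timedelta

def label_flare(obs_time, flare_times, window_hours=24):
    """Return 1 if any flare starts in (obs_time, obs_time + window], else 0.

    obs_time: datetime of the magnetogram observation.
    flare_times: iterable of datetimes for flares attributed to the same AR.
    window_hours: length of the prediction window (an assumed default).
    """
    window_end = obs_time + timedelta(hours=window_hours)
    return int(any(obs_time < t <= window_end for t in flare_times))

# Made-up flare start time for illustration only.
flares = [datetime(2011, 3, 15, 20, 0)]

# Flare begins 8 hours after the observation: positive under a 24 h window.
label_near = label_flare(datetime(2011, 3, 15, 12, 0), flares)

# Flare begins more than 24 hours after the observation: negative.
label_far = label_flare(datetime(2011, 3, 13, 12, 0), flares)
```

Changing `window_hours` changes which files count as flaring, which is why a configurable window yields differently balanced class counts from the same set of images.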

Abstract

In this dataset we provide a comprehensive collection of magnetograms (images quantifying the strength of the magnetic field) from the National Aeronautics and Space Administration's (NASA's) Solar Dynamics Observatory (SDO). The dataset incorporates data from three sources and provides SDO Helioseismic and Magnetic Imager (HMI) magnetograms of solar active regions (regions of large magnetic flux, generally the source of eruptive events) as well as labels of corresponding flaring activity. This dataset will be useful for image analysis or solar physics research related to magnetic structure, its evolution over time, and its relation to solar flares. The dataset will be of interest to researchers investigating automated solar flare prediction methods, including supervised and unsupervised machine learning (classical and deep), binary and multi-class classification, and regression. This is a minimally processed, user-configurable dataset of consistently sized images of solar active regions that can serve as a benchmark for solar flare prediction research.

Paper Structure (15 sections, 1 equation, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Flowchart of dataset creation. Space Weather Prediction Center (SWPC) Solar Region Summaries (SRS) are used to determine the dates for which a National Oceanic and Atmospheric Administration (NOAA) Active Region (AR) is visible on disk. Solar Dynamics Observatory (SDO) Helioseismic and Magnetic Imager (HMI) magnetogram images of ARs are downloaded via the Joint Science Operations Center (JSOC) web interface. SWPC Event Reports (ER) are used to specify the time and size of solar flares associated with a given NOAA AR.
  • Figure 2: Latitude and longitude of AR images. The red circle denotes the solar radius and the green lines denote $\pm60^\circ$ latitude and longitude. The blue dots denote the centroids of the ARs included in the respective datasets. a: Latitude and longitude of files for entire dataset (image set). b: Latitude and longitude of files within $\pm60^\circ$ and $\ge1$ NaN pixels. c: Latitude and longitude of files for preconfigured AR dataset.
  • Figure 3: Examples of $600\times600$ pixel magnetogram images, including a disk-edge magnetogram and an on-disk magnetogram. a: Disk-edge magnetogram. NOAA AR 1169, 2011 March 15, 12:00:00. b: On-disk magnetogram. NOAA AR 2396, 2015 August 11, 00:00:00.
  • Figure 4: Count of events or files for different flaring behavior versus quarter. a: Count of flare events in the entire dataset. b: Flare file count for the entire dataset. c: Flare and non-flare file count for the entire dataset. d: Flare file count for the preconfigured dataset. e: Flare and non-flare file count for the preconfigured dataset.
  • Figure 5: Flowchart of SVM classification of flare activity.
  • ...and 1 more figures
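
The latitude/longitude and NaN criteria referenced in the Figure 2 caption can be sketched as a simple per-file filter. This is an illustration under stated assumptions, not the dataset's actual code: the function name `keep_file`, its parameters, and the choice to reject any image containing a NaN pixel by default are hypothetical.

```python
import numpy as np

def keep_file(lat_deg, lon_deg, image, max_angle=60.0, allow_nan=False):
    """Decide whether an AR image passes a position and NaN filter.

    lat_deg, lon_deg: AR centroid latitude and longitude in degrees.
    image: 2-D array of magnetogram pixel values.
    max_angle: keep centroids within +/- this many degrees (assumed 60).
    allow_nan: if False, reject images with at least one NaN pixel.
    """
    within = abs(lat_deg) <= max_angle and abs(lon_deg) <= max_angle
    has_nan = bool(np.isnan(image).any())
    return within and (allow_nan or not has_nan)

# Synthetic stand-ins for magnetogram images (not real data).
clean = np.zeros((600, 600))
with_nan = clean.copy()
with_nan[0, 0] = np.nan

on_disk_clean = keep_file(10.0, -30.0, clean)       # inside +/-60 deg, no NaN
near_limb = keep_file(5.0, 75.0, clean)             # longitude beyond 60 deg
on_disk_nan = keep_file(10.0, -30.0, with_nan)      # inside, but has a NaN
```

Passing `allow_nan=True` keeps images with missing pixels, which may suit experiments that handle NaN values downstream.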