Early Prediction of Geomagnetic Storms by Machine Learning Algorithms

Iris Yan

TL;DR

This work addresses the challenge of early, reliable prediction of geomagnetic storms by combining big data from multiple ground stations with Random Forest (RF) regression. By fusing 780 features from OMNIWeb, Kyoto, and GFZ, and applying feature selection and downsampling to mitigate class imbalance, the authors predict the $K_p$ index three hours ahead with 82.55% accuracy on data collected in 2021. The results suggest that a 3-hour lead time approaches a practical limit, owing to information decay and the 3-hour cadence of $K_p$ measurements, yet it is still long enough to give meaningful advance warning for satellites, power grids, and communications. The study demonstrates the utility of RF-based nonlinear modeling for sparse, noisy solar-terrestrial data and highlights directions for ensemble methods and physics-informed improvements.

Abstract

Geomagnetic storms (GS) occur when solar winds disrupt Earth's magnetosphere. GS can cause severe damage to satellites, power grids, and communication infrastructure. Estimates of the direct economic impact of a large-scale GS exceed $40 billion a day in the US. Early prediction is critical to preventing and minimizing these hazards. However, current methods either predict several hours ahead but fail to identify all types of GS, or make predictions only a short time, e.g., one hour, before the occurrence. This work aims to predict all types of geomagnetic storms reliably and as early as possible using big data and machine learning algorithms. By fusing big data collected from multiple ground stations around the world, covering different aspects of solar measurements, and using Random Forest regression with feature selection and downsampling of minor geomagnetic storm instances (which make up the majority of the data), we achieve an accuracy of 82.55% on data collected in 2021 when making predictions three hours in advance. Given that important predictive features such as historic Kp indices are measured every 3 hours and that their importance decays quickly as the lead time grows, a prediction 3 hours ahead is believed to be close to the practical limit.
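
The two data-preparation ideas named in the abstract, pairing past Kp values with a Kp value one 3-hour step later and downsampling the over-represented minor-storm instances, can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the function names `make_lead_pairs` and `downsample_majority`, the lag count, the storm threshold of Kp 5, the 1:1 class ratio, and the Kp values are all assumptions chosen for illustration, and the Random Forest regressor itself is left out.

```python
import random


def make_lead_pairs(kp_series, n_lags=3, lead_steps=1):
    """Pair the n_lags most recent Kp values with the Kp value lead_steps later.

    Assumes one sample per 3 hours, so lead_steps=1 means a 3-hour lead.
    """
    pairs = []
    for t in range(n_lags - 1, len(kp_series) - lead_steps):
        features = kp_series[t - n_lags + 1 : t + 1]  # lagged inputs up to time t
        target = kp_series[t + lead_steps]            # value to predict
        pairs.append((features, target))
    return pairs


def downsample_majority(pairs, threshold=5.0, ratio=1.0, seed=0):
    """Randomly keep at most ratio * (number of storm samples) minor samples.

    threshold is a hypothetical cut between minor and stronger activity.
    """
    rng = random.Random(seed)
    minor = [p for p in pairs if p[1] < threshold]
    storm = [p for p in pairs if p[1] >= threshold]
    keep = min(len(minor), int(ratio * len(storm)))
    return rng.sample(minor, keep) + storm


# Synthetic Kp values for illustration only, not real measurements.
kp = [1, 2, 1, 6, 7, 2, 1, 1, 5, 3]
pairs = make_lead_pairs(kp)
balanced = downsample_majority(pairs)
print(len(pairs), len(balanced))
```

The balanced pairs could then be passed to any regressor. Fixing the seed makes the random downsampling repeatable, which matters when comparing feature-selection choices on the same subset.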

Paper Structure

This paper contains 5 sections and 7 figures.

Figures (7)

  • Figure 1: Illustration of solar winds, Earth's magnetosphere and geomagnetic storms. The dark disc indicates the Earth. The dashed line indicates the surface of the Earth's magnetosphere, where bow shock occurs when the solar winds interact with the Earth's magnetosphere.
  • Figure 2: Wide range of solar field magnitudes for a given future (e.g., 3 hours later) Kp-index.
  • Figure 3: Overall architecture of the proposed approach.
  • Figure 4: Principal component visualization of data. The points are drawn by their value along the top 2 principal directions, and labeled by Kp indices (rounded for clarity) and marked by different colors and shapes.
  • Figure 5: Variable importance. Kp indices, FMAs, and other solar activity measurements taken at different times are arranged by the time they were measured, with more recent ones displayed first. The number in each point's label indicates how many minutes before the current moment the measurement was taken; the current moment is the time at which we predict the geomagnetic disturbance 3 hours later. For clarity, features other than FMAs and Dst values are not labeled.
  • ...and 2 more figures