Early Prediction of Geomagnetic Storms by Machine Learning Algorithms
Iris Yan
TL;DR
This work addresses the challenge of early, reliable prediction of geomagnetic storms by combining big data from multiple ground stations with Random Forest regression. By fusing 780 features from OMNIWeb, Kyoto, and GFZ, applying feature selection, and downsampling to mitigate class imbalance, the authors predict the $K_p$ index three hours ahead with 82.55% accuracy. The results suggest that a 3-hour lead time approaches the practical limit imposed by information decay and the cadence of $K_p$ measurements, yet it still allows meaningful advance warnings for satellites, power grids, and communications. The study demonstrates the utility of RF-based nonlinear modeling for sparse, noisy solar-terrestrial data and highlights directions for ensemble methods and physics-informed improvements.
Abstract
Geomagnetic storms (GS) occur when the solar wind disrupts Earth's magnetosphere. GS can cause severe damage to satellites, power grids, and communication infrastructure. The direct economic impact of a large-scale GS is estimated to exceed $40 billion a day in the US. Early prediction is critical to preventing or minimizing these hazards. However, current methods either predict several hours ahead but fail to identify all types of GS, or make predictions only a short time, e.g., one hour, before the storm occurs. This work aims to predict all types of geomagnetic storms reliably and as early as possible using big data and machine learning algorithms. By fusing big data on different aspects of solar measurements, collected from multiple ground stations around the world, and by using Random Forest regression with feature selection and downsampling of minor geomagnetic storm instances (which make up the majority of the data), we achieve an accuracy of 82.55% on data collected in 2021 when predicting three hours in advance. Because important predictive features such as historical Kp indices are measured every 3 hours and their importance decays quickly as the lead time increases, a 3-hour lead time is believed to be close to the practical limit.
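The data preparation described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It only shows, on a toy Kp series, how a 3-hour-ahead target might be paired with lagged Kp values and how the over-represented class might be randomly downsampled. The function names `make_supervised` and `downsample_majority`, the choice of three lags, and the class threshold of Kp < 5 are all assumptions made for illustration. The Random Forest regressor itself is omitted.

```python
import random


def make_supervised(kp_series, n_lags=3, horizon=1):
    """Pair the last n_lags Kp values with the value `horizon` steps ahead.

    Kp is reported every 3 hours, so horizon=1 corresponds to a 3-hour lead.
    n_lags=3 is an illustrative choice, not a value from the paper.
    """
    rows = []
    for t in range(n_lags - 1, len(kp_series) - horizon):
        features = kp_series[t - n_lags + 1 : t + 1]
        target = kp_series[t + horizon]
        rows.append((features, target))
    return rows


def downsample_majority(rows, is_majority, seed=0):
    """Randomly drop majority-class rows until both classes have equal counts."""
    majority = [r for r in rows if is_majority(r[1])]
    minority = [r for r in rows if not is_majority(r[1])]
    rng = random.Random(seed)
    kept = rng.sample(majority, min(len(majority), len(minority)))
    return kept + minority


# Toy Kp series (made-up values, one entry per 3-hour interval).
kp = [1, 2, 1, 1, 5, 6, 2, 1, 1, 1, 7, 3]
rows = make_supervised(kp)

# Hypothetical split: targets with Kp < 5 are treated as the majority class.
balanced = downsample_majority(rows, is_majority=lambda y: y < 5)
```

The balanced rows could then be passed to any regressor. Fixing the random seed keeps the downsampled subset reproducible between runs.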
