An Interpretable Machine Learning Approach to Understanding the Relationships between Solar Flares and Source Active Regions
Huseyin Cavus, Jason T. L. Wang, Teja P. S. Singampalli, Gani Caglar Coban, Hongyang Zhang, Abd-ur Raheem, Haimin Wang
TL;DR
This study addresses the problem of predicting solar flares by linking flare occurrence to observable properties of solar active regions (ARs). It uses an interpretable Random Forest classifier augmented with SHAP explanations to classify ARs as producing or not producing $\geq$C-class flares, based on 10 AR features from SolarMonitor.org and flare records from the XRT flare database (2011–2021). The key findings are that AR_Type_Today is the most influential predictor, that Hale_Class_Yesterday is the least influential, and that NoS_Difference plays a notable role in decision-making, as shown through global and local interpretability analyses. The work provides a transparent framework for space-weather forecasting that relies on accessible AR features and explicit feature attributions, potentially aiding operational flare prediction systems.
Abstract
Solar flares are defined as outbursts on the surface of the Sun. They occur when energy accumulated in the magnetic fields surrounding solar active regions (ARs) is abruptly released. Solar flares and the associated coronal mass ejections are sources of space weather that adversely affect systems at or near Earth, for example by disrupting the high-frequency radio waves used for communication and degrading power grid operations. Tracking solar flares and delivering early, accurate predictions is essential for preparedness and disaster risk mitigation. This paper uses a random forest (RF) model for a binary classification task, analyzing the links between solar flares and their source ARs with observational data gathered from 2011 to 2021 by SolarMonitor.org and the XRT flare database. We seek to identify the physical features of a source AR that significantly influence its potential to produce $\geq$C-class flares. We find that AR_Type_Today and Hale_Class_Yesterday are the most and the least influential features, respectively. NoS_Difference has a notable effect on decision-making in both global and local interpretations.
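To make the idea of a local feature attribution concrete, the following minimal sketch computes exact Shapley values, the quantity that SHAP explanations are built on, for a single sample. It is an illustration under stated assumptions, not the pipeline used in this paper: the scoring function `toy_flare_score`, its coefficients, and the 0/1 feature encoding are all hypothetical stand-ins for a trained RF classifier, and the exact enumeration over feature subsets is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def toy_flare_score(v):
    """Hypothetical stand-in for a trained classifier's flare score.

    v = [ar_type_today, nos_difference, hale_class_yesterday], each
    encoded as 0 or 1 purely for illustration. The coefficients are
    invented and carry no physical meaning.
    """
    ar_type, nos_diff, hale_yest = v
    return 0.1 + 0.4 * ar_type + 0.2 * nos_diff + 0.1 * ar_type * nos_diff + 0.02 * hale_yest


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for sample x relative to a baseline.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n in the number of features n.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight for a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


feature_names = ["AR_Type_Today", "NoS_Difference", "Hale_Class_Yesterday"]
sample = [1, 1, 1]    # hypothetical AR to explain
baseline = [0, 0, 0]  # hypothetical reference AR
phi = shapley_values(toy_flare_score, sample, baseline)

# Rank features by the magnitude of their attribution for this sample.
for name, value in sorted(zip(feature_names, phi), key=lambda p: -abs(p[1])):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the score of the sample and the score of the baseline, which is the property that lets a per-sample (local) explanation be read as a decomposition of the model output. Averaging the absolute attributions over many samples gives a global ranking of features.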
