An Interpretable Machine Learning Approach to Understanding the Relationships between Solar Flares and Source Active Regions

Huseyin Cavus, Jason T. L. Wang, Teja P. S. Singampalli, Gani Caglar Coban, Hongyang Zhang, Abd-ur Raheem, Haimin Wang

TL;DR

This study addresses the problem of predicting solar flares by linking flare occurrence to observable properties of solar active regions (ARs). It uses an interpretable Random Forest classifier augmented with SHAP explanations to classify ARs as producing or not producing $\geq$C-class flares, based on 10 AR features from SolarMonitor.org and XRT flare data (2011–2021). The key contributions are the identification of AR_Type_Today as the most influential predictor and Hale_Class_Yesterday as the least, and the notable role of NoS_Difference in decision-making, demonstrated through global and local interpretability analyses. The work provides a transparent framework for space-weather forecasting that leverages accessible AR features and explicit feature attributions, potentially aiding operational flare prediction systems.

Abstract

Solar flares are defined as outbursts on the surface of the Sun. They occur when energy accumulated in the magnetic fields enclosing solar active regions (ARs) is abruptly released. Solar flares and associated coronal mass ejections are sources of space weather that adversely impact devices at or near Earth, for example by obstructing high-frequency radio waves used for communication and degrading power grid operations. Tracking solar flares and delivering early, precise predictions is essential for preparedness and disaster risk mitigation. This paper employs a random forest (RF) model for a binary classification task, analyzing the links between solar flares and their originating ARs with observational data gathered from 2011 to 2021 by SolarMonitor.org and the XRT flare database. We seek to identify the physical features of a source AR that significantly influence its potential to trigger $\geq$C-class flares. We found that AR_Type_Today and Hale_Class_Yesterday are the most and the least influential features, respectively. NoS_Difference has a notable effect on decision-making in both global and local interpretations.

Paper Structure

This paper contains 11 sections, 5 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Confusion matrix obtained by our RF model on the test set with 168 test samples.
  • Figure 2: Beeswarm plot to visualize the positive or negative effect of a feature for each test sample, represented by a colored dot, on the RF model’s predictions.
  • Figure 3: Bar plot to display the global importance of each feature on our RF model’s predictions.
  • Figure 4: Decision plot to understand how our RF model produces its predictions.
  • Figure 5: Waterfall plot for a test sample predicted to be positive. The plot shows the relative contribution of each feature to the model’s prediction $f(x) = 0.9$, starting from the base value $E[f(x)] = 0.726$. The $x$-axis represents the model output value (predicted probability) while the $y$-axis shows the features and their values. We encode the categorical features AR_Type and Hale_Class, where AR_Type = 2 represents $\gamma$ and Hale_Class = 5 represents $\beta\gamma$. The arrows display the SHAP value associated with each feature, colored red if positive and blue if negative.
  • ...and 4 more figures
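
The waterfall and bar plots listed above rest on the additivity of SHAP values: the base value plus the per-feature contributions equals the model output for that sample. In Figure 5 this means the contributions sum to $0.9 - 0.726 = 0.174$. The following is a minimal Python sketch of that property, not the authors' pipeline. It computes exact Shapley values by brute force for a hypothetical three-feature scoring function (`toy_model`, its coefficients, and the background rows are all invented for illustration), whereas the paper applies SHAP to a trained RF model with 10 features.

```python
from itertools import combinations
from math import factorial


def toy_model(features):
    # Hypothetical stand-in for a classifier's predicted probability.
    # The three inputs loosely mimic AR_Type, Hale_Class and NoS_Difference;
    # the coefficients are invented and carry no physical meaning.
    ar_type, hale_class, nos_diff = features
    score = 0.2 + 0.15 * ar_type + 0.02 * hale_class
    if nos_diff > 0 and ar_type >= 2:
        score += 0.2
    return min(score, 1.0)


def shapley_values(predict, x, background):
    """Exact Shapley values of predict at x by enumerating all coalitions.

    Features outside a coalition take their values from the background
    rows, and the prediction is averaged over those rows.
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    base = value(frozenset())  # analogue of E[f(x)]
    return base, phi


# Invented background and test rows: [ar_type, hale_class, nos_diff]
background = [[0, 1, -2], [1, 2, 0], [2, 5, 4], [1, 3, 1]]
samples = [[2, 5, 3], [0, 1, -1], [1, 2, 2]]

# Local explanation for one sample (what a waterfall plot displays)
x = samples[0]
base, phi = shapley_values(toy_model, x, background)
print("base value:", round(base, 4))
print("contributions:", [round(p, 4) for p in phi])
print("base + sum:", round(base + sum(phi), 4), "model output:", toy_model(x))

# Global importance (what a bar plot displays): mean |SHAP| per feature
all_phi = [shapley_values(toy_model, s, background)[1] for s in samples]
global_importance = [
    sum(abs(row[i]) for row in all_phi) / len(all_phi) for i in range(3)
]
print("mean |SHAP| per feature:", [round(g, 4) for g in global_importance])
```

Brute-force enumeration is only practical for a handful of features; it is used here because it makes the additivity check explicit without depending on any library.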