Investigating the Efficacy of Topologically Derived Time Series for Flare Forecasting. II. XGBoost Model

Thomas Williams, Christopher B. Prior, David MacTaggart, D. Shaun Bloomfield

TL;DR

This work examines solar flare forecasting using time-dependent topological magnetic parameters derived from the ARTop framework, focusing on current-carrying versus potential topology as predictive signals. An XGBoost classifier is trained on a rich feature set that includes delta-topology inputs, accumulated winding/helicity, velocity-weighted terms, lagged descriptors, kurtosis, and flare history, with a 24-hour forecast horizon. On a validation set, the model achieves a True Skill Statistic of $0.804$ and high accuracy, while a fully independent holdout set yields a more modest $TSS = 0.524$, highlighting challenges from limb projection effects and frequent C-class flares. SHAP analysis confirms the physical interpretability of the model by identifying flare history and accumulated current-carrying winding/helicity as key predictors, and the study discusses practical steps to improve deployability, such as extrapolations for regions entering the disk and multi-model approaches.
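The engineered inputs named above (accumulated quantities, lagged descriptors, kurtosis) can be illustrated with a minimal sketch. The function names, the trailing-window definition, and the use of excess kurtosis are assumptions made for illustration; the 720 s default step matches the SHARP cadence quoted in the figure captions, but the paper's exact feature definitions may differ.

```python
from itertools import accumulate
from statistics import fmean


def accumulated(rates, dt=720.0):
    """Running time-integral of an input rate sampled every `dt` seconds."""
    return list(accumulate(r * dt for r in rates))


def lagged(series, lag):
    """Series shifted back by `lag` samples; the first `lag` entries are None."""
    n = len(series)
    lag = min(lag, n)
    return [None] * lag + list(series[:n - lag])


def rolling_kurtosis(series, window):
    """Excess kurtosis over a trailing window (None until the window fills)."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
            continue
        chunk = series[i + 1 - window:i + 1]
        mean = fmean(chunk)
        m2 = fmean((x - mean) ** 2 for x in chunk)
        m4 = fmean((x - mean) ** 4 for x in chunk)
        # A constant window has zero variance; report 0.0 rather than divide by zero.
        out.append(m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0)
    return out
```

Each helper returns a list aligned with the input series, so the outputs can be stacked column-wise into a feature table before training.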

Abstract

Solar flares are a primary driver of space weather, and forecasting their occurrence remains a significant challenge. This paper presents a novel flare prediction model based on topologically derived photospheric magnetic parameters. We employ the \texttt{ARTop} framework to compute the time-dependent input rates of magnetic winding and helicity across more than $10^5$ active region (AR) observations, decomposing them into current-carrying and potential components to reduce sensitivity to optical flow methods. An \texttt{XGBoost} machine learning model is trained on these topological time series, alongside engineered features including rolling statistics, kurtosis, and flare history, to predict the probability of $\geq$M1.0-class flares within the next 24 hours. The model demonstrates strong performance on a validation set, with a True Skill Statistic (TSS) of 0.804 for once-daily operational region forecasts. When applied to a fully independent holdout set, the operational forecast achieves a TSS of 0.524. A SHapley Additive exPlanations (SHAP) analysis confirms the model's physical interpretability, identifying flare history and accumulated current-carrying winding and helicity as the most important features. The main challenges identified are false positives arising from ARs with frequent C-class flaring and systematic errors introduced by projection effects when ARs are near the limb. Excluding limb-affected data yields no improvement in the holdout set TSS (\TSSalert\ versus 0.524), due to the overall decreased number of flares. However, our per-region analysis indicates that mitigating these projection effects is crucial for future operational deployment. This work establishes magnetic topology, particularly its current-carrying components, as a highly effective and physically meaningful set of predictors for solar flare forecasting.
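For reference, the True Skill Statistic quoted above is the hit rate minus the false-alarm rate of a binary forecast. The sketch below uses the standard definition; the function name and any counts passed to it are illustrative and are not taken from the paper.

```python
def true_skill_statistic(tp, fn, fp, tn):
    """TSS = TP / (TP + FN) - FP / (FP + TN).

    tp, fn: flaring samples forecast correctly / missed.
    fp, tn: non-flaring samples forecast incorrectly / correctly.
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("TSS needs both flaring and non-flaring samples")
    return tp / (tp + fn) - fp / (fp + tn)
```

Because each term is normalised by its own class total, the score does not depend on the ratio of flaring to non-flaring samples, which is why it is widely used for rare-event forecasts such as flares.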

Paper Structure

This paper contains 15 sections, 5 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Illustrations of field properties measured by the winding flux ${\cal L}'$ and helicity flux ${\cal H}'$. Panels (a)-(c) depict the entanglement of a field line due to a rotational fluid motion at the photosphere (depicted as a plane). The winding measures the rotation of the green and red points where the field line pierces the plane. Panels (d)-(f) depict the rotational motion evaluated by the winding due to a contorted field line emerging through the photosphere. The helicity measures these rotations multiplied by the flux in the field lines.
  • Figure 2: Workflow diagram of the XGBoost classification model pipeline.
  • Figure 3: Confusion matrix for the XGBoost operational forecasts for all SHARP data in the validation set. The True Skill Statistic score is shown for illustration.
  • Figure 4: SHapley Additive exPlanations (SHAP) analysis (Lundberg & Lee 2017) summary plots of the XGBoost classification model for the Training (left) and Validation (right) Sets. The top 20 most important features are shown in descending order where the feature value increases from low (blue) to high (red). This indicates how each value of the features positively/negatively impacts the magnitude of the prediction by increasing/decreasing the SHAP Value, $\phi$.
  • Figure 5: The Model Outlook and operational forecasts are provided for the flaring regions in the validation set. Here, the outlook provides a yes/no outcome for flaring in the following 24 hours from each SHARP observation. The ground truth is shown in orange, whilst the model prediction is shown in blue. The percentage of limb-affected pixels for the SHARP region is indicated in dashed green. The corresponding operational forecast, shown below the Model Outlook, collates these 720 s cadence predictions and, if a positive outcome exists within a 24-hour period (midnight-to-midnight), determines that a flare is likely the next day. Here, true negatives, false negatives, true positives, and false positives are indicated in blue, red, green, and orange, respectively.
  • ...and 6 more figures
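
The collation step described in the Figure 5 caption, where 720 s cadence outlooks are reduced to one forecast per day, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the input format (timestamp, yes/no pairs) are hypothetical, and reading "the next day" as the following calendar date is an interpretation of the caption rather than a confirmed detail of the authors' pipeline.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def operational_forecast(outlook):
    """Collapse per-observation yes/no outlooks into one forecast per day.

    `outlook` is an iterable of (timestamp, predicted_flare) pairs.
    A midnight-to-midnight day containing at least one positive outlook
    yields a positive forecast for the following calendar day.
    """
    any_positive = defaultdict(bool)
    for stamp, flag in outlook:
        any_positive[stamp.date()] |= bool(flag)
    # Key each result by the day the forecast applies to (assumed: next date).
    return {
        day + timedelta(days=1): hit
        for day, hit in sorted(any_positive.items())
    }
```

For example, two days of 720 s observations (120 per day) with a single positive outlook on the first day would produce a positive forecast for the second day and a negative forecast for the third.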