Elucidating the Grey Atmosphere: SHAP Value Analysis of a Random Forest Atmospheric Neutral Density Model

C. Bard, K. Murphy, A. Halford

TL;DR

This work addresses the interpretability gap in ML-based thermospheric density forecasting by applying TreeSHAP to the RANDM random forest model. The analysis shows that solar irradiance, particularly the 43 nm FISM2 band, largely drives the model's density predictions, while geomagnetic activity (SYM-H) gains influence as storms intensify, with a practical storm-time threshold of SYM-H $< -60$ nT. The study also reveals day-side and dusk density enhancements, a dawn-dusk asymmetry, and informative local/global interaction patterns that connect model behavior to known space weather physics. Overall, the approach provides interpretable insights, highlights feature redundancies, and supports targeted model refinements for improved thermospheric density forecasts.

Abstract

We apply SHAP (SHapley Additive exPlanations) analysis, using the TreeSHAP algorithm, to a Random Forest model (RANDM) designed to predict thermospheric neutral density from solar-terrestrial data. The analysis shows that RANDM identifies solar irradiance as a significant predictor of thermospheric density. Additionally, the model differentiates between magnetic local times, predicting higher densities in the dusk sector than in the dawn sector, in line with prior research. Comparing storm-time and quiet-time conditions, we find that these trends persist regardless of geomagnetic activity level. The analysis further shows that larger geomagnetic disturbances during storms, as parameterized by the SYM-H index, are associated with higher neutral densities. Notably, SYM-H becomes the largest contributor to the density prediction among all model inputs once it falls below a threshold of $-60$ nT. This suggests a quantitative definition in which "storm-time" begins at SYM-H $< -60$ nT. Overall, TreeSHAP enhances our understanding of the factors influencing thermospheric density and demonstrates the value of explainable machine learning techniques in space weather research, enabling more interpretable models.
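
For each prediction, TreeSHAP assigns every input feature a Shapley value, where the "value" of a feature subset is the tree's expected output when only those features are known. The Python sketch below computes that same quantity by brute force on a hypothetical three-feature toy tree (not RANDM). When a split's feature is unknown, both branches are averaged by the number of training samples that reached them, as in the path-dependent formulation of TreeSHAP. The enumeration is exponential in the number of features, and TreeSHAP's contribution is obtaining identical values in polynomial time. `TREE`, `expected_value`, and `shapley_values` are illustrative names, not part of the `shap` package.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy regression tree (NOT the RANDM model). Internal nodes split on
# feature index "f" at threshold "t"; "cover" counts training samples reaching the node.
TREE = {
    "f": 0, "t": 0.5, "cover": 100,
    "left": {"f": 1, "t": 0.5, "cover": 60,
             "left": {"value": 1.0, "cover": 30},
             "right": {"value": 2.0, "cover": 30}},
    "right": {"value": 4.0, "cover": 40},
}


def expected_value(node, x, known):
    """Expected tree output when only features in `known` are fixed to x.

    Known split features are followed; unknown ones average both children,
    weighted by their training cover (path-dependent expectation).
    """
    if "value" in node:
        return node["value"]
    if node["f"] in known:
        child = node["left"] if x[node["f"]] <= node["t"] else node["right"]
        return expected_value(child, x, known)
    left, right = node["left"], node["right"]
    return (left["cover"] * expected_value(left, x, known)
            + right["cover"] * expected_value(right, x, known)) / node["cover"]


def shapley_values(tree, x):
    """Exact Shapley values by enumerating every subset of the other features."""
    m = len(x)
    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(m - size - 1) / factorial(m)
                s = set(subset)
                phi[i] += weight * (expected_value(tree, x, s | {i})
                                    - expected_value(tree, x, s))
    return phi


x = [0.2, 0.9, 0.0]  # feature 2 is never split on, so its Shapley value is zero
base = expected_value(TREE, x, set())  # expected output with no features known
phi = shapley_values(TREE, x)
# Local accuracy: base + sum(phi) reproduces the prediction for x.
print(base, phi, base + sum(phi), expected_value(TREE, x, {0, 1, 2}))
```

Because Shapley values are additive, the values for a random forest are the average of the per-tree values. Here the base value 2.5 plus the contributions (about -0.9, 0.4, and 0.0) recovers the toy tree's prediction of 2.0.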

Paper Structure

This paper contains 9 sections, 1 equation, 11 figures, 1 table.

Figures (11)

  • Figure 1: Feature importance across 4,000 randomly selected events (2,000 storm-time and 2,000 quiet-time events), as measured by average absolute SHAP value for each feature. The dendrogram on the right displays hierarchical clustering based on feature redundancy analysis, highlighting the potential redundancy among FISM2 spectral bands within the Random Forest model.
  • Figure 2: SHAP beeswarm plot showing the distribution of feature importance across 4,000 randomly selected events (2,000 storm-time and 2,000 quiet-time events). Each dot represents one event, and dots pile up where many events share similar SHAP values. The horizontal position of each dot shows that feature's impact on the model output, while color indicates the feature's value (red = high, blue = low). Features (rows) are sorted by mean absolute SHAP value, with the most important at the top.
  • Figure 3: Calculated feature importance for small, moderate, and large geomagnetic storms. The relative importance of SYM-H, as assessed by SHAP, increases with storm strength.
  • Figure 4: Average SHAP values for SYM-H (blue), the 43 nm FISM2 band (red), and cos(MLT) (purple), plotted in 5 nT bins of SYM-H from -20 nT to -75 nT. The crossover point where magnetospheric drivers (SYM-H) become more influential than solar drivers (43 nm) occurs at around -60 nT (right dot-dashed line). A proposed "storm warning" threshold (left dashed line) is identified at approximately -35 nT, where the SYM-H SHAP value exceeds half the magnitude of the 43 nm SHAP value, indicating substantially enhanced magnetospheric influence compared with quiet-time conditions.
  • Figure 5: Interaction matrix obtained by summing the absolute values of the cross-interaction SHAP values over the 4,000 events in the combined storm+quiet sample. Lighter colors indicate strong cross-interaction; darker colors indicate little cross-interaction (i.e., the interaction is relatively unimportant).
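
The feature ranking in Figures 1 and 2 reduces each feature's column of SHAP values to its mean absolute value. A minimal stdlib sketch, assuming the SHAP values are already available as one row per event; the feature names and numbers below are invented toy inputs, not values from the paper:

```python
def mean_abs_importance(shap_rows, feature_names):
    """Rank features by mean |SHAP| across events.

    shap_rows: one list per event, holding one SHAP value per feature.
    Returns (name, score) pairs sorted from most to least important.
    """
    n_events = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n_events
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy SHAP values for three hypothetical features over three events.
toy_rows = [
    [0.8, -0.1, 0.05],
    [-0.6, 0.3, -0.02],
    [0.7, -0.5, 0.01],
]
ranking = mean_abs_importance(toy_rows, ["feat_a", "feat_b", "feat_c"])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Applying the same function separately to storm-time and quiet-time subsets, or to events grouped by storm size, yields the kind of comparison shown in Figure 3.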
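
The Figure 4 thresholds come from binning events by SYM-H and comparing the bin-averaged SHAP values of SYM-H and the 43 nm band. The sketch below assumes each event is a `(symh_nT, shap_symh, shap_43nm)` tuple, uses 5 nT bins of the form (u - 5, u] from -20 nT down to -75 nT, and compares the magnitudes of the bin means. The bin-edge convention, the helper names, and the synthetic events are all assumptions for illustration, not the authors' code.

```python
from collections import defaultdict


def binned_means(events, start=-20.0, stop=-75.0, width=5.0):
    """Average SHAP values in SYM-H bins (u - width, u] for u = start, start - width, ...

    events: iterable of (symh_nT, shap_symh, shap_43nm) tuples.
    Returns [(upper_edge, mean_shap_symh, mean_shap_43nm), ...] from quiet to disturbed.
    """
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # bin index -> [sum_symh, sum_43nm, count]
    for symh, shap_symh, shap_43 in events:
        if not (stop < symh <= start):
            continue  # outside the analysed SYM-H range
        k = int((start - symh) // width)
        acc[k][0] += shap_symh
        acc[k][1] += shap_43
        acc[k][2] += 1
    return [
        (start - width * k, s_symh / n, s_43 / n)
        for k, (s_symh, s_43, n) in sorted(acc.items())
    ]


def find_thresholds(bins, warning_ratio=0.5):
    """Scan bins from quiet to disturbed and return (warning_edge, crossover_edge):
    the first bins where |SYM-H SHAP| exceeds warning_ratio * |43 nm SHAP|
    and |43 nm SHAP|, respectively (None if never reached)."""
    warning = crossover = None
    for upper, m_symh, m_43 in bins:
        if warning is None and abs(m_symh) > warning_ratio * abs(m_43):
            warning = upper
        if crossover is None and abs(m_symh) > abs(m_43):
            crossover = upper
    return warning, crossover


# Synthetic events: SYM-H influence grows with disturbance; 43 nm influence is fixed.
events = [(float(s), (-s - 20) / 50.0, 1.0) for s in range(-20, -75, -1)]
bins = binned_means(events)
warning, crossover = find_thresholds(bins)
print(f"warning bin upper edge: {warning} nT, crossover bin upper edge: {crossover} nT")
```

Reporting each threshold by its bin's upper edge is one design choice; reporting bin centres instead would shift both thresholds by half a bin width (2.5 nT).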
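
Figure 5 aggregates SHAP interaction values, which split each prediction into per-feature main effects (diagonal) and pairwise cross-interactions (off-diagonal). A minimal sketch of the aggregation described in the caption, assuming one square, symmetric matrix per event (the per-event layout produced by `shap`'s `TreeExplainer.shap_interaction_values`) and assuming the diagonal is left at zero so only cross-interactions remain; the toy matrices are invented:

```python
def interaction_matrix(per_event):
    """Sum |SHAP interaction value| over events for every feature pair.

    per_event: list of m x m matrices, one per event. The diagonal (main effects)
    stays at zero so the result shows cross-interactions only.
    """
    m = len(per_event[0])
    total = [[0.0] * m for _ in range(m)]
    for matrix in per_event:
        for i in range(m):
            for j in range(m):
                if i != j:
                    total[i][j] += abs(matrix[i][j])
    return total


def strongest_pair(total):
    """Return (i, j) with i < j for the largest summed cross-interaction."""
    m = len(total)
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    return max(pairs, key=lambda p: total[p[0]][p[1]])


# Two toy events with three hypothetical features each.
toy_events = [
    [[0.5, 0.1, -0.2], [0.1, 0.3, 0.0], [-0.2, 0.0, 0.1]],
    [[0.4, -0.3, 0.05], [-0.3, 0.2, 0.0], [0.05, 0.0, 0.0]],
]
total = interaction_matrix(toy_events)
print(total)
print("strongest pair:", strongest_pair(total))
```

Because the matrix is symmetric, either triangle carries the full information; whether the main effects are dropped or shown on a separate color scale is a presentation choice.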