Random Forest Regression Feature Importance for Climate Impact Pathway Detection

Meredith G. L. Brown, Matt Peterson, Irina Tezaur, Kara Peterson, Diana Bull

TL;DR

This work tackles the challenge of uncovering cascading climate source–impact pathways from time-series data. It introduces an outer-loop workflow that pairs Random Forest Regression with SHAP feature importances to build a weighted, directed pathway graph, enabling tracing of spatio-temporal dependencies and their lags. The method is validated on synthetic coupled equations and Mount Pinatubo eruption simulations using E3SMv2-SPA, demonstrating accurate detection of known pathways and plausible edge structures, while highlighting limitations such as back edges and averaging effects. The study points toward future extensions for regional analyses, sliding-window dynamics, and causal-edge discrimination to enhance interpretability and applicability in real-world climate attribution tasks.

Abstract

Disturbances to the climate system, both natural and anthropogenic, have far-reaching impacts that are not always easy to identify or quantify using traditional climate science analyses or causal modeling techniques. In this paper, we develop a novel technique for discovering and ranking the chain of spatio-temporal downstream impacts of a climate source, referred to herein as a source-impact pathway, using Random Forest Regression (RFR) and SHapley Additive exPlanation (SHAP) feature importances. Rather than using RFR for classification or regression tasks (its most common use case), we propose a fundamentally new workflow in which we: (i) train random forest (RF) regressors on a set of spatio-temporal features of interest, (ii) calculate their pair-wise feature importances using the SHAP weights associated with those features, and (iii) translate these feature importances into a weighted pathway network (i.e., a weighted directed graph), which can be used to trace out and rank interdependencies between climate features and/or modalities. Importantly, while we employ RFR and SHAP feature importance in steps (i) and (ii) of our algorithm, the workflow is not tied to these approaches, which could be replaced with any regression method and any sensitivity method. We adopt a tiered approach to verify our new pathway identification methodology, applying it to ensembles of data generated by running two increasingly complex benchmarks: (i) a set of synthetic coupled equations, and (ii) a fully coupled simulation of the 1991 eruption of Mount Pinatubo in the Philippines performed using a modified version 2 of the U.S. Department of Energy's Energy Exascale Earth System Model (E3SMv2). We find that our RFR feature importance-based approach can accurately detect known pathways of impact for both test cases.
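Step (iii) of the workflow, turning pair-wise feature importances into a weighted directed graph and tracing pathways from a source, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: the function names `build_pathway_graph` and `trace_pathways`, the feature names `A`, `B`, `C`, the importance values (made-up placeholders standing in for the SHAP weights from steps (i) and (ii)), the pruning threshold, and the choice to rank each pathway by its weakest edge.

```python
from collections import defaultdict


def build_pathway_graph(importances, threshold=0.0):
    """Turn importances {(source, target, lag): weight} into an adjacency dict.

    Edges with weight at or below `threshold` are dropped (hypothetical
    pruning rule). Each source maps to a list of (target, lag, weight)
    tuples sorted by descending weight.
    """
    graph = defaultdict(list)
    for (source, target, lag), weight in importances.items():
        if source != target and weight > threshold:
            graph[source].append((target, lag, weight))
    for edges in graph.values():
        edges.sort(key=lambda edge: edge[2], reverse=True)
    return dict(graph)


def trace_pathways(graph, source, max_depth=3):
    """List simple paths starting at `source`, up to `max_depth` edges long.

    Each path is (nodes, total_lag, weakest_edge_weight). Paths are ranked
    by their weakest edge, an assumed ranking rule chosen for this sketch.
    """
    paths = []

    def walk(node, nodes, total_lag, weakest, depth):
        for target, lag, weight in graph.get(node, []):
            if target in nodes:
                continue  # skip edges that point back into the current path
            new_nodes = nodes + [target]
            new_weakest = min(weakest, weight)
            paths.append((new_nodes, total_lag + lag, new_weakest))
            if depth + 1 < max_depth:
                walk(target, new_nodes, total_lag + lag, new_weakest, depth + 1)

    walk(source, [source], 0, float("inf"), 0)
    paths.sort(key=lambda path: path[2], reverse=True)
    return paths


# Placeholder importances (NOT results from the paper): (source, target, lag) -> weight
importances = {
    ("A", "B", 1): 0.6,
    ("B", "C", 2): 0.4,
    ("A", "C", 3): 0.05,  # falls below the threshold and is pruned
    ("C", "A", 1): 0.1,   # back edge, ignored while tracing from "A"
}

graph = build_pathway_graph(importances, threshold=0.08)
paths = trace_pathways(graph, "A")
for nodes, total_lag, weakest in paths:
    print(" -> ".join(nodes), "| total lag:", total_lag, "| weakest edge:", weakest)
```

In a full pipeline, the `importances` dictionary would be populated from the regressors and sensitivity method of steps (i) and (ii); the graph construction itself does not depend on which methods produced the weights.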

Paper Structure (17 sections, 4 equations, 11 figures, 12 tables, 1 algorithm)

Figures (11)

  • Figure 1: Visual depiction of the RFR-based pathway construction approach described in Section \ref{sec:method}, assuming $F = \tilde{F}$.
  • Figure 2: Schematic of the RFR and feature importance-based source-impact pathway construction method described in Algorithm \ref{alg:pseudocode}.
  • Figure 3: The stratospheric sulfate burden at different days up to roughly one month following the Mount Pinatubo eruption. The location of Mount Pinatubo is marked with a red triangle. The sulfates have encircled the Earth by approximately 21 days post eruption (e), and have made their way into the subtropics by approximately 31 days post eruption (f).
  • Figure 4: Coupled synthetic equations: ensemble mean goodness-of-fit statistics. Subfigures a)-d) show the ensemble mean and standard deviation of $W$, $X$, $Y$, and $Z$, respectively. Subfigures e)-h) show scatterplot comparisons between the SOE-generated time series and their RFR reconstructions for $W$, $X$, $Y$, and $Z$, respectively.
  • Figure 5: Coupled synthetic equations: pathway graph. Features are shown in circles, and feature connections are shown by arrows pointing from source feature to target feature. The number next to each arrow is the time lag associated with that connection. Edge colors represent SHAP weights, with blue indicating higher values and yellow indicating lower values.
  • ...and 6 more figures