Multidata Causal Discovery for Statistical Hurricane Intensity Forecasting

Saranya Ganesh S., Frederick Iat-Hin Tam, Milton S. Gomez, Marie McGraw, Mark DeMaria, Kate Musgrave, Jakob Runge, Tom Beucler

TL;DR

This study tackles the difficulty of predicting Atlantic hurricane intensity by applying multidata causal discovery to identify predictors with direct causal influence on intensity changes. The authors replicate SHIPS predictors using ERA5/TC PRIMED data and test causal feature selection against correlation and random-forest baselines, showing superior generalization for short lead times. They extend SHIPS with six causally chosen predictors (SHIPS+) and demonstrate that nonlinear modeling with MLPs further improves skill, especially beyond 72 hours. The Hurricane Larry case study and operational-like SHIPS tests confirm that SHIPS+ with nonlinear modeling yields tangible forecast improvements and greater interpretability by focusing on physically meaningful drivers. The work highlights a path toward more empirical, causally grounded hurricane intensity forecasts that generalize better to unseen storms.

Abstract

Improving statistical forecasts of Atlantic hurricane intensity is limited by complex nonlinear interactions and difficulty in identifying relevant predictors. Conventional methods prioritize correlation or fit, often overlooking confounding variables and limiting generalizability to unseen tropical storms. To address this, we leverage a multidata causal discovery framework with a replicated dataset based on the Statistical Hurricane Intensity Prediction Scheme (SHIPS) using ERA5 meteorological reanalysis. We conduct multiple experiments to identify and select predictors causally linked to hurricane intensity changes. We train multiple linear regression models to compare causal feature selection with no selection, correlation, and random forest feature importance across five forecast lead times from 1 to 5 days (24 to 120 hours). Causal feature selection consistently outperforms the alternatives on unseen test cases, especially for lead times shorter than 3 days. The causal features primarily include vertical shear, mid-tropospheric potential vorticity, and surface moisture conditions, which are physically significant yet often underutilized in hurricane intensity predictions. Further, we build an extended predictor set (SHIPS+) by adding selected features to the standard SHIPS predictors. SHIPS+ yields increased short-term predictive skill at lead times of 24, 48, and 72 hours. Adding nonlinearity using a multilayer perceptron further extends skill to longer lead times, despite our framework being purely regional and not requiring global forecast data. Operational SHIPS tests confirm that three of the six added causally discovered predictors improve forecasts, with the largest gains at longer lead times. Our results demonstrate that causal discovery improves hurricane intensity prediction and pave the way toward more empirical forecasts.

Paper Structure

This paper contains 31 sections, 3 equations, 22 figures, 6 tables.

Figures (22)

  • Figure 1: Multidata causal feature selection methodology. Step 1: Preprocessed spatiotemporal fields for all TC cases form an ensemble of aligned time series, which may contain spurious or non-causal relationships due to autocorrelation or confounding. Step 2: These multivariate time series (training set) are input to the multidata causal discovery algorithm (M-PC), which selects candidate predictors while controlling set size via hyperparameters. Each candidate set is evaluated using cross-validated regression. Step 3: Predictors appearing in at least four out of the seven folds are pooled to form the final feature set. The goal is to estimate the portion of the true causal graph that helps predict TC intensity changes.
  • Figure 2: Example results for the 24-hour intensity change forecast (DELV24) from Fold 3 using the SHIPS+ERA5 predictor set for SHIPS predictors. (a) Coefficient of determination $R^2$ on training, validation, and test sets plotted against the number of selected predictors, each point corresponding to a different value of the M-PC causal discovery hyperparameter pc_alpha (bottom scale). The vertical dashed line indicates the configuration with the highest validation $R^2$. (b) Variable selection abacus: each dot shows the presence of a predictor across the pc_alpha range. Variables are colored by group (e.g., Original SHIPS predictors, Shear, Humidity); the vertical dashed line marks the best validation score, and encircled dots highlight the occurrence of new shortlisted predictors for SHIPS.
  • Figure 3: Summary of results for the 24-hour intensity change forecast (DELV24) using SHIPS+ERA5 predictors without SHIPS link assumptions. (a) Bar plot showing the frequency of each variable’s selection across the best models from all seven cross-validation folds. A red dashed line marks the threshold (more than 3 folds) used to shortlist robust predictors for inclusion in the final SHIPS+ list. (b) Boxplot comparing test $R^2$ values for the target DELV at each lead time (24, 48, 72, 96, and 120 hr) for experiments with the kitchen-sink approach (without link assumptions) across four feature selection strategies: causal discovery, correlation ranking, random forest importance, and no selection. Causal feature selection yields the highest median $R^2$ up to the 72 hr lead time, showing improved generalization in a purely statistical prediction setup.
  • Figure 4: Comparison of test $R^2$ values across forecast lead times (24–120 hr) for the original SHIPS predictors (blue/green boxes) and the expanded SHIPS+ predictors (orange/yellow boxes), using no feature selection. Both MLR and MLP runs are shown to illustrate the added value of nonlinear modeling. Dashed brown lines indicate the median, and solid black lines mark the mean. Overall, the MLP consistently outperforms the MLR, demonstrating improved skill when nonlinearity is captured, while the inclusion of additional predictors in SHIPS+ further enhances forecast performance, especially at shorter lead times. Note that the SHIPS+ MLR $R^2$ drops below 0 at the 120 hr lead time, which is indicated in the figure by a downward arrow.
  • Figure 5: Predictor importance and dependencies for models trained on SHIPS+. (a–b) Global feature importance ranked by mean $\left|\textrm{SHAP}\right|$ for the MLP (a) and MLR (b). (c) SHAP dependence for baseline SHIPS predictors POT (potential intensity minus current intensity) and T200 (200-hPa temperature, 200–800 km-averaged). (d) SHAP dependence for causally selected predictors SHL1 (1000–850-hPa vertical wind shear, 200–1000 km-averaged) and PVOR (500-hPa potential vorticity, 200–800 km-averaged). (e) For near-surface humidity, the MLP learns opposite dependencies: negative with R001 (1000-hPa relative humidity, 0–500 km-averaged) and positive with R000 (1000-hPa relative humidity, 200–800 km-averaged). All predictors in panels c-e are standardized.
  • ...and 17 more figures
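
The fold-pooling rule in the Figure 1 and Figure 3 captions (keep predictors selected in at least four of the seven cross-validation folds) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `pool_predictors`, the `min_folds` parameter, and the per-fold selections are hypothetical, and the predictor names are borrowed from the Figure 5 caption purely as labels.

```python
from collections import Counter


def pool_predictors(fold_selections, min_folds=4):
    """Return predictors selected in at least `min_folds` folds, sorted by name."""
    # Count each predictor at most once per fold.
    counts = Counter(
        name for selected in fold_selections for name in set(selected)
    )
    return sorted(name for name, n in counts.items() if n >= min_folds)


# Hypothetical best-model selections for seven folds (illustrative only,
# not results from the paper).
folds = [
    {"SHL1", "PVOR", "R000"},
    {"SHL1", "PVOR"},
    {"SHL1", "R000", "T200"},
    {"SHL1", "PVOR", "R001"},
    {"PVOR", "R000"},
    {"SHL1", "T200"},
    {"SHL1", "PVOR", "R000"},
]

shortlist = pool_predictors(folds)
print(shortlist)
```

Raising `min_folds` makes the shortlist stricter, trading the number of retained predictors for robustness across folds.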