Forecasting precipitation in the Arctic using probabilistic machine learning informed by causal climate drivers

Madhurima Panja, Dhiman Das, Tanujit Chakraborty, Arnob Ray, R. Athulya, Chittaranjan Hens, Syamal K. Dana, Nuncio Murukesh, Dibakar Ghosh

TL;DR

The paper tackles Arctic precipitation forecasting under data scarcity and strong multivariate drivers by fusing scale-aware causal analysis with probabilistic machine learning. It combines wavelet coherence to characterize driver–precipitation dependencies with a Synergistic-Unique-Redundant Decomposition (SURD) to quantify multivariate information transfer, and then trains exogenous, boosting-based models (notably XGBoostX) with conformal prediction to yield calibrated forecast intervals. The study shows that synergistic multivariate interactions dominate the predictive information, and that XGBoostX with exogenous climatic drivers provides the most accurate and stable forecasts for Bear Island and Ny-Ålesund, accompanied by reliable uncertainty quantification. This framework improves interpretability and operational usefulness for Arctic early warning and risk management, while highlighting areas for future enhancement, including tail modeling via Extreme Value Theory and incorporating spatial dependencies.

Abstract

Understanding and forecasting precipitation events in Arctic maritime environments, such as Bear Island and Ny-Ålesund, is crucial for assessing climate risk and developing early warning systems in vulnerable marine regions. This study proposes a probabilistic machine learning framework for modeling and predicting the dynamics and severity of precipitation. We begin by analyzing the scale-dependent relationships between precipitation and key atmospheric drivers (e.g., temperature, relative humidity, cloud cover, and air pressure) using wavelet coherence, which captures localized dependencies across time and frequency domains. To assess joint causal influences, we employ Synergistic-Unique-Redundant Decomposition, which quantifies how interactions among the variables contribute to future precipitation dynamics. These insights inform the development of data-driven forecasting models that incorporate both historical precipitation and causal climate drivers. To account for uncertainty, we employ the conformal prediction method, which generates calibrated non-parametric prediction intervals. Our results underscore the importance of a comprehensive framework that combines causal analysis with probabilistic forecasting to enhance the reliability and interpretability of precipitation predictions in Arctic marine environments.
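
To make the conformal prediction step concrete, the following is a minimal sketch of split conformal prediction with absolute-residual scores. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which conformal variant is used, the function name `split_conformal_interval` is hypothetical, and the coverage guarantee assumes exchangeable calibration and test points, which serially dependent precipitation data may only approximately satisfy.

```python
import numpy as np


def split_conformal_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Symmetric split-conformal intervals around point forecasts.

    Hypothetical helper for illustration only.
    cal_true, cal_pred: observations and point forecasts on a held-out
        calibration set (not used to fit the forecasting model).
    new_pred: point forecasts for which intervals are wanted.
    alpha: target miscoverage rate (0.1 targets roughly 90% coverage).
    """
    cal_true = np.asarray(cal_true, dtype=float)
    cal_pred = np.asarray(cal_pred, dtype=float)
    new_pred = np.asarray(new_pred, dtype=float)

    # Nonconformity score: absolute calibration residual, sorted ascending.
    scores = np.sort(np.abs(cal_true - cal_pred))
    n = scores.size

    # Finite-sample rank ceil((n + 1) * (1 - alpha)). If it exceeds n,
    # the calibration set is too small and the interval is unbounded.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else scores[k - 1]

    # The same half-width q is applied to every new forecast.
    return new_pred - q, new_pred + q


if __name__ == "__main__":
    # Synthetic stand-in data; a fitted forecaster would supply cal_pred.
    rng = np.random.default_rng(0)
    cal_true = rng.gamma(shape=2.0, scale=3.0, size=200)
    cal_pred = cal_true + rng.normal(scale=1.5, size=200)
    lower, upper = split_conformal_interval(cal_true, cal_pred, [4.0, 7.5])
    print(lower, upper)
```

The interval width here depends only on the calibration residuals, so it is the same for every forecast. Because precipitation is non-negative, the lower bound could reasonably be clipped at zero in practice.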

Paper Structure

This paper contains 12 sections, 7 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: (a) Geographic locations of Bear Island (Bjørnøya; 74.4522° N, 19.1152° E) and Ny-Ålesund (78.923° N, 11.923° E), shown as red dots on an Arctic basemap generated with the NASA Blue Marble Earth image using Basemap v1.2.1. (b) High-resolution topographic map of Bear Island and (c) Ny-Ålesund, produced with PyGMT v0.16.0 and 15-arcsecond Earth relief data.
  • Figure 2: Temporal evolution of precipitation, mean air temperature, mean relative humidity, mean cloud cover, and average air pressure recorded at (a) Bjørnøya and (b) Ny-Ålesund during 1991–2021. The weekly aggregated series show the seasonal trends and long-term variation of each variable.
  • Figure 3: Wavelet coherence analysis of precipitation and climatic variables at Bjørnøya from 1991 to 2021. Panels (a) to (d) show the wavelet coherence between precipitation and each of the four climatic variables: temperature, relative humidity, cloud cover, and air pressure, respectively. Warmer hues (red/yellow) indicate strong coherence, while cooler colors (blue) denote weak coherence. Arrows represent phase relationships: rightward arrows indicate in-phase synchronization, leftward arrows show anti-phase interactions, and upward/downward arrows denote lead-lag dynamics. Black contours highlight regions statistically significant at the 5% level, tested against the null hypothesis that both time series are independent red-noise (AR(1)) processes. Significance was determined via Monte Carlo simulations, with contours marking areas where observed coherence exceeds the 95th percentile of surrogate distributions.
  • Figure 4: Wavelet coherence analysis of precipitation and climatic variables at Ny-Ålesund from 1991 to 2021. Panels (a) to (d) show the wavelet coherence between precipitation and each of the four climatic variables: temperature, relative humidity, cloud cover, and air pressure, respectively. Warmer hues (red/yellow) indicate strong coherence, while cooler colors (blue) denote weak coherence. Arrows represent phase relationships: rightward arrows indicate in-phase synchronization, leftward arrows show anti-phase interactions, and upward/downward arrows denote lead-lag dynamics. Black contours highlight regions statistically significant at the 5% level, tested against the null hypothesis that both time series are independent red-noise (AR(1)) processes. Significance was determined via Monte Carlo simulations, with contours marking areas where observed coherence exceeds the 95th percentile of surrogate distributions. The gray area outside the cone of influence denotes regions affected by edge effects. The y-axis (log$_2$-transformed) represents temporal scales in weeks, showing both short-term (bottom) and long-term (top) patterns, while the x-axis shows the temporal progression of the 31-year study period.
  • Figure 5: SURD analysis of causal information from lagged variables $\{y, x_1, x_2, x_3\}$ to the future target $\hat{y}$. The decomposition illustrates how the total $\mathrm{I}(\hat{y}; y, x_1, x_2, x_3)$ and individual $\mathrm{I}(\hat{y}; \cdot)$ mutual information is distributed across synergistic (red), unique (blue), and redundant (yellow) components. The term $\Delta \mathrm{I}_{\text{leak} \rightarrow \hat{y}}$ (gray) captures unexplained or leaked causality. Specifically, $\Delta \mathrm{I}_{(\cdot) \rightarrow \hat{y}}^R$ quantifies redundant information shared among multiple predictors $(\cdot)$ about target $\hat{y}$, while $\Delta \mathrm{I}_{(\cdot) \rightarrow \hat{y}}^U$ and $\Delta \mathrm{I}_{(\cdot) \rightarrow \hat{y}}^S$ represent the unique and synergistic causal contributions, respectively, from the predictors to the target variable.
  • ...and 4 more figures