Causal Climate Emulation with Bayesian Filtering

Sebastian Hickman, Ilija Trajkovic, Julia Kaltenborn, Francis Pelletier, Alex Archibald, Yaniv Gurwicz, Peer Nowack, David Rolnick, Julien Boussard

TL;DR

This work introduces PICABU, a physics-informed causal emulator for climate dynamics that learns a latent causal graph over region-like climate modes and a latent-to-observation mapping under a single-parent decoding constraint. It combines causal representation learning with an augmented ELBO objective and spectral/invariant losses, and employs a Bayesian filter for stable long-term autoregressive rollouts, enabling uncertainty-aware projections and counterfactual analysis. Evaluations on synthetic SAVAR data and real climate-model datasets (NorESM2 and CESM2-FV2) show accurate reproduction of key climate variability (e.g., ENSO, GMST) and improved generalization under distribution shifts, while ablations highlight the importance of the CRPS and spectral losses on real data. The work also demonstrates the emulator's use for causal attribution of extreme events and offers a principled pathway toward trustworthy, physically consistent climate emulators with interpretable interventions, although it is limited by the single-parent assumption and the high latent dimensionality.
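The single-parent decoding constraint can be pictured as a mask on the decoder weights, so that every grid cell depends on exactly one latent. The sketch below illustrates only that structural idea, under stated assumptions: it uses a linear per-cell decoder and a hand-fixed assignment of cells to latents, whereas PICABU learns the latent-to-observation mapping. The names `assignment`, `single_parent_mask`, and `decode` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def single_parent_mask(assignment: np.ndarray, n_latents: int) -> np.ndarray:
    """Return a (n_cells, n_latents) 0/1 mask with exactly one 1 per row.

    assignment[j] is the index of the single latent that grid cell j depends on.
    """
    n_cells = assignment.shape[0]
    mask = np.zeros((n_cells, n_latents))
    mask[np.arange(n_cells), assignment] = 1.0
    return mask


def decode(z: np.ndarray, weights: np.ndarray, bias: np.ndarray,
           mask: np.ndarray) -> np.ndarray:
    """Linear decoder x = (W * M) z + b (a simplifying assumption).

    Each row of the mask has a single nonzero entry, so every grid cell
    is a function of exactly one latent: its single parent.
    """
    return (weights * mask) @ z + bias


# Hypothetical toy setup: 6 grid cells, 2 latent "modes".
rng = np.random.default_rng(0)
assignment = np.array([0, 0, 0, 1, 1, 1])  # cells 0-2 -> latent 0, cells 3-5 -> latent 1
mask = single_parent_mask(assignment, n_latents=2)
weights = rng.normal(size=(6, 2))
bias = np.zeros(6)

z = np.array([1.0, -0.5])
x = decode(z, weights, bias, mask)

# Perturbing latent 1 should change only the cells assigned to it.
z_perturbed = z + np.array([0.0, 2.0])
changed = ~np.isclose(decode(z_perturbed, weights, bias, mask), x)
print(changed)
```

Because each latent owns a disjoint set of grid cells under this constraint, a latent can be read as a spatial mode, which is what makes interventions on specific regions (such as the ENSO cells in Figure 5) interpretable.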

Abstract

Traditional models of climate change use complex systems of coupled equations to simulate physical processes across the Earth system. These simulations are highly computationally expensive, limiting our predictions of climate change and analyses of its causes and effects. Machine learning has the potential to quickly emulate data from climate models, but current approaches cannot incorporate physically-based causal relationships. Here, we develop an interpretable climate model emulator based on causal representation learning. We derive a novel approach including a Bayesian filter for stable long-term autoregressive emulation. We demonstrate that our emulator learns accurate climate dynamics, and we show the importance of each of its components on a realistic synthetic dataset and on data from two widely deployed climate models.
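The abstract does not spell out how the Bayesian filter works. As a hedged illustration of the general idea only, the sketch below runs a generic bootstrap particle filter on a hypothetical, slightly unstable AR(1) latent transition and uses a constant pseudo-observation (an assumed Gaussian climatology) as the likelihood. The transition, the climatology, and all function names (`step`, `log_likelihood`, `filtered_rollout`, `naive_rollout`) are assumptions, not PICABU's filter.

```python
import numpy as np


def step(particles, rng, a=1.02, noise_std=0.1):
    """Hypothetical latent transition z_t = a * z_{t-1} + noise.

    With |a| > 1, a single free-running rollout drifts away without bound.
    """
    return a * particles + noise_std * rng.standard_normal(particles.shape)


def log_likelihood(particles, clim_mean=0.0, clim_std=1.0):
    """Gaussian log-likelihood of a constant pseudo-observation (assumed climatology)."""
    return -0.5 * np.sum(((particles - clim_mean) / clim_std) ** 2, axis=-1)


def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cumulative, positions)


def filtered_rollout(z0, n_steps, n_particles=256, seed=0):
    """Bootstrap particle filter: propagate, weight, resample; return ensemble means."""
    rng = np.random.default_rng(seed)
    particles = np.repeat(z0[None, :], n_particles, axis=0)
    means = []
    for _ in range(n_steps):
        particles = step(particles, rng)   # predict with the transition prior
        logw = log_likelihood(particles)   # weight by the pseudo-observation
        w = np.exp(logw - logw.max())      # stabilize before normalizing
        w /= w.sum()
        particles = particles[systematic_resample(w, rng)]
        means.append(particles.mean(axis=0))
    return np.array(means)


def naive_rollout(z0, n_steps, seed=0):
    """Single autoregressive rollout with no filtering, for comparison."""
    rng = np.random.default_rng(seed)
    z = z0[None, :].copy()
    trajectory = []
    for _ in range(n_steps):
        z = step(z, rng)
        trajectory.append(z[0])
    return np.array(trajectory)


z0 = np.array([1.0, 1.0])
filtered = filtered_rollout(z0, n_steps=500)
naive = naive_rollout(z0, n_steps=500)
print(np.abs(filtered).max(), np.abs(naive[-1]).max())
```

With `a = 1.02`, the deterministic part of the naive trajectory grows like 1.02^t, while resampling keeps the filtered ensemble near states the pseudo-observation deems plausible. In this toy setting, the choice of likelihood is the key design decision, since it defines what "plausible" means.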

Paper Structure

This paper contains 40 sections, 16 equations, 26 figures, 7 tables, and 1 algorithm.

Figures (26)

  • Figure 1: High-level schematic of the PICABU pipeline. The key features of the PICABU pipeline are illustrated: the latent embeddings under the single-parent assumption, the learned causal graph over these latents, the loss used during training, the Bayesian filter allowing for stable autoregressive rollouts, and the possibility of counterfactual experiments.
  • Figure 2: An example next timestep PICABU prediction. A) normalized temperature of NorESM2 (target), B) prediction from PICABU for the target month, C) difference between target and prediction. All data is on an icosahedral grid, normalized, and deseasonalized.
  • Figure 3: PICABU learns accurate temporal variability for ENSO and outperforms other methods in learning GMST variability. We run ten different 50-year PICABU emulations and compute the mean and standard deviation of their power spectra, and do the same for ten different 50-year periods of NorESM2 data. On the left, the power spectra of the Niño3.4 index are shown for PICABU, the ViT, V-PCMCI, and the ground truth data (NorESM2). The same is shown for GMST on the right.
  • Figure 4: PICABU outperforms ablated models in learning ENSO and GMST variability. We run ten different 50-year PICABU emulations and compute the mean and standard deviation of their power spectra, and do the same for ten different 50-year periods of NorESM2 data. On the left, the power spectra of the Niño3.4 index are shown for PICABU, its ablations, and the ground truth data (NorESM2). On the right, the same is shown for GMST.
  • Figure 5: Intervention on the grid cells that describe the ENSO state. The left panel shows the mapping from latents to observations (colored grid points) and the intervened grid cells. We increase their values; this increase then influences the latents at the current and next timesteps through the learned causal graph. No other grid cells or latent variables are intervened on. The middle panel shows the original, unintervened next-step prediction, and the right panel shows the intervened prediction.
  • ...and 21 more figures
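Figures 3 and 4 compare power spectra averaged over ten 50-year emulations with spectra from ten 50-year periods of NorESM2 data. A minimal sketch of that ensemble-spectrum computation follows, assuming monthly data, a raw one-sided periodogram (the exact spectral estimator is not stated here), and synthetic AR(1) series standing in for a Niño3.4 or GMST index. The helper names `power_spectrum`, `ensemble_spectrum`, and `ar1` are hypothetical.

```python
import numpy as np


def power_spectrum(series, samples_per_year=12):
    """One-sided periodogram of a mean-removed series; frequencies in cycles/year."""
    x = series - series.mean()
    spec = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / samples_per_year)
    return freqs, spec


def ensemble_spectrum(runs, samples_per_year=12):
    """Mean and standard deviation of the spectrum across equal-length runs."""
    spectra = []
    for run in runs:
        freqs, spec = power_spectrum(run, samples_per_year)
        spectra.append(spec)
    spectra = np.array(spectra)
    return freqs, spectra.mean(axis=0), spectra.std(axis=0)


def ar1(n, phi, rng):
    """Synthetic red-noise series, a stand-in for an emulated climate index."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x


rng = np.random.default_rng(0)
n_months = 50 * 12  # one 50-year run of monthly data
runs = [ar1(n_months, 0.9, rng) for _ in range(10)]  # ten runs, as in the figures
freqs, mean_spec, std_spec = ensemble_spectrum(runs)
```

The same `ensemble_spectrum` call would be applied to the emulated runs and to the reference periods, and the two mean spectra (with their standard-deviation bands) plotted against frequency. For a 600-month series, the frequencies run from 0 to the monthly Nyquist limit of 6 cycles per year in steps of 1/50 cycle per year.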