Causal Climate Emulation with Bayesian Filtering
Sebastian Hickman, Ilija Trajkovic, Julia Kaltenborn, Francis Pelletier, Alex Archibald, Yaniv Gurwicz, Peer Nowack, David Rolnick, Julien Boussard
TL;DR
This work introduces PICABU, a physics-informed causal emulator for climate dynamics. It learns a latent causal graph over region-like climate modes together with a latent-to-observation mapping under a single-parent decoding constraint. Training combines causal representation learning with an augmented ELBO objective and spectral/invariant losses, and a Bayesian filter stabilizes long-term autoregressive rollouts, enabling uncertainty-aware projections and counterfactual analysis. Evaluations on synthetic SAVAR data and on climate-model datasets (NorESM2 and CESM2-FV2) show accurate reproduction of key climate variability (e.g., ENSO, GMST) and improved generalization under distribution shifts, while ablations highlight the importance of the continuous ranked probability score (CRPS) and spectral losses for real data. The work also demonstrates causal attribution of extreme events and offers a principled pathway toward trustworthy, physically consistent climate emulators with interpretable interventions, though it is limited by the single-parent assumption and high latent dimensionality.
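For readers unfamiliar with CRPS, the following is a minimal sketch of the standard ensemble estimator for a scalar forecast, CRPS ≈ E|X − y| − ½ E|X − X′|, where X and X′ are independent ensemble members and y is the observation. It illustrates the score only and is not the paper's loss implementation; the function name `ensemble_crps` and its arguments are hypothetical, and how the score is aggregated over grid points and time steps is not specified here.

```python
def ensemble_crps(members, observation):
    """Ensemble estimate of CRPS for one scalar observation.

    `members` is a list of sampled forecasts (hypothetical name);
    lower values indicate a sharper, better-calibrated ensemble.
    """
    m = len(members)
    if m == 0:
        raise ValueError("at least one ensemble member is required")

    # First term: mean absolute error between members and the observation.
    accuracy = sum(abs(x - observation) for x in members) / m

    # Second term: mean absolute difference over all member pairs
    # (including i == j, which contributes zero).
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)

    return accuracy - 0.5 * spread


if __name__ == "__main__":
    # Two members bracketing the observation.
    print(ensemble_crps([0.0, 2.0], 1.0))
    # A single member reduces to the absolute error.
    print(ensemble_crps([3.0], 1.0))
```

With a single member the spread term vanishes and the score equals the absolute error, which is why CRPS is often described as a probabilistic generalization of mean absolute error.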
Abstract
Traditional models of climate change use complex systems of coupled equations to simulate physical processes across the Earth system. These simulations are computationally expensive, limiting our predictions of climate change and analyses of its causes and effects. Machine learning has the potential to quickly emulate data from climate models, but current approaches cannot incorporate physically based causal relationships. Here, we develop an interpretable climate model emulator based on causal representation learning. We derive a novel approach that includes a Bayesian filter for stable long-term autoregressive emulation. We demonstrate that our emulator learns accurate climate dynamics, and we show the importance of each of its components on a realistic synthetic dataset and on data from two widely deployed climate models.
