Technical Report: Towards Unified Diffusion Models for Multi-Model Climate Emulation at Scale
Francesco Immorlano, Elijah Tavares, Felix Draxler, Padhraic Smyth, Pierre Gentine, Stephan Mandt
TL;DR
This work addresses the computational bottleneck of building large climate ensembles by introducing a unified conditional diffusion model that jointly emulates nine CMIP6 models across three SSP scenarios. The model conditions on model identity $m$, CO$_2$e $c_s$, day $d$, and year $y$, and generates daily global temperature maps by sampling from the conditional distribution $P(T\mid m,c_s,d,y)$, which enables scalable probabilistic sampling and cross-model comparisons. Key contributions include (i) efficient probabilistic sampling for uncertainty quantification across models and scenarios, (ii) orders-of-magnitude speedups over traditional climate simulations, and (iii) variance-reduced treatment effect estimation using paired seeds, which substantially reduces the sample size needed for precise causal inference. The approach generalizes to unseen futures, supports rapid policy-scenario exploration at regional scales, and offers a practical tool for impact assessment that provides well-calibrated full distributions rather than only mean trajectories.
Abstract
Large ensembles of climate projections are essential for characterizing uncertainty in future climate and extreme weather events, yet the computational cost of numerical climate models limits ensemble sizes to a small number of realizations per model. We present a unified conditional diffusion model that dramatically reduces this computational barrier by learning shared distributional patterns across multiple Coupled Model Intercomparison Project Phase 6 (CMIP6) models and emission scenarios. Rather than training separate emulators for each model-scenario combination, our approach captures the common statistical structures underlying nine CMIP6 models, generating daily temperature maps with global coverage for historical and future periods. This unified framework enables (i) efficient probabilistic sampling for comprehensive uncertainty quantification across models and scenarios; (ii) rapid generation of large ensembles that would be computationally intractable with traditional climate models; and (iii) variance-reduced treatment effect analysis via fixed-seed generation that disentangles forced climate responses from internal variability. Evaluations on held-out models demonstrate reliable generalization to unseen future climates, enabling rapid exploration of different emission pathways.
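The fixed-seed idea in (iii) can be illustrated with a minimal sketch. It rests on the standard identity $\mathrm{Var}(X-Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)-2\,\mathrm{Cov}(X,Y)$: when two scenario samples share the same random seed, their internal variability is positively correlated, so the variance of their difference shrinks. The code below does not use the diffusion model. `toy_emulator` is a hypothetical stand-in for one draw from $P(T\mid m,c_s,d,y)$, and its coefficients, grid size, and CO$_2$e values are illustrative assumptions only.

```python
import numpy as np

def toy_emulator(co2e, seed, n_cells=64):
    """Hypothetical stand-in for one sample of a temperature map.

    The seed fixes the 'internal variability' noise; co2e shifts the mean
    (toy forced response) and mildly scales the noise, so paired noise
    does not cancel exactly.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, 1.0, size=n_cells)   # seed-dependent noise
    forced = 0.01 * co2e                      # toy forced response
    spread = 1.0 + 0.001 * co2e               # toy noise scaling
    return 14.0 + forced + spread * z

def effect_estimates(co2e_low, co2e_high, n_pairs, paired, base_seed=0):
    """Per-pair estimates of the mean temperature difference between scenarios."""
    effects = np.empty(n_pairs)
    for i in range(n_pairs):
        seed_low = base_seed + i
        # Paired: reuse the same seed for both scenarios.
        # Unpaired: draw the second scenario with a disjoint seed.
        seed_high = seed_low if paired else base_seed + n_pairs + i
        t_low = toy_emulator(co2e_low, seed_low)
        t_high = toy_emulator(co2e_high, seed_high)
        effects[i] = (t_high - t_low).mean()
    return effects

paired = effect_estimates(400.0, 600.0, n_pairs=200, paired=True)
unpaired = effect_estimates(400.0, 600.0, n_pairs=200, paired=False)
print("paired   mean/std:", paired.mean(), paired.std())
print("unpaired mean/std:", unpaired.mean(), unpaired.std())
```

Both estimators target the same toy effect (a difference of 2.0 by construction), but the paired estimates have a much smaller spread, so fewer samples are needed for a given precision. In a real generative model the shared noise passes through a nonlinear network, so the amount of variance reduction would have to be measured empirically rather than assumed.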
