Score-based generative emulation of impact-relevant Earth system model outputs
Shahine Bouabid, Andre Nogueira Souza, Raffaele Ferrari
TL;DR
This work develops a score-based diffusion emulator trained on CMIP6 outputs to generate joint distributions of monthly near-surface climate fields conditioned on GMST anomalies, implemented on a spherical HEALPix mesh and paired with pattern-scaling inputs. It demonstrates the emulator’s ability to reproduce ESM distributions for four impact-relevant variables, capturing internal variability, forced responses, cross-variable correlations, and extreme tails, while identifying notable failure modes for variables whose seasonal cycle undergoes a strong regime shift. The approach offers a computationally efficient surrogate that can be integrated into impact assessment workflows, with potential for daily resolution, finer spatial scales, and bias-aware transfer learning. Overall, the paper presents a practical, scalable framework for rapidly emulating climate projections that aligns with ISIMIP priorities and CMIP6-based futures, and it outlines concrete improvements for broader applicability.
Abstract
Policy targets evolve faster than the Coupled Model Intercomparison Project cycles, complicating adaptation and mitigation planning that must often contend with outdated projections. Climate model output emulators address this gap by offering inexpensive surrogates that can rapidly explore alternative futures while staying close to Earth System Model (ESM) behavior. We focus on emulators designed to provide inputs to impact models. Using monthly ESM fields of near-surface temperature, precipitation, relative humidity, and wind speed, we show that deep generative models have the potential to jointly model the distribution of variables relevant for impacts. The specific model we propose uses score-based diffusion on a spherical mesh and runs on a single mid-range graphics processing unit. We introduce a thorough suite of diagnostics to compare emulator outputs with their parent ESMs, including their probability densities, cross-variable correlations, time of emergence, and tail behavior. We evaluate performance across three distinct ESMs in both pre-industrial and forced regimes. The results show that the emulator produces distributions that closely match the ESM outputs and captures key forced responses. They also reveal important failure cases, notably for variables with a strong regime shift in the seasonal cycle. Although the emulator is not a perfect match to the ESM, its inaccuracies are small relative to the scale of internal variability in ESM projections. We therefore argue that it shows potential to be useful in supporting impact assessment. We discuss priorities for future development toward daily resolution, finer spatial scales, and bias-aware training. Code is made available at https://github.com/shahineb/climemu.
