Toward generative machine learning for boosting ensembles of climate simulations
Parsa Gooya, Reinel Sospedra-Alfonso, Johannes Exenberger
TL;DR
This paper addresses the problem of quantifying uncertainty from internal climate variability under computational constraints. It trains a conditional variational autoencoder (cVAE) to generate large ensembles of near-surface air temperature (TAS) conditioned on a low-dimensional trend. The encoder approximates the posterior as $q_\phi(z \mid x, c) = \mathcal{N}(\mu, \Sigma)$, the prior is independent of the condition, $p(z \mid c) = p(z) = \mathcal{N}(0, I)$, and the decoder is $p_\theta(x \mid z, c) = \mathcal{N}(\mu_\theta(z, c), \sigma^2 I)$, with decoder noise added to restore multiscale variability. Results show that boosted ensembles reproduce extremes and ENSO teleconnections even when the cVAE is trained on a single CanESM5 member. The method offers a transparent, computationally efficient path for post-processing climate ensembles; future work includes higher resolution, multi-variable ensembles, and time-dependent extensions.
Abstract
Accurately quantifying uncertainty in predictions and projections arising from irreducible internal climate variability is critical for informed decision-making. Such uncertainty is typically assessed using ensembles produced with physics-based climate models. However, computational constraints impose a trade-off between generating the large ensembles required for robust uncertainty estimation and increasing model resolution to better capture fine-scale dynamics. Generative machine learning offers a promising pathway to alleviate these constraints. We develop a conditional Variational Autoencoder (cVAE) trained on a limited sample of climate simulations to generate arbitrarily large ensembles. The approach is applied to monthly output from CMIP6 historical and future scenario experiments produced with the Canadian Centre for Climate Modelling and Analysis' (CCCma's) Earth system model CanESM5. We show that the cVAE learns the underlying distribution of the data and generates physically consistent samples that reproduce realistic low- and high-order moment statistics, including extremes. Compared with more sophisticated generative architectures, cVAEs offer a mathematically transparent, interpretable, and computationally efficient framework. Their simplicity leads to some limitations, such as overly smooth outputs, spectral bias, and underdispersion, which we discuss along with strategies to mitigate them. Specifically, we show that incorporating output noise improves the representation of climate-relevant multiscale variability, and we propose a simple method to achieve this. Finally, we show that cVAE-enhanced ensembles capture realistic global teleconnection patterns, even under climate conditions absent from the training data.
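
The sampling step described above, drawing a latent vector from the standard normal prior, decoding it together with the condition, and adding Gaussian output noise, can be illustrated with a minimal sketch. This is not the paper's implementation: the linear map `toy_decoder_mean` is a hypothetical stand-in for the trained decoder mean $\mu_\theta(z, c)$, and the array sizes, the weights `W_z` and `W_c`, the condition vector `c`, and the value `sigma=0.3` are arbitrary assumptions chosen only for illustration.

```python
import numpy as np


def toy_decoder_mean(z, c, W_z, W_c):
    """Hypothetical stand-in for mu_theta(z, c): a fixed linear map.

    A trained cVAE would use a neural network decoder here.
    """
    return z @ W_z + c @ W_c


def sample_members(c, n_members, W_z, W_c, sigma, rng):
    """Draw ensemble members x = mu_theta(z, c) + sigma * eps.

    z is drawn from the prior N(0, I); eps is standard normal output noise.
    """
    latent_dim = W_z.shape[0]
    z = rng.standard_normal((n_members, latent_dim))  # prior samples
    mean = toy_decoder_mean(z, c, W_z, W_c)           # (n_members, n_grid)
    eps = rng.standard_normal(mean.shape)             # output noise
    return mean + sigma * eps


# Arbitrary toy dimensions and weights (assumptions, not from the paper).
setup_rng = np.random.default_rng(0)
latent_dim, cond_dim, n_grid = 4, 2, 16
W_z = setup_rng.standard_normal((latent_dim, n_grid))
W_c = setup_rng.standard_normal((cond_dim, n_grid))
c = np.array([0.5, -1.0])  # hypothetical low-dimensional condition

# Re-using the same seed gives identical z and eps in both calls,
# so the two ensembles differ only by the sigma * eps term.
smooth = sample_members(c, 1000, W_z, W_c, sigma=0.0,
                        rng=np.random.default_rng(1))
noisy = sample_members(c, 1000, W_z, W_c, sigma=0.3,
                       rng=np.random.default_rng(1))

print("ensemble shape:", noisy.shape)
print("std of added noise:", np.std(noisy - smooth))
```

With `sigma=0.0` the samples are decoder means only, which in a real cVAE tend to be overly smooth; a nonzero `sigma` adds variance that is independent across output points. How `sigma` should be chosen is part of the method the abstract refers to and is not reproduced here.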
