Toward generative machine learning for boosting ensembles of climate simulations

Parsa Gooya; Reinel Sospedra-Alfonso; Johannes Exenberger

TL;DR

This paper addresses the problem of quantifying uncertainty from internal climate variability under computational constraints. It trains a conditional variational autoencoder (cVAE) to generate large ensembles of near-surface air temperature (TAS) conditioned on a low-dimensional trend. The cVAE encodes data with $q_\phi(z \mid x, c) = \mathcal{N}(\mu, \Sigma)$ under a condition-independent prior $p(z \mid c) = p(z) = \mathcal{N}(0, I)$, and decodes via $p_\theta(x \mid z, c) = \mathcal{N}(\mu_\theta(z, c), \sigma^2 I)$, with decoder noise added to restore multiscale variability. Results show that boosted ensembles reproduce extremes and ENSO teleconnections even when trained on a single CanESM5 member. The method offers a transparent, computationally efficient path for post-processing climate ensembles, with future work on higher resolution, multi-variable ensembles, and time-dependent extensions.
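
For orientation, the encoder, prior, and decoder named above are the ingredients of the standard conditional evidence lower bound (ELBO). The form below is the textbook objective for a Gaussian decoder, written as a sketch; the paper's actual loss may differ (for example through a weighting of the KL term), and $d$, the dimension of $x$, is a symbol introduced here for illustration.

$$
\mathcal{L}(\theta, \phi; x, c) = \mathbb{E}_{q_\phi(z \mid x, c)}\left[\log p_\theta(x \mid z, c)\right] - \mathrm{KL}\left(q_\phi(z \mid x, c) \,\|\, p(z)\right)
$$

$$
\log p_\theta(x \mid z, c) = -\frac{1}{2\sigma^2}\,\lVert x - \mu_\theta(z, c) \rVert^2 - \frac{d}{2}\log\left(2\pi\sigma^2\right)
$$

Under this form, the reconstruction term reduces to a squared error scaled by $1/(2\sigma^2)$, so $\sigma$ sets the balance between fitting the data and staying close to the prior.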

Abstract

Accurately quantifying uncertainty in predictions and projections arising from irreducible internal climate variability is critical for informed decision making. Such uncertainty is typically assessed using ensembles produced with physics-based climate models. However, computational constraints impose a trade-off between generating the large ensembles required for robust uncertainty estimation and increasing model resolution to better capture fine-scale dynamics. Generative machine learning offers a promising pathway to alleviate these constraints. We develop a conditional Variational Autoencoder (cVAE) trained on a limited sample of climate simulations to generate arbitrarily large ensembles. The approach is applied to output from monthly CMIP6 historical and future scenario experiments produced with the Canadian Centre for Climate Modelling and Analysis' (CCCma's) Earth system model CanESM5. We show that the cVAE model learns the underlying distribution of the data and generates physically consistent samples that reproduce realistic low- and high-moment statistics, including extremes. Compared with more sophisticated generative architectures, cVAEs offer a mathematically transparent, interpretable, and computationally efficient framework. Their simplicity leads to some limitations, such as overly smooth outputs, spectral bias, and underdispersion, which we discuss along with strategies to mitigate them. Specifically, we show that incorporating output noise improves the representation of climate-relevant multiscale variability, and we propose a simple method to achieve this. Finally, we show that cVAE-enhanced ensembles capture realistic global teleconnection patterns, even under climate conditions absent from the training data.
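
The generation step described in the abstract (sample from the prior, decode, then add output noise) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: `decode_mean` is a hypothetical stand-in for a trained decoder (here a fixed random linear map), and the latent size, grid size, condition vector, and `sigma` are made-up values.

```python
import numpy as np

def decode_mean(z, c, W_z, W_c):
    """Hypothetical stand-in for a trained decoder mean mu_theta(z, c)."""
    return z @ W_z + c @ W_c

def boost_ensemble(c, n_members, latent_dim, W_z, W_c, sigma, rng):
    """Draw members x ~ N(mu_theta(z, c), sigma^2 I) with z ~ N(0, I)."""
    z = rng.standard_normal((n_members, latent_dim))  # samples from the prior
    c_rep = np.tile(c, (n_members, 1))                # same condition for every member
    mean = decode_mean(z, c_rep, W_z, W_c)            # decoder mean (smooth output)
    noise = sigma * rng.standard_normal(mean.shape)   # decoder (output) noise
    return mean, mean + noise

rng = np.random.default_rng(0)
latent_dim, cond_dim, n_grid = 8, 2, 64               # illustrative sizes only
W_z = rng.standard_normal((latent_dim, n_grid)) / np.sqrt(latent_dim)
W_c = rng.standard_normal((cond_dim, n_grid))
c = np.array([0.5, -0.1])                             # assumed low-dimensional condition

smooth, noisy = boost_ensemble(c, 1000, latent_dim, W_z, W_c, sigma=0.3, rng=rng)
print(smooth.var(axis=0).mean(), noisy.var(axis=0).mean())
```

In this toy setup the members with output noise have a larger per-grid-cell variance than the decoder means alone, which is the mechanism the abstract points to for countering underdispersion.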

Paper Structure (11 sections, 8 equations, 5 figures)

This paper contains 11 sections, 8 equations, 5 figures.

Introduction
Data and Methods
Generative model
Data and preprocessing
Model and training
Inference
Evaluation of cVAE with and without decoder noise
Results and discussions
Regional statistics
Global structure and teleconnections
Summary and conclusions

Figures (5)

Figure 1: a) QQ plots of TAS anomalies relative to the seasonal climatology, pooling all grid cells and times. b, c) Spectral power for DJF and JJA mean TAS anomalies, averaged over years and realizations. d, e, f) Standard deviation of TAS fields per grid cell across all times and realizations for different datasets. The numbers in the panel titles are RMSE (pattern correlation) relative to the population in panel (d). All panels cover the 1980-2020 period.
Figure 2: a, c, e, g) QQ plots of TAS anomalies relative to the seasonal climatology at different cities. b, d, f, h) PDF histograms on log scale for the same cities.
Figure 3: Maps of (a, b) skewness, (c, d) kurtosis, and (e, f) the 0.99 quantile of TAS anomalies for the population (top row) and VAE+DN (bottom row).
Figure 4: a) QQ plots, b) PDF, and c) seasonal cycle of variability in the Niño3.4 index for different data products.
Figure 5: Composites of the linearly detrended monthly TAS anomalies for El Niño events classified using the $75^{th}$ and $85^{th}$ percentiles of the Niño3.4 index, and a percentile threshold absent from the training data (rows). The first column is the population dataset and the second column is the boosted ensemble from the VAE+DN model.
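
The percentile-based composites in Figure 5 can be illustrated with a short NumPy sketch. It assumes the Niño3.4 index is the area-weighted mean anomaly over the Niño3.4 box (5°S-5°N, 170°W-120°W) of the same TAS anomaly field; the index is conventionally defined on sea surface temperature, and the paper's exact procedure may differ. The function names, grid, and random input data are hypothetical.

```python
import numpy as np

def nino34_index(anom, lat, lon):
    """Area-weighted mean anomaly over 5S-5N, 170W-120W (lon given in 0-360)."""
    lat_mask = (lat >= -5) & (lat <= 5)
    lon_mask = (lon >= 190) & (lon <= 240)
    box = anom[:, lat_mask][:, :, lon_mask]
    # cos(latitude) weights approximate grid-cell area on a regular grid
    w = np.cos(np.deg2rad(lat[lat_mask]))[:, None] * np.ones(lon_mask.sum())
    return (box * w).sum(axis=(1, 2)) / w.sum()

def el_nino_composite(anom, index, percentile):
    """Average anomaly map over months whose index is at or above the percentile."""
    thresh = np.percentile(index, percentile)
    events = index >= thresh
    return anom[events].mean(axis=0), events

# Synthetic stand-in data: 240 months on a 5-degree grid (illustrative only)
rng = np.random.default_rng(1)
lat = np.linspace(-87.5, 87.5, 36)
lon = np.arange(0, 360, 5)
anom = rng.standard_normal((240, lat.size, lon.size))

index = nino34_index(anom, lat, lon)
composite, events = el_nino_composite(anom, index, 75)
print(composite.shape, events.sum())
```

Raising `percentile` selects stronger and rarer events, which is how a composite can probe conditions that are sparsely represented, or absent, in a small training sample.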
