Beyond the Training Data: Confidence-Guided Mixing of Parameterizations in a Hybrid AI-Climate Model

Helge Heuer, Tom Beucler, Mierk Schwabe, Julien Savre, Manuel Schlund, Veronika Eyring

TL;DR

This work tackles the instability of hybrid AI–climate models by transferring a ClimSim-trained BiLSTM convection parameterization to the ICON-A model and making its use tunable via confidence-guided mixing with a conventional convection scheme. The authors implement a physics-informed, uncertainty-aware loss and add input noise during training to improve long-term stability, enabling stable year-long and 20-year AMIP-style simulations. Key contributions include a data preprocessing step that isolates convection by subtracting radiative tendencies, a network that predicts its own error, and a mixing strategy that reduces extrapolation risk while remaining interpretable across environmental regimes. The findings show that the mixed, physics-informed approach can outperform the baseline Tiedtke parameterization on several observational benchmarks while remaining physically consistent and stable over long simulations, with practical implications for deploying ML-enhanced parameterizations in operationally relevant climate models.

Abstract

Persistent systematic errors in Earth system models (ESMs) arise from difficulties in representing the full diversity of subgrid, multiscale atmospheric convection and turbulence. Machine learning (ML) parameterizations trained on short high-resolution simulations show strong potential to reduce these errors. However, stable long-term atmospheric simulations with hybrid (physics + ML) ESMs remain difficult, as neural networks (NNs) trained offline often destabilize online runs. Training convection parameterizations directly on coarse-grained data is challenging, notably because scales cannot be cleanly separated. This issue is mitigated using data from superparameterized simulations, which provide clearer scale separation. Yet, transferring a parameterization from one ESM to another remains difficult due to distribution shifts that induce large inference errors. Here, we present a proof-of-concept where a ClimSim-trained, physics-informed NN convection parameterization is successfully transferred to ICON-A. The scheme is (a) trained on adjusted ClimSim data with subtracted radiative tendencies, and (b) integrated into ICON-A. The NN parameterization predicts its own error, enabling mixing with a conventional convection scheme when confidence is low, thus making the hybrid AI-physics model tunable with respect to observations and reanalysis through mixing parameters. This improves process understanding by constraining convective tendencies across column water vapor, lower-tropospheric stability, and geographical conditions, yielding interpretable regime behavior. In AMIP-style setups, several hybrid configurations outperform the default convection scheme (e.g., improved precipitation statistics). With additive input noise during training, both hybrid and pure-ML schemes lead to stable simulations and remain physically consistent for at least 20 years.
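The confidence-guided mixing described in the abstract can be illustrated with a minimal Python sketch. It assumes, as the Figure 3 caption suggests, that the ML weight $\lambda$ equals 1 below a percentile threshold $p_0$ of the predicted error and 0 above $p_1$ (20 and 60 in that caption). The linear ramp between the two thresholds, and the function names `ml_weight` and `mix_tendencies`, are assumptions made for illustration and are not taken from the paper.

```python
def ml_weight(error_percentile, p0=20.0, p1=60.0):
    """Return the ML weight lambda for a predicted-error percentile in [0, 100].

    Assumption: lambda ramps linearly from 1 at p0 down to 0 at p1.
    """
    if not 0.0 <= p0 < p1 <= 100.0:
        raise ValueError("expected 0 <= p0 < p1 <= 100")
    if error_percentile <= p0:
        return 1.0  # high confidence: pure ML
    if error_percentile >= p1:
        return 0.0  # low confidence: pure conventional scheme
    return (p1 - error_percentile) / (p1 - p0)


def mix_tendencies(ml_tendency, conventional_tendency, lam):
    """Blend two tendency profiles level by level: lam * ML + (1 - lam) * conventional."""
    if len(ml_tendency) != len(conventional_tendency):
        raise ValueError("profiles must have the same number of levels")
    return [
        lam * ml + (1.0 - lam) * conv
        for ml, conv in zip(ml_tendency, conventional_tendency)
    ]


# Hypothetical two-level profiles, for illustration only.
lam = ml_weight(40.0)  # halfway between p0=20 and p1=60
mixed = mix_tendencies([2.0, 4.0], [0.0, 0.0], lam)
```

Under this assumed ramp, raising $p_0$ and $p_1$ hands more columns to the ML scheme, while lowering them falls back on the conventional scheme more often, which is the sense in which the mixing parameters make the hybrid model tunable.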

Paper Structure

This paper contains 28 sections, 22 equations, 19 figures, 5 tables.

Figures (19)

  • Figure 1: Overall training and evaluation pipeline of our hybrid model. $x$ and $y$ represent inputs and outputs of the ClimSim dataset, based on the E3SM-MMF model. $\dot{T}_{tot}$ is the total temperature tendency, and "RTE+RRTMGP" is the ICON radiation scheme. The ClimSim dataset is first modified to separate radiative and convective subgrid tendencies, forming a new dataset, "ClimSim Convection". Afterward, we trained a BiLSTM model including a confidence loss (CL). Using CL, this model is mixed with the conventional "Tiedtke" cumulus convection scheme to predict convective tendencies as well as precipitation. In the mixing process, $\lambda$ is the fraction provided by the BiLSTM and $1-\lambda$ is the fraction provided by the conventional "Tiedtke" scheme. This mixed scheme predicts the tendencies due to convection in temperature $\dot{T}_{conv}$, water vapor, cloud liquid water, and cloud ice ($\dot{q}_{\alpha,conv}$, $\alpha \in \{v,l,i\}$), zonal wind $\dot{u}_{conv}$, and meridional wind $\dot{v}_{conv}$. Finally, we coupled the mixed scheme with the ICON model and evaluated the emergent statistics of these runs against observational datasets, including ERA5 and GPCP.
  • Figure 2: The BiLSTM architecture developed by the fifth-place Kaggle competition winner "YA HB MS EK" and used in the work presented in this article. Tensor dimensions are visualized in the lower right corner of the individual layers. The tensor dimensions shown in the figure are the batch dimension $b$, the column height level dimension $l$, the input dimension $i$, the encoding dimension $e$, the hidden dimension $h$, the iter dimension $it$, the output scalar dimension $s$, and the output profile dimension $p$. In the blue-marked layers, the horizontal dotted lines indicate operations restricted to the last dimension, thereby preserving "vertical locality".
  • Figure 3: ML weight $\lambda$ as a function of the predicted error percentile level. The tuning parameters $p_0$ and $p_1$ (here 20 and 60) are marked by dashed and dotted lines, respectively. The area with $\lambda=1$ (pure ML) is shown in blue with slanted hatching. The area with $\lambda=0$ (pure Tiedtke) is shown in orange with horizontal hatching.
  • Figure 4: Offline skill-complexity plane for various combinations of nine chosen hyperparameters of the BiLSTM on a smaller subset of the dataset with 3 million training and 1.5 million validation samples. The red dashed line shows the Pareto front between the coefficient of determination $R^2$ and the number of multiply-accumulate operations (MACs). The highlighted NN is selected for the remainder of this study because it strikes a suitable balance between skill and computational performance.
  • Figure 5: Evaluation scores for coupled ICON runs. Each dot represents a one-year coupled ICON run at a horizontal resolution of $158\,\mathrm{km} \times 158\,\mathrm{km}$. The runs with coupled ML schemes are colored according to their physics-informed loss weight $\alpha$, and the conventional Tiedtke scheme is colored in blue. Within each color group, the models have different values for $p_0$ and $p_1$. Panel (a) shows the spatial $R^2$ score of precipitation with respect to the observational dataset GPCP versus the $R^2$ score of column water vapor (CWV) with respect to the mean of multiple observation sets, as explained in \ref{sec:data_eval}. Panel (b) displays the $R^2$ score of near-surface (2 m) air temperature with respect to ERA5 versus the RMSE of zonal mean precipitation with respect to GPCP. In both panels, the Pareto front between the two skill metrics is marked with a dashed red line.
  • ...and 14 more figures
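
The Pareto fronts mentioned in the captions of Figures 4 and 5 can be illustrated with a short Python sketch. It assumes the skill-complexity setting of Figure 4, where a higher $R^2$ and a lower MAC count are both preferred. The function name `pareto_front` and the candidate values below are hypothetical and serve only as an illustration; they are not results from the paper.

```python
def pareto_front(candidates):
    """Return the non-dominated candidates, sorted by increasing MACs.

    Each candidate is a tuple (name, macs, r2). A candidate is dominated if
    another one is at least as good in both criteria (macs no higher, r2 no
    lower) and strictly better in at least one of them.
    """
    front = []
    for name, macs, r2 in candidates:
        dominated = any(
            other_macs <= macs
            and other_r2 >= r2
            and (other_macs < macs or other_r2 > r2)
            for _, other_macs, other_r2 in candidates
        )
        if not dominated:
            front.append((name, macs, r2))
    return sorted(front, key=lambda c: c[1])


# Hypothetical model configurations: (name, MACs, R^2).
models = [
    ("a", 1_000_000, 0.50),
    ("b", 2_000_000, 0.60),
    ("c", 3_000_000, 0.55),  # dominated by "b": more MACs, lower R^2
    ("d", 4_000_000, 0.70),
]
front_names = [name for name, _, _ in pareto_front(models)]
```

A model on the front cannot be improved in one criterion without giving up the other, so choosing among front members is a trade-off between skill and cost rather than a strict ranking.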