Towards Physically Consistent Deep Learning For Climate Model Parameterizations

Birgit Kühbacher; Fernando Iglesias-Suarez; Niki Kilbertus; Veronika Eyring

TL;DR

This work tackles the challenge of physically inconsistent and uninterpretable DL-based climate model parameterizations by introducing PCMasking, a data-driven framework that first uncovers physical input drivers through a sparsity-promoting pre-masking phase and then enforces physical consistency via a thresholded input mask during a masking fine-tuning phase. The method maintains predictive performance comparable to causally-informed baselines while substantially reducing computational overhead, as demonstrated on SPCAM aquaplanet data with one model per output variable. SHAP analyses show that PCMasking suppresses non-physical, long-range input–output links and concentrates on plausible local drivers, improving interpretability. Cross-climate experiments indicate that the framework identifies robust physical drivers across climates, though generalization remains an area for future improvement. Overall, PCMasking advances data-driven climate parameterizations by combining physical driver selection with automatic, architecture-friendly training, offering practical benefits for scalable ensemble forecasting and deeper physical insight.

Abstract

Climate models play a critical role in understanding and projecting climate change. Due to their complexity, their horizontal resolution of about 40-100 km remains too coarse to resolve processes such as clouds and convection, which need to be approximated via parameterizations. These parameterizations are a major source of systematic errors and large uncertainties in climate projections. Deep learning (DL)-based parameterizations, trained on data from computationally expensive short, high-resolution simulations, have shown great promise for improving climate models in that regard. However, their lack of interpretability and their tendency to learn spurious non-physical correlations reduce trust in the resulting climate simulations. We propose an efficient supervised learning framework for DL-based parameterizations that leads to physically consistent models with improved interpretability and negligible computational overhead compared to standard supervised training. First, key features determining the target physical processes are uncovered. Subsequently, the neural network is fine-tuned using only those relevant features. We show empirically that our method robustly identifies a small subset of the inputs as actual physical drivers, thereby removing spurious non-physical relationships. This results in neural networks that are physically consistent and interpretable by design while maintaining the predictive performance of unconstrained black-box DL-based parameterizations.
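The two-step procedure described above (uncover the relevant inputs, then fine-tune using only those inputs) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, that the importance of each input is read off as the L2 norm of its column in the first-layer weight matrix $W_1$, scaled to $[0, 1]$, and that a hypothetical threshold of 0.1 turns these scores into a binary mask. The function names `masking_vector`, `threshold_mask`, and `apply_mask` are likewise made up for this example.

```python
import math

def masking_vector(w1):
    """Per-input score: L2 norm of each input's column in W_1, scaled to [0, 1].

    Assumption for illustration only; the scoring rule is not taken from the paper.
    w1 is a list of rows with shape (hidden_units, n_inputs).
    """
    # zip(*w1) iterates over the columns of w1, i.e. one column per input.
    norms = [math.sqrt(sum(w * w for w in col)) for col in zip(*w1)]
    largest = max(norms)
    return [n / largest for n in norms] if largest > 0 else norms

def threshold_mask(mask, threshold):
    """Binarize the masking vector: keep inputs whose score reaches the threshold."""
    return [1.0 if m >= threshold else 0.0 for m in mask]

def apply_mask(x, binary_mask):
    """Masking phase: element-wise product of the input vector and the mask."""
    return [xi * mi for xi, mi in zip(x, binary_mask)]

# Toy W_1 with 2 hidden units and 3 inputs. The second input has near-zero
# weights, as one might expect after sparsity-promoting training.
w1 = [[3.0, 0.01, 0.0],
      [4.0, 0.00, 2.5]]

mask = masking_vector(w1)                      # scores for the three inputs
binary = threshold_mask(mask, threshold=0.1)   # hypothetical threshold value
masked_x = apply_mask([10.0, 20.0, 30.0], binary)

print(binary)    # the second input is switched off
print(masked_x)  # its value no longer reaches the rest of the network
```

In this sketch the masked input would then be passed to the remaining layers during fine-tuning, so the network cannot rely on inputs that were switched off.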

Paper Structure (15 sections, 3 equations, 26 figures, 3 tables)

Introduction
Related Work
PCMasking Framework
Initial Training and Finding Physical Relationships
Masking Vector Extraction and Thresholding
Physically Consistent Masking and Fine-tuning
Experiments
SPCAM and Neural Network Configuration
SPCAM Data
Neural Network Configuration
Experimental Results
Offline Predictive Performance
Physical Consistency and Interpretability
Evaluation on Different Climates
Conclusion

Figures (26)

Figure 1: Schematic of a neural network model in the PCMasking framework. The blue line indicates the path of the input vector in the pre-masking phase, where it passes through a conventional dense input layer with weight matrix $W_1$. In the masking phase, as illustrated by the red line, the input vector is element-wise multiplied with a masking vector. In both cases, the information then flows through an arbitrary network architecture before reaching the linear output layer with weight matrix $W_M$.
Figure 2: Pressure-latitude cross-sections for heating tendencies $\Delta T_{phy}$ computed from 1440 test data samples. One neural network is trained for each of the 30 vertical levels to construct a full vertical profile (y-axis). Each network predicts heating tendencies across the entire globe (x-axis). The left and middle plots illustrate that the PCMasking framework networks accurately predict the true SPCAM values. The right plot depicts the $R^2$ score (higher is better, maximum 1). While the $R^2$ score is high at around 600 hPa in some regions at the equator and the mid-latitudes, the predictive performance declines in the lower troposphere (around 700-1000 hPa). This is likely due to turbulent and stochastic processes in the planetary boundary layer. See SI Fig. \ref{fig:si_phq_cross_section} for results for moistening tendencies.
Figure 3: Vertical profiles for heating tendencies $\Delta T_{phy}$ computed from 1440 test data samples. One neural network is trained for each of the 30 vertical levels to construct a full vertical profile (y-axis). The network predictions are horizontally averaged across latitudes. The predictions from the PCMasking framework (PCM) and the causally-informed NNs (CI-NN; Iglesias-Suarez et al., 2024) are shown on the left alongside the true SPCAM values. Both network types accurately reproduce the true profile. The right plot depicts the $R^2$ score (higher is better, maximum 1). The noticeable decline in the lower troposphere (around 700-1000 hPa) is likely due to turbulent and stochastic processes in the planetary boundary layer. See SI Fig. \ref{fig:si_phq_profile} for results for moistening tendencies.
Figure 4: Mean absolute SHAP values computed from 1000 samples for standard feed-forward neural networks (NNs) (left), causally-informed NNs (CI-NN; Iglesias-Suarez et al., 2024) (middle) and PCMasking framework NNs (right). For clarity, we have only included 3D input and output variables (see SI Fig. \ref{fig:si_shap_values} for SHAP plots including 2D variables). The standard NNs display numerous spurious connections between input and output variables. This is particularly evident for the input temperature, where inputs in the upper troposphere (around 100-300 hPa) and stratosphere (above 100 hPa) impact outputs in the mid to lower troposphere (around 300-1000 hPa). Such spurious, non-physical links are removed in both the causally-informed NNs and the PCMasking framework. Furthermore, the total number of inputs (right line plot) for CI-NN and the PCMasking framework indicates that the PCMasking framework is more physically accurate as it does not detect any inputs for moistening tendencies in the stratosphere, where the air is cold and dry.
Figure 5: Difference between the mean absolute SHAP values for standard neural networks (NNs) and PCMasking framework networks. SHAP values were computed from 1000 samples, and only 3D variables are displayed for clarity (see SI Fig. \ref{fig:si_shap_diff} for the SHAP difference including 2D variables). Red areas indicate positive values, meaning that these input-output links were more pronounced in the standard NNs. Negative, blue values indicate that these connections are more prominent in the PCMasking framework. The PCMasking framework clearly emphasizes physically consistent local interactions along the diagonal and non-local interactions in the lower troposphere (around 700-1000 hPa).
...and 21 more figures
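
The $R^2$ score reported in Figures 2 and 3 (higher is better, maximum 1) is the standard coefficient of determination. As a minimal sketch, assuming one list of true values and one list of predictions per vertical level, it could be computed as follows. The helper name `r2_score` and the toy numbers are illustrative and are not taken from the paper.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviation of the truth from its mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot

# Toy data for a single vertical level (made-up numbers).
truth = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score(truth, [1.0, 2.0, 3.0, 4.0])   # predictions match exactly
baseline = r2_score(truth, [2.5, 2.5, 2.5, 2.5])  # always predicting the mean

print(perfect, baseline)
```

A perfect prediction gives a score of 1, while always predicting the mean of the true values gives 0. Predictions worse than that baseline yield negative scores.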
