
On the importance of learning non-local dynamics for stable data-driven climate modeling: A 1D gravity wave-QBO testbed

Hamid A. Pahlavan, Pedram Hassanzadeh, M. Joan Alexander

TL;DR

Data-driven parameterizations in climate modeling often become unstable online despite strong offline metrics. This work traces one source of that instability to a failure to learn spatially non-local dynamics. Using a 1D gravity-wave–QBO testbed, the authors show that stability hinges on the network's receptive field (RF) being large enough to capture vertical non-local coupling. Architectures with insufficient RF (e.g., small-RF CNNs) become unstable, while large-RF CNNs, Fourier neural operators (FNOs), and fully-connected NNs (MLPs) yield stable simulations that reproduce the true QBO period ($\tau \approx 28.7$ months) with realistic variability. Offline RMSE and $R^2$ alone do not reveal this difference. The authors use RF and effective receptive field (ERF) analyses to anticipate online stability a priori, and show that learning non-local dynamics is crucial for both subgrid-scale (SGS) parameterizations and spatiotemporal emulators. The work argues for integrating ML theory with climate physics to guide the design of robust, non-locality-aware data-driven models.

Abstract

Machine learning (ML) techniques, especially neural networks (NNs), have shown promise in learning subgrid-scale parameterizations for climate models. However, a major problem with data-driven parameterizations, particularly those learned with supervised algorithms, is model instability. Current remedies are often ad hoc and lack a theoretical foundation. Here, we combine ML theory and climate physics to address a source of instability in NN-based parameterization. We demonstrate the importance of learning spatially non-local dynamics using a 1D model of the quasi-biennial oscillation (QBO) with gravity wave (GW) parameterization as a testbed. While common offline metrics fail to identify shortcomings in learning non-local dynamics, we show that the concept of receptive field (RF) can identify instability a priori. We find that NN-based parameterizations that seem to accurately predict GW forcings from wind profiles ($R^2 \approx 0.99$) cause unstable simulations when the RF is too small to capture the non-local dynamics, while NNs of the same size but with a large-enough RF are stable. We examine three broad classes of architectures, namely convolutional NNs, Fourier neural operators, and fully-connected NNs; the latter two have inherently large RFs. We also demonstrate that learning non-local dynamics is crucial for the stability and accuracy of a data-driven spatiotemporal emulator of the zonal wind field. Given the ubiquity of non-local dynamics in the climate system, we expect the use of effective RF, which can be computed for any NN architecture, to be important for many applications. This work highlights the necessity of integrating ML theory with physics to design and analyze data-driven algorithms for weather and climate modeling.

Paper Structure (12 sections, 5 equations, 9 figures, 1 table)


Figures (9)

  • Figure 1: Quasi-biennial oscillation (QBO), as seen in a time-height section of zonal wind. (A) The "true" QBO in the 1D model with a mean period ($\tau$) of 28.7 months, and a standard deviation of 0.7 months. Online performance of the different NN architectures that predict GW drag $G$ as a function of zonal wind $u$: (B and C) CNNs with RF of 19 and 55, respectively, and (D) FNO. Each panel displays only 20 years of a 1000-year simulation for illustration.
  • Figure 2: The RF of CNNs and its relation to their performance. (A) Schematic of the RF for an output in the middle of the vertical domain for various layers of a CNN with a kernel size of seven. The output at the final layer (i.e., GW drag at the 26 km level) can see only 19 levels of the input (i.e., zonal wind profile). For a vertical resolution of 500 m, it covers 9.5 km of the input domain, ranging from 21.5 to 31 km. (B) Online stability of CNNs with varying RFs for different resolutions (number of levels) of the 1D model. The RF of a CNN can be increased by adding layers, enlarging the kernel size, or expanding the dilation, as indicated. (C) Similar to B, but showing the RMSE of CNNs instead of RF.
  • Figure 3: Correlation matrices highlight the non-locality of the GW dynamics. (A and B) Correlation between zonal wind ($u$) and GW drag ($G$), and (C and D) between the vertical shear of zonal wind ($\partial{u}/\partial{z}$) and $G$ in the 1D model, and the observed QBO based on the ERA5 reanalysis data (2010-2019), as indicated. For ERA5, $u$ is the zonal-mean zonal wind averaged over 5$^{\circ}$N–5$^{\circ}$S, and $G$ is the zonal wave forcing (divergence of the upward flux of zonal momentum). Note that $G$ in ERA5 is due to resolved waves, primarily GWs but also Kelvin waves.
  • Figure 4: Offline error of various NN architectures. (A) Vertical profile of error for different NN architectures as indicated. (B-D) Difference between the true and NN-predicted correlation matrices for zonal wind and GW drag, using CNN(19), CNN(55), and FNO, respectively. This difference is the NN-predicted correlation matrix (not shown) subtracted from the true correlation matrix shown in Fig. 3A. The result for the MLP ($R^2 = 0.999$, RMSE = 0.007 m/s/day) is not shown, but it is similar to (C and D), showing no significant difference from the true correlation matrix.
  • Figure 5: Effective receptive field (ERF) for GW drag at levels (A) 18 km, (B) 34 km, (C) 26 km, and (D) 30 km (marked by dashed horizontal grey lines) for various NN architectures, as indicated. The ERF is calculated by back-propagating an arbitrary gradient from the desired level of output back to the input.
  • ...and 4 more figures
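
The receptive-field arithmetic in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes stride-1 convolutions that all share one kernel size and dilation, for which the RF is 1 + layers × dilation × (kernel − 1). The layer counts (3 and 9), the 73-level grid, and the all-ones toy network are hypothetical choices made so that the numbers match the RFs of 19 and 55 quoted above; the source does not say which configurations were used. The perturbation check is a simple stand-in for the back-propagated gradient described in the Figure 5 caption, and it only reports which input levels can influence a given output level.

```python
def receptive_field(num_layers, kernel_size, dilation=1):
    """RF (in levels) of stacked stride-1 1D convolutions sharing one
    kernel size and dilation: 1 + L * d * (k - 1)."""
    return 1 + num_layers * dilation * (kernel_size - 1)


def conv1d_same(signal, kernel):
    """Stride-1, zero-padded ('same') 1D convolution on a Python list."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            j = i - half + k
            if 0 <= j < len(signal):
                total += weight * signal[j]
        out.append(total)
    return out


def toy_cnn(profile, num_layers, kernel_size):
    """Hypothetical linear CNN with all-ones kernels (nothing is trained)."""
    kernel = [1.0] * kernel_size
    for _ in range(num_layers):
        profile = conv1d_same(profile, kernel)
    return profile


def empirical_rf(num_levels, out_level, num_layers, kernel_size):
    """Count the input levels whose perturbation changes the output at
    out_level. The toy network is linear, so a unit impulse suffices."""
    count = 0
    for j in range(num_levels):
        bumped = [0.0] * num_levels
        bumped[j] = 1.0
        if toy_cnn(bumped, num_layers, kernel_size)[out_level] != 0.0:
            count += 1
    return count


if __name__ == "__main__":
    # Kernel size 7 as in the Figure 2 schematic; layer counts are assumed.
    print(receptive_field(3, 7))      # 19
    print(receptive_field(9, 7))      # 55
    # Assumed 73-level grid, output taken at the middle level.
    print(empirical_rf(73, 36, 3, 7)) # 19
    print(empirical_rf(73, 36, 9, 7)) # 55
```

For a trained nonlinear network, the zero/non-zero test above would be replaced by the gradient magnitude at each input level, which is what distinguishes the effective RF from the theoretical one.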