
Fourier analysis of the physics of transfer learning for data-driven subgrid-scale models of ocean turbulence

Moein Darman, Pedram Hassanzadeh, Laure Zanna, Ashesh Chattopadhyay

TL;DR

This work tackles the generalization problem of data-driven subgrid-scale parameterizations in ocean turbulence by using a 9-layer CNN to map coarse velocity fields to subgrid forcing in a two-layer quasi-geostrophic model. Through Fourier-space analysis of CNN kernels, the authors show that learned filters are primarily low-pass, high-pass, or Gabor-like, and that out-of-distribution data cause underestimation of activation spectra in early layers, degrading spectral fidelity. Transfer learning, achieved by retraining only a single layer, realigns activation spectra and output spectra to match target systems, enabling effective generalization with far less target data. The study provides a mechanistic link between kernel spectra and learned physics, offering a broadly applicable framework for efficient, interpretable data-driven SGS parameterizations across isotropic and anisotropic flows in multi-scale dynamical systems.
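To illustrate what "Fourier-space analysis of CNN kernels" involves, the sketch below takes a small convolution kernel and pads it to 64 × 64, the padded size quoted in the Figure 5 caption. Zero-padding is an assumption here. It then takes the magnitude of the kernel's 2D FFT.

The function names, the radii, and the factor-of-2 rule in `classify_filter` are hypothetical illustration choices, not the authors' procedure. A Gabor-like kernel usually peaks at a nonzero wavenumber, so this crude heuristic would put it in the "band-pass / other" bucket.

```python
import numpy as np


def padded_kernel_spectrum(kernel, n=64):
    """|2D DFT| of a small convolution kernel zero-padded to an n x n grid.

    fftshift moves the zero wavenumber to index (n // 2, n // 2).
    """
    k = np.asarray(kernel, dtype=float)
    padded = np.zeros((n, n))
    padded[: k.shape[0], : k.shape[1]] = k  # zero-padding (assumed)
    return np.abs(np.fft.fftshift(np.fft.fft2(padded)))


def classify_filter(spectrum, low_radius=0.15, high_radius=0.35):
    """Hypothetical heuristic: compare mean |W_hat| near k = 0 and at large |k|.

    Wavenumbers are in cycles per grid point, so |k| ranges up to about 0.71.
    """
    n = spectrum.shape[0]
    freqs = np.fft.fftshift(np.fft.fftfreq(n))
    kx, ky = np.meshgrid(freqs, freqs)
    kmag = np.hypot(kx, ky)
    low = spectrum[kmag <= low_radius].mean()
    high = spectrum[kmag >= high_radius].mean()
    if low > 2.0 * high:
        return "low-pass"
    if high > 2.0 * low:
        return "high-pass"
    return "band-pass / other (e.g. Gabor-like)"
```

For example, a 3 × 3 box-average kernel comes out as "low-pass", and a discrete Laplacian comes out as "high-pass".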

Abstract

Transfer learning (TL) is a powerful tool for enhancing the performance of neural networks (NNs) in applications such as weather and climate prediction and turbulence modeling. TL enables models to generalize to out-of-distribution data with minimal training data from the new system. In this study, we employ a 9-layer convolutional NN to predict the subgrid forcing in a two-layer ocean quasi-geostrophic system and examine which metrics best describe its performance and generalizability to unseen dynamical regimes. Fourier analysis of the NN kernels reveals that they learn low-pass, Gabor, and high-pass filters, regardless of whether the training data are isotropic or anisotropic. By analyzing the activation spectra, we identify why NNs fail to generalize without TL and how TL can overcome these limitations: the learned weights and biases from one dataset underestimate the out-of-distribution sample spectra as they pass through the network, leading to an underestimation of output spectra. By re-training only one layer with data from the target system, this underestimation is corrected, enabling the NN to produce predictions that match the target spectra. These findings are broadly applicable to data-driven parameterization of dynamical systems.
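The sketch below shows the single-layer re-training strategy. It assumes the network's parameters are stored as a dict keyed by layer index ℓ, with ℓ = 2 as the first hidden layer, following Figure 1. The TL network starts as a copy of the base network, and gradient updates touch only the layers listed in `trainable_layers`.

The function names and the plain-SGD update are hypothetical, and the gradient computation itself is omitted. In a framework such as PyTorch, the same effect is usually achieved in one of two ways:

- set `requires_grad = False` on the frozen parameters, or
- pass only the re-trained layer's parameters to the optimizer.

```python
import copy

import numpy as np


def init_tl_params(base_params):
    """Start the TL network from a deep copy of the base network's parameters."""
    return copy.deepcopy(base_params)


def masked_sgd_step(params, grads, trainable_layers, lr=1e-3):
    """One plain-SGD update applied only to layers in `trainable_layers`.

    `params` and `grads` map layer index -> {"W": array, "b": array}.
    Every other layer keeps the weights and biases inherited from the base network.
    """
    for layer, p in params.items():
        if layer not in trainable_layers:
            continue  # frozen layer: never updated
        for name in p:
            p[name] = p[name] - lr * grads[layer][name]
    return params
```

Because only one layer's parameters change, far fewer target-system samples are needed than for training all layers from scratch.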

Paper Structure

This paper contains 22 sections, 15 equations, and 6 figures.

Figures (6)

  • Figure 1: Row a: schematic of the CNN and its inputs and outputs in physical space. Each TLNN is initialized with the weights of $\text{BNN}^{0}$, and only the first hidden layer ($\ell = 2$) is re-trained using a small percentage of data from the target system. The inputs are the zonal and meridional velocities at the upper and lower levels, and the outputs are the subgrid forcing at each level. Row b: the same inputs and outputs in spectral space, with spectra meridionally averaged.
  • Figure 2: Comparison of the base system and three target configurations using 10 years of simulation data. Row a: snapshots of potential vorticity showing the spatial distribution of eddies in each system; eddies are roughly isotropic, whereas jets show a more organized zonal alignment. Row b: meridionally averaged spectra of the upper-level velocities. Row c: meridionally averaged spectra of the lower-level velocities. Row d: meridionally averaged spectra of the subgrid forcing at both levels. In all spectral panels, $k_x$ denotes the zonal wavenumber.
  • Figure 3: A priori evaluation of the CNN parameterization. Panel a: correlation coefficient (CC), RMSE, and spectrum RMSE at the upper and lower levels, comparing $\mathrm{BNN}^{i,i}$, $\mathrm{BNN}^{0,i}$, and $\mathrm{TLNN}^{0,i}$ re-trained with different percentages of data, across the three target cases. Panel b: ratio of the output spectrum to the filtered DNS (FDNS) spectrum for each case.
  • Figure 4: A posteriori evaluation of the CNN parameterization. Panels a, c, and e: kinetic energy spectra from 10-year simulations using $\text{BNN}^{0,i}$ and $\text{TLNN}^{0,i}$ for each case. Panels b, d, and f: PDFs of upper-level potential vorticity from the same simulations.
  • Figure 5: Cluster centers of filter spectra obtained by applying the $k$-means algorithm to the $64^2$ padded weight matrices $\left| \hat{\widetilde{W}}_\ell^{\beta,j} \right|$ from layer 2 of $\text{BNN}^{0}$ and $\text{TLNN}^{i}$. The number of cluster centers differs between cases because it is increased until qualitatively similar patterns are observed.
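The sketch below computes the a priori metrics named in the Figure 3 caption. It assumes each field is stored as an array of shape (ny, nx), with the zonal direction along axis 1, and that the domain is periodic in x. The function names are hypothetical, and the paper's exact normalizations may differ. Spectrum RMSE is omitted because its precise definition is not given here.

```python
import numpy as np


def meridional_mean_spectrum(field):
    """Zonal power spectrum |F(k_x)|^2 averaged over the meridional (y) direction."""
    fhat = np.fft.rfft(field, axis=1)  # FFT along x for every latitude row
    return np.mean(np.abs(fhat) ** 2, axis=0)


def correlation_coefficient(pred, truth):
    """Pearson correlation between predicted and true (e.g. FDNS) subgrid forcing."""
    p = pred - pred.mean()
    t = truth - truth.mean()
    return float(np.sum(p * t) / np.sqrt(np.sum(p**2) * np.sum(t**2)))


def rmse(pred, truth):
    """Root-mean-square error over all grid points."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


def spectrum_ratio(pred, truth):
    """Predicted / true meridionally averaged spectrum at each zonal wavenumber."""
    return meridional_mean_spectrum(pred) / meridional_mean_spectrum(truth)
```

As an example, scaling the true field by 0.5 gives CC = 1 but a spectrum ratio of 0.25 at every wavenumber. A pointwise correlation alone can therefore miss the kind of spectral underestimation the abstract describes.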