Data-Driven Probabilistic Air-Sea Flux Parameterization

Jiarong Wu, Pavel Perezhogin, David John Gagne, Brandon Reichl, Aneesh C. Subramanian, Elizabeth Thompson, Laure Zanna

TL;DR

The paper develops a probabilistic framework for air-sea flux parameterization by modeling each flux component as a conditional Gaussian $y \sim \mathcal{N}(\mu(\mathbf{X}),\sigma^{2}(\mathbf{X}))$ and learning $\mu$ and $\sigma$ with separate neural networks trained on eddy-covariance data. The ANN mean flux is comparable to established bulk algorithms (e.g., COARE), while the predicted uncertainty $\sigma(\mathbf{X})$ enables stochastic sampling for ensemble simulations. Evaluations show that latent heat flux benefits most from the data-driven mean estimate, with scores that vary across geographic regions. The framework also reveals nontrivial structure in how $\mu_{Q_S}$ depends on the inputs, including a nonzero flux when $T_a-T_o=0$. Tests in the single-column model GOTM show seasonally varying SST and MLD responses to flux changes, with the largest stochastic spread during spring restratification, which highlights the practical importance of representing flux variability in coupled models.
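
For reference, the negative log-likelihood of the conditional Gaussian above takes the standard form below. The sample index $i$ and sample count $N$ are notation introduced here for illustration; the paper's exact loss (for example, its normalization or any weighting) may differ.

$$
\mathcal{L}(\mu,\sigma) = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{\left(y_i-\mu(\mathbf{X}_i)\right)^{2}}{2\,\sigma^{2}(\mathbf{X}_i)} + \frac{1}{2}\log \sigma^{2}(\mathbf{X}_i)\right] + \frac{1}{2}\log 2\pi
$$

The first term penalizes errors in the mean, scaled by the predicted variance. The second term keeps the network from lowering the loss by inflating $\sigma$ everywhere.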

Abstract

Accurately quantifying air-sea fluxes is important for understanding air-sea interactions and improving coupled weather and climate systems. This study introduces a probabilistic framework to represent the highly variable nature of air-sea fluxes, which is missing in deterministic bulk algorithms. Assuming Gaussian distributions conditioned on the input variables, we use artificial neural networks and eddy-covariance measurement data to estimate the mean and variance by minimizing negative log-likelihood loss. The trained neural networks provide alternative mean flux estimates to existing bulk algorithms, and quantify the uncertainty around the mean estimates. Stochastic parameterization of air-sea turbulent fluxes can be constructed by sampling from the predicted distributions. Tests in a single-column forced upper-ocean model suggest that changes in flux algorithms influence sea surface temperature and mixed layer depth seasonally. The ensemble spread in stochastic runs is most pronounced during spring restratification.
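
The sampling step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the example values, and the use of independent white noise at each time step are all assumptions made here, and the paper may use a different noise model.

```python
import numpy as np


def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of y under N(mu, sigma^2)."""
    y, mu, sigma = (np.asarray(a, dtype=float) for a in (y, mu, sigma))
    return np.mean(0.5 * np.log(2.0 * np.pi * sigma**2)
                   + (y - mu) ** 2 / (2.0 * sigma**2))


def sample_flux_ensemble(mu, sigma, n_members, seed=0):
    """Draw flux series from N(mu, sigma^2).

    Returns an array of shape (n_members, len(mu)). Noise is drawn
    independently for every member and time step (an assumption).
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    noise = rng.standard_normal((n_members, mu.size))
    return mu + sigma * noise


# Hypothetical predicted mean and standard deviation (W m^-2),
# standing in for the output of the two trained networks.
mu = np.array([100.0, 120.0, 90.0])
sigma = np.array([10.0, 15.0, 5.0])

ensemble = sample_flux_ensemble(mu, sigma, n_members=20)
spread = ensemble.std(axis=0)  # ensemble spread at each time step
print(ensemble.shape, spread.shape)
```

Each row of `ensemble` could then force one member of an ensemble run. The per-step standard deviation across rows gives a simple measure of the spread that the sampling introduces.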

Paper Structure

This paper contains 13 sections, 8 equations, 3 figures.

Figures (3)

  • Figure 1: (a) Ship trajectories of the various cruises in the NOAA PSL dataset. (b) An illustration of the ANN-based conditional Gaussian probabilistic model. Note that we are visualizing only two of the input space dimensions, for the purpose of showing the concept of a conditional Gaussian distribution. (c) Distributions of input variables for different subsets of data.
  • Figure 2: (a) ANN-based deterministic predictions (orange dots) and bulk algorithm predictions (black dots) plotted against measured fluxes. Only 10% of the samples are shown. (b) $R^2$ and bias of the ANN (orange) and the bulk algorithm (black), evaluated on different geographical subsets. (c) ANN prediction of the mean of $Q_S$ on a uniform input grid. The dashed gray lines mark zero temperature difference, $T_a = T_o$. (d) ANN prediction of the standard deviation of $Q_S$ on the same grid. (e) Scatter plot of all data points and their marginal distributions. Gaussian kernel density estimation provides the gray contour lines, which indicate the available samples per unit $\Delta U$ and $\Delta (T_a-T_o)$. The same contour lines are overlaid on (c) and (d).
  • Figure 3: (a) Example THF time series for July and October 2015, computed using observed input variables at OWS Papa. Black solid line: COARE 3.6; black dashed line: NCAR; orange line: ANN. Orange shading shows one standard deviation of the THF predicted by the ANN, which is used to generate noise-perturbed heat fluxes (examples shown by thin orange lines). (b) Difference of temperature profiles in the deterministic single-column run, $\Delta T (z,t) = T_\text{ANN} (z,t) - T_\text{bulk}(z,t)$. Black and orange dashed lines show the diagnosed MLD for bulk and ANN, respectively. (c) Monthly-averaged THF discrepancies between ANN and COARE. (d) SST and (e) MLD responses in the deterministic runs (ANN minus COARE). Trends with inconsistent signs over the years are masked with hatching. (f) Monthly-averaged SST spread and (g) monthly-averaged MLD spread in 20 ensemble runs.