Calibrating Geophysical Predictions under Constrained Probabilistic Distributions

Zhewen Hou, Jiajin Sun, Subashree Venkatasubramanian, Peter Jin, Shuolin Li, Tian Zheng

TL;DR

The paper addresses the difficulty of ensuring long-term statistical fidelity in ML surrogates for chaotic geophysical systems, where short-term forecast accuracy can diverge from the system's attractor statistics. It introduces a post hoc calibration framework based on normalization and Kernelized Stein Discrepancy (KSD) to align model outputs with a known marginal distribution, without altering training or requiring new data. The approach is analyzed theoretically and demonstrated on toy linear dynamics, sparse air-sea CO2 flux, and online quasi-geostrophic turbulence, showing improvements in both pointwise error and distributional alignment, especially under marginal distribution shifts. The results suggest distribution-informed calibration as a practical, scalable tool for improving long-horizon reliability in climate and geophysical predictions.

Abstract

Machine learning (ML) has shown significant promise in studying complex geophysical dynamical systems, including turbulence and climate processes. Such systems often display sensitive dependence on initial conditions, reflected in positive Lyapunov exponents, where even small perturbations in short-term forecasts can lead to large deviations in long-term outcomes. Thus, meaningful inference requires not only accurate short-term predictions, but also consistency with the system's long-term attractor, which is captured by the marginal distribution of state variables. Existing approaches attempt to address this challenge by incorporating spatial and temporal dependence, but these strategies become impractical when data are extremely sparse. In this work, we show that prior knowledge of marginal distributions offers valuable complementary information to short-term observations, motivating a distribution-informed learning framework. We introduce a calibration algorithm based on normalization and the Kernelized Stein Discrepancy (KSD) to enhance ML predictions. The method employs KSD within a reproducing kernel Hilbert space to calibrate model outputs, improving their fidelity to known physical distributions. This not only sharpens pointwise predictions but also enforces consistency with non-local statistical structures rooted in physical principles. Through synthetic experiments, spanning offline climatological CO2 fluxes and online quasi-geostrophic flow simulations, we demonstrate the robustness and broad utility of the proposed framework.
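
To make the role of KSD concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a one-dimensional standard normal "knowledge" distribution (score function -x), an RBF kernel with a fixed bandwidth, and a calibration map restricted to a constant shift chosen by grid search. The names ksd_squared, standard_normal_score, and best_shift are hypothetical and introduced only for illustration.

```python
import numpy as np

def ksd_squared(samples, score, bandwidth=1.0):
    """V-statistic estimate of the squared KSD for 1-D samples (RBF kernel)."""
    x = np.asarray(samples, dtype=float)
    s = score(x)                              # target score: d/dx log p(x)
    d = x[:, None] - x[None, :]               # pairwise differences x_i - x_j
    h2 = bandwidth ** 2
    k = np.exp(-d ** 2 / (2.0 * h2))          # RBF kernel k(x_i, x_j)
    dk_dx = -d / h2 * k                       # derivative in the first argument
    dk_dy = d / h2 * k                        # derivative in the second argument
    d2k = (1.0 / h2 - d ** 2 / h2 ** 2) * k   # mixed second derivative
    # Stein kernel u_p(x_i, x_j), averaged over all pairs
    u = (s[:, None] * s[None, :] * k + s[:, None] * dk_dy
         + s[None, :] * dk_dx + d2k)
    return u.mean()

def standard_normal_score(x):
    # Assumed knowledge distribution: N(0, 1), so d/dx log p(x) = -x
    return -x

# Synthetic "raw predictions" whose marginal is shifted away from the target
rng = np.random.default_rng(0)
raw_predictions = rng.normal(loc=2.0, scale=1.0, size=500)

# Illustrative calibration: pick the constant shift that minimizes KSD
shifts = np.linspace(-3.0, 3.0, 61)
ksd_values = [ksd_squared(raw_predictions + c, standard_normal_score)
              for c in shifts]
best_shift = shifts[int(np.argmin(ksd_values))]
calibrated = raw_predictions + best_shift
```

KSD needs only the score of the target density, so the normalizing constant of the knowledge distribution never has to be computed. A richer calibration map than a constant shift would typically be optimized by gradient descent on the same objective.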

Paper Structure

This paper contains 41 sections, 2 theorems, 18 equations, 7 figures, 8 tables, 1 algorithm.

Key Result

Proposition 1

For any twice differentiable function $U$, the choice of $\lambda =\frac{ \sigma^2\sum_{i=1}^n\mathbb{E}\left\{\frac{\partial^2 U(\hat{y}_{1:n})}{\partial \hat{y}_i^2} \right\} }{ \sum_{i=1}^n \mathbb{E}\left\{\frac{\partial U(\hat{y}_{1:n})}{\partial \hat{y}_i} \right\}^2 }$ minimizes $\mathbb{E}\|

Figures (7)

  • Figure 1: Spectral energy diagnostics for quasi-geostrophic turbulence. Each panel shows a derived statistical quantity (computed from the model trajectories, not directly learned by the FCNN) as a function of zonal wavenumber. The high-resolution simulation (blue solid line) provides the reference spectra. The coarse low-resolution model (orange dashed line) systematically underestimates the magnitude of energy transfers and misrepresents the level of generation and friction, with pronounced differences in several of the spectra. Adding an FCNN-based parameterization (green dashed line) produces only modest changes: across all four diagnostics, the FCNN curves remain much closer to the low-resolution baseline than to the high-resolution reference, with slight improvements in some wavenumber ranges but no recovery of the high-resolution spectral amplitudes. Panel titles indicate, respectively, spectra of kinetic energy flux (KEflux), available potential energy flux (APEflux), available potential energy generation (APEgenspec), and kinetic energy frictional effects (KEfrictionspec). Together, these results illustrate that a surrogate trained solely on short-horizon prediction errors can improve pointwise fields yet still fail to reproduce the correct long-term spectral and statistical structure of the turbulent flow.
  • Figure 2: KSD calibration under steady-state distribution shift. The dynamical system is trained and evaluated under different steady-state distributions, inducing systematic bias in the raw predictions. (A) True versus raw predicted values of $Y$ on the test set: the model captures the local dependence between $\mathbf{X}_t$ and $Y$, but the point cloud deviates from the identity line, revealing systematic bias driven by the distributional shift. (B) One-dimensional marginal densities for the knowledge distribution, the true test responses, the raw predictions, and the calibrated predictions. The raw predictive distribution is shifted relative to both the knowledge and true test distributions, whereas KSD calibration transports it closer to these targets, improving distributional alignment. (C) For each independent run, we plot the MSE against the rank correlation between predictions and ground truth. Hollow markers denote raw predictions and filled markers denote calibrated predictions from the same run; red (blue) points use a training size of 100 (1000). Calibration preserves rank correlation but consistently reduces MSE, and the linear fit over all calibrated points shows that post-calibration accuracy improves approximately linearly with the model’s rank correlation.
  • Figure 3: Monthly knowledge distributions of pCO2 of the 6th province from CESM002. Each panel corresponds to a calendar month, and each horizontal ridge shows the marginal distribution of pCO2 in a given year for that month over 1982--2016, with the vertical axis indicating year. The monthly distributions exhibit strong seasonal structure: while several winter and spring months appear approximately unimodal, late-summer and autumn months (July--November) display a clearer bimodal pattern, with distinct low and high pCO2 modes. For any fixed month, the ridges drift gradually to the right over time, reflecting the long-term increase in pCO2, yet the overall shape of the distribution remains relatively stable across years. These patterns motivate the use of CESM002 as a knowledge source for KSD calibration: the model provides month-specific marginal distributions that encode robust seasonal and multimodal structure and long-term trends, which can be used to correct systematic biases in data-driven predictions trained on sparse and biased observations.
  • Figure 4: Monthly calibration of pCO2 predictions for the 6th province in 2010--2016. Each panel corresponds to one calendar month and shows four distributions of pCO2 values: the empirical distribution of the true CESM001 test data (red histogram), the distribution of raw FFN predictions (blue histogram), the distribution of KSD-calibrated predictions (orange histogram), and the month-specific knowledge distribution from CESM002 (yellow curve) obtained from a Gaussian mixture model. Although the marginal distributions differ from month to month, the CESM-based knowledge distributions and the true CESM001 marginals share similar overall structure, providing a physically informed target for calibration. In many months, the raw predictions are biased or miss important features of the true distribution: for example, they can be overly narrow, shifted relative to the true values, or fail to represent multi-modal behavior when present. After KSD calibration, the orange histograms generally move closer to the red ones and better track both the main modes and the range of observed pCO2, while remaining consistent with the yellow knowledge curves. Together with the quantitative results in Tables \ref{tab:pco2_calibration}, \ref{tab:pco2_monthly_calibration_MSE} and \ref{tab:pco2_monthly_calibration_2WD2}, these panels show that KSD calibration improves the match between the predictive marginals and the true pCO2 distributions across all months.
  • Figure 5: Online spectral energy diagnostics for five modeling setups in quasi-geostrophic turbulence. Each panel shows a derived statistical quantity as a function of zonal wavenumber: kinetic energy flux (KEflux), available potential energy flux (APEflux), APE generation spectrum (APEgenspec), and kinetic energy frictional dissipation spectrum (KEfrictionspec). The high-resolution simulation (blue solid line) provides the target spectra. The coarse low-resolution model (orange dashed line) and its FCNN-augmented variant (green dashed line) remain systematically biased relative to the high-resolution curves, with only modest differences between the orange and green spectra. In contrast, the KSD-calibrated low-resolution model (red solid line) substantially reduces these discrepancies: for KEflux, APEflux, and APEgenspec, the red curves lie much closer to the blue reference than either orange or green, and for KEfrictionspec they move in the correct direction relative to the uncalibrated models. The combined Low-res-FCNN-KSD model (purple solid line) achieves the closest overall match to the high-resolution spectra, nearly coinciding with the blue curves across most wavenumbers. Together with Figure \ref{fig:pyqg}, this figure illustrates that KSD calibration is effective at restoring physically realistic long-term energy statistics and can significantly enhance the online behavior of learned parameterizations.
  • ...and 2 more figures
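
The Figure 2 caption notes that calibration preserves rank correlation while reducing MSE. One way to see why this can happen is that any strictly increasing map leaves the ordering of predictions unchanged. The sketch below uses synthetic data and a simple affine rescaling to an assumed known mean and standard deviation as a stand-in for the calibration map; it is not the paper's procedure, and all names and numbers are illustrative assumptions.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation for data without ties (NumPy only)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

rng = np.random.default_rng(1)

# Assumed knowledge about the response marginal: mean 1.0, std 2.0
target_mean, target_std = 1.0, 2.0
y_true = rng.normal(loc=target_mean, scale=target_std, size=1000)

# Synthetic raw predictions: right ordering on average, wrong location and scale
y_raw = 0.5 * y_true - 1.0 + rng.normal(scale=0.3, size=1000)

# Strictly increasing affine map onto the assumed target mean and std
y_cal = (y_raw - y_raw.mean()) / y_raw.std() * target_std + target_mean

mse_raw = np.mean((y_raw - y_true) ** 2)
mse_cal = np.mean((y_cal - y_true) ** 2)
rho_raw = spearman(y_raw, y_true)
rho_cal = spearman(y_cal, y_true)
```

In this toy setting the rank correlation is identical before and after the map, while the systematic bias that dominates the raw MSE is removed.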

Theorems & Definitions (2)

  • Proposition 1
  • Lemma 1: Stein's Lemma