Calibrating Geophysical Predictions under Constrained Probabilistic Distributions
Zhewen Hou, Jiajin Sun, Subashree Venkatasubramanian, Peter Jin, Shuolin Li, Tian Zheng
TL;DR
The paper tackles the difficulty of ensuring long-term statistical fidelity in ML surrogates for chaotic geophysical systems, where short-term forecast accuracy can diverge from the system's attractor statistics. It introduces a post hoc calibration framework based on normalization and Kernelized Stein Discrepancy (KSD) to align model outputs with a known marginal distribution, without altering training or requiring new data. The approach is analyzed theoretically and demonstrated on toy linear dynamics, sparse air-sea CO2 flux, and online quasi-geostrophic turbulence, showing improvements in both pointwise error and distributional alignment, especially under marginal distribution shifts. The results suggest that distribution-informed calibration is a practical, scalable tool for improving long-horizon reliability in climate and geophysical predictions.
Abstract
Machine learning (ML) has shown significant promise in studying complex geophysical dynamical systems, including turbulence and climate processes. Such systems often display sensitive dependence on initial conditions, reflected in positive Lyapunov exponents, so that even small perturbations in short-term forecasts can lead to large deviations in long-term outcomes. Thus, meaningful inference requires not only accurate short-term predictions but also consistency with the system's long-term attractor, which is captured by the marginal distribution of the state variables. Existing approaches attempt to address this challenge by incorporating spatial and temporal dependence, but these strategies become impractical when data are extremely sparse. In this work, we show that prior knowledge of marginal distributions offers valuable complementary information to short-term observations, motivating a distribution-informed learning framework. We introduce a calibration algorithm based on normalization and the Kernelized Stein Discrepancy (KSD) to enhance ML predictions. The method employs KSD within a reproducing kernel Hilbert space to calibrate model outputs, improving their fidelity to known physical distributions. This not only sharpens pointwise predictions but also enforces consistency with non-local statistical structures rooted in physical principles. Through synthetic experiments spanning offline climatological CO2 fluxes and online quasi-geostrophic flow simulations, we demonstrate the robustness and broad utility of the proposed framework.
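To make the role of KSD concrete, the following is a minimal sketch, not the paper's algorithm, of how a squared KSD (V-statistic form, RBF kernel) can score a set of predictions against a known marginal. Several choices here are assumptions for illustration only: the target is a standard normal, whose score function is -x; "normalization" is taken to mean an affine rescaling to the target mean and variance; and the names `ksd_vstat`, `affine_normalize`, and the `bandwidth` parameter are hypothetical.

```python
import numpy as np


def ksd_vstat(samples, score_fn, bandwidth=1.0):
    """Squared KSD (V-statistic) of samples against a target with known score.

    score_fn(x) must return grad log p(x) for the target density p, with the
    same (n, d) shape as x. An RBF kernel exp(-||x - y||^2 / (2 h^2)) is used.
    """
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    s = score_fn(x)                                   # (n, d) target scores
    diff = x[:, None, :] - x[None, :, :]              # (n, n, d): x_i - x_j
    sqdist = np.sum(diff ** 2, axis=-1)               # (n, n)
    h2 = bandwidth ** 2
    k = np.exp(-sqdist / (2.0 * h2))                  # kernel matrix

    term_ss = s @ s.T                                 # s(x_i) . s(x_j)
    term_sx = np.einsum("id,ijd->ij", s, diff) / h2   # s(x_i) . grad_y k / k
    term_sy = np.einsum("jd,ijd->ij", s, diff) / h2   # -(s(x_j) . grad_x k / k)
    term_tr = d / h2 - sqdist / h2 ** 2               # trace(grad_x grad_y k) / k
    stein_kernel = k * (term_ss + term_sx - term_sy + term_tr)
    return float(stein_kernel.mean())


def affine_normalize(pred, target_mean=0.0, target_std=1.0):
    """Rescale predictions so their sample mean and std match the target's."""
    pred = np.asarray(pred, dtype=float)
    return (pred - pred.mean()) / pred.std() * target_std + target_mean


def standard_normal_score(x):
    """Score (gradient of the log density) of a standard normal target."""
    return -x


# Synthetic "model outputs" that are biased and under-dispersed relative to
# the assumed N(0, 1) marginal.
rng = np.random.default_rng(0)
raw_pred = 1.5 + 0.5 * rng.standard_normal(400)

ksd_before = ksd_vstat(raw_pred, standard_normal_score)
ksd_after = ksd_vstat(affine_normalize(raw_pred), standard_normal_score)
print(f"squared KSD before: {ksd_before:.4f}, after: {ksd_after:.4f}")
```

In this toy setting the rescaled predictions should yield a markedly smaller discrepancy than the raw ones. The sketch only evaluates KSD as a diagnostic; using it as an objective to adjust model outputs, as the abstract describes, would need an optimization step that is not shown here.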
