Dynamical errors in machine learning forecasts
Zhou Fang, Gianmarco Mengaldo
TL;DR
The paper tackles the problem of evaluating ML forecasts for dynamical fidelity rather than relying solely on traditional error metrics. It uses two local dynamical indices (DIs), the instantaneous dimension $d$ and the inverse persistence $\theta$, and builds DI-based error metrics (including $\mathrm{MSE}_d$, $\mathrm{MSE}_\theta$, and DID), together with Wasserstein-distance comparisons, to quantify dynamical discrepancies. Using direct and recursive forecasts on canonical systems (Lorenz, Kuramoto-Sivashinsky (KS), and Kolmogorov flow (KF)) and a real-world weather task, it shows that forecast errors concentrate in states with higher $d$ and $\theta$, and that dynamical distortions accumulate with lead time and recursion, revealing failure modes that MSE alone does not capture. The framework provides a data-driven, model-agnostic diagnostic for assessing and improving the dynamical consistency of ML forecasts, with practical implications for scientific and engineering forecasting where physical fidelity is critical.
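The summary names the DI-based metrics without giving formulas. A minimal sketch follows, assuming that $\mathrm{MSE}_d$ and $\mathrm{MSE}_\theta$ are the mean squared difference between forecast and reference index series at matching valid times, and that the Wasserstein comparison is the one-dimensional Wasserstein-1 distance between the empirical index distributions. DID is not reproduced because its definition is not given here; the function names `mse_index` and `index_wasserstein` and the sample values are hypothetical.

```python
import numpy as np


def mse_index(idx_true, idx_pred):
    """Mean squared error between two dynamical-index series (e.g. d or theta)
    evaluated at the same valid times; a hypothetical stand-in for MSE_d / MSE_theta."""
    u = np.asarray(idx_true, dtype=float)
    v = np.asarray(idx_pred, dtype=float)
    return float(np.mean((v - u) ** 2))


def index_wasserstein(idx_true, idx_pred):
    """Wasserstein-1 distance between the empirical distributions of an index.

    For two equal-size samples with uniform weights, W1 equals the mean absolute
    difference of the sorted values; scipy.stats.wasserstein_distance covers
    the general (unequal-size, weighted) case.
    """
    u = np.sort(np.asarray(idx_true, dtype=float))
    v = np.sort(np.asarray(idx_pred, dtype=float))
    if u.shape != v.shape:
        raise ValueError("this sketch assumes equal-length samples")
    return float(np.mean(np.abs(u - v)))


# Illustrative (made-up) instantaneous-dimension series, not data from the paper.
d_true = [2.0, 2.1, 1.9, 2.2]
d_pred = [2.1, 2.1, 1.7, 2.2]
print(mse_index(d_true, d_pred))          # approximately 0.0125
print(index_wasserstein(d_true, d_pred))  # approximately 0.075
```

The two quantities answer different questions: a forecast whose index values are correct but misplaced in time, such as `[3, 1, 2]` against `[1, 2, 3]`, has a Wasserstein distance of 0 but an MSE of 2, so the pointwise and distributional views complement each other.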
Abstract
In machine learning forecasting, standard error metrics such as mean absolute error (MAE) and mean squared error (MSE) quantify discrepancies between predictions and target values. However, these metrics do not directly evaluate the physical and/or dynamical consistency of forecasts, an increasingly critical concern in scientific and engineering applications. Indeed, a fundamental yet often overlooked question is whether machine learning forecasts preserve the dynamical behavior of the underlying system. Addressing this question is essential for assessing the fidelity of machine learning models and identifying potential failure modes, particularly in applications where maintaining correct dynamical behavior is crucial. In this work, we investigate the relationship between standard forecasting error metrics, such as MAE and MSE, and the dynamical properties of the underlying system. To this end, we use two recently developed dynamical indices: the instantaneous dimension ($d$) and the inverse persistence ($\theta$). Our results indicate that larger forecast errors (e.g., higher MSE) tend to occur in states with higher $d$ (higher complexity) and higher $\theta$ (lower persistence). To further assess dynamical consistency, we propose error metrics based on the dynamical indices that measure the discrepancy between the forecast values of $d$ and $\theta$ and their true values. Leveraging these dynamical-index-based metrics, we analyze direct and recursive forecasting strategies on three canonical datasets (Lorenz, the Kuramoto-Sivashinsky equation, and Kolmogorov flow) as well as on a real-world weather forecasting task. Our findings reveal substantial distortions of dynamical properties in ML forecasts, especially at long forecast lead times or over long recursive simulations, providing complementary information on ML forecast fidelity that can be used to improve ML models.
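The abstract does not describe how $d$ and $\theta$ are computed. The sketch below follows the extreme-value-theory recipe commonly used for these indices in the dynamical-systems literature, which is assumed here rather than taken from the paper. For a reference state $\zeta$, the series $g(t) = -\log \lVert x(t) - \zeta \rVert$ is thresholded at a high quantile (0.98, an assumed choice). The exceedances over the threshold are approximately exponential with mean $1/d$, and $\theta$ is the extremal index, estimated with the Süveges maximum-likelihood estimator. The names `local_indices` and `henon` are hypothetical, and the Hénon map is used only as an illustrative chaotic system.

```python
import numpy as np


def local_indices(traj, ref_idx, quantile=0.98):
    """Sketch: estimate instantaneous dimension d and inverse persistence theta
    of the state traj[ref_idx] via extreme value theory (assumed recipe).

    traj : array of shape (n_times, n_features), one state per row.
    quantile : threshold quantile for -log(distance); 0.98 is an assumed choice.
    """
    x = np.asarray(traj, dtype=float)
    dist = np.linalg.norm(x - x[ref_idx], axis=1)
    valid = dist > 0                          # drop the reference state itself
    g = np.full(dist.shape, -np.inf)          # -inf never exceeds the threshold
    g[valid] = -np.log(dist[valid])           # large g = close recurrence to the reference
    thresh = np.quantile(g[valid], quantile)
    exceed = np.flatnonzero(g > thresh)       # time indices of close recurrences
    if exceed.size < 2:
        raise ValueError("too few exceedances; use a longer trajectory or lower quantile")

    # d: exceedances g - thresh are approximately exponential with mean 1/d.
    d = 1.0 / np.mean(g[exceed] - thresh)

    # theta: Süveges maximum-likelihood estimator of the extremal index.
    q = 1.0 - quantile                        # exceedance probability
    s = np.diff(exceed) - 1                   # gaps between successive exceedances
    n, nc = s.size, np.count_nonzero(s)       # all gaps, and the nonzero ones
    a = q * s.sum()
    if a == 0:                                # one unbroken cluster: maximal persistence
        return d, 0.0
    b = a + n + nc
    theta = (b - np.sqrt(b * b - 8.0 * nc * a)) / (2.0 * a)
    return d, theta


def henon(n, a=1.4, b=0.3, burn=100):
    """Hénon-map trajectory, used here only as an illustrative chaotic system."""
    x, y = 0.0, 0.0
    out = np.empty((n, 2))
    for i in range(n + burn):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= burn:
            out[i - burn] = (x, y)
    return out


traj = henon(5000)
d_ref, theta_ref = local_indices(traj, ref_idx=2500)
print(d_ref, theta_ref)
```

Evaluating `local_indices` at every time step of a reference trajectory and of a forecast trajectory would give the $d$ and $\theta$ series that index-based error metrics compare. Raising the quantile leaves fewer exceedances, which tightens the extreme-value approximation but increases estimator variance.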
