Dynamical errors in machine learning forecasts

Zhou Fang, Gianmarco Mengaldo

TL;DR

The paper tackles the problem of evaluating ML forecasts for dynamical fidelity rather than relying solely on traditional error metrics. It introduces two local dynamical indices, the instantaneous dimension $d$ and the inverse persistence $\theta$, and constructs DI-based error metrics (including $\mathrm{MSE}_d$, $\mathrm{MSE}_\theta$, and DID) along with Wasserstein-distance comparisons to quantify dynamical discrepancies. Through direct and recursive forecasts on canonical systems (Lorenz, KS, KF) and a real-world weather task, it shows that forecast errors concentrate in regions of higher $d$ and $\theta$ and that dynamical distortions accumulate with lead time and recursion, revealing failure modes not captured by MSE alone. The framework provides a data-driven, model-agnostic diagnostic tool to assess and improve the dynamical consistency of ML forecasts, with practical implications for scientific and engineering forecasting where physical fidelity is critical.

Abstract

In machine learning forecasting, standard error metrics such as mean absolute error (MAE) and mean squared error (MSE) quantify discrepancies between predictions and target values. However, these metrics do not directly evaluate the physical and/or dynamical consistency of forecasts, an increasingly critical concern in scientific and engineering applications. Indeed, a fundamental yet often overlooked question is whether machine learning forecasts preserve the dynamical behavior of the underlying system. Addressing this issue is essential for assessing the fidelity of machine learning models and identifying potential failure modes, particularly in applications where maintaining correct dynamical behavior is crucial. In this work, we investigate the relationship between standard forecasting error metrics, such as MAE and MSE, and the dynamical properties of the underlying system. To achieve this goal, we use two recently developed dynamical indices: the instantaneous dimension ($d$), and the inverse persistence ($\theta$). Our results indicate that larger forecast errors -- e.g., higher MSE -- tend to occur in states with higher $d$ (higher complexity) and higher $\theta$ (lower persistence). To further assess dynamical consistency, we propose error metrics based on the dynamical indices that measure the discrepancy of the forecasted $d$ and $\theta$ versus their correct values. Leveraging these dynamical indices-based metrics, we analyze direct and recursive forecasting strategies for three canonical datasets -- Lorenz, Kuramoto-Sivashinsky equation, and Kolmogorov flow -- as well as a real-world weather forecasting task. Our findings reveal substantial distortions in dynamical properties in ML forecasts, especially for long forecast lead times or long recursive simulations, providing complementary information on ML forecast fidelity that can be used to improve ML models.
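
To make the two indices in the abstract more concrete, the following is a minimal sketch of one commonly used extreme-value recipe for estimating a local dimension and an inverse persistence from recurrences of a trajectory. It is an assumption that the paper follows this recipe; the function name `local_indices`, the quantile `q`, and the Euclidean distance are illustrative choices, not taken from the text.

```python
import numpy as np

def local_indices(traj, q=0.98):
    """Estimate instantaneous dimension d and inverse persistence theta
    for every state of a trajectory with shape (T, n_features).

    Hypothetical helper: assumes no exactly repeated states in `traj`.
    """
    traj = np.asarray(traj, dtype=float)
    n_states = traj.shape[0]
    d = np.full(n_states, np.nan)
    theta = np.full(n_states, np.nan)
    for i in range(n_states):
        dist = np.linalg.norm(traj - traj[i], axis=1)
        dist[i] = np.inf                       # exclude the reference state itself
        g = -np.log(dist)                      # large g means a close recurrence
        u = np.quantile(g[np.isfinite(g)], q)  # high threshold on the observable
        hits = np.flatnonzero(g > u)           # time indices of threshold exceedances
        # Exponential fit of the exceedances: d is the inverse of the mean excess.
        d[i] = 1.0 / np.mean(g[hits] - u)
        # Sueveges-type extremal index estimator from gaps between exceedances.
        s = np.diff(hits) - 1                  # gap lengths minus one
        n_gaps = s.size
        n_clusters = np.count_nonzero(s > 0)   # gaps that separate distinct clusters
        a = (1.0 - q) * s.sum()
        if a > 0:                              # left as NaN if all exceedances are consecutive
            b = a + n_gaps + n_clusters
            theta[i] = (b - np.sqrt(b * b - 8.0 * n_clusters * a)) / (2.0 * a)
    return d, theta
```

Under these assumptions, applying the same function to the true states and to the forecast states gives two sets of $(d, \theta)$ values that can then be compared. The loop is quadratic in the number of states, so long trajectories would likely need a nearest-neighbour structure or subsampling.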

Paper Structure

This paper contains 26 sections, 18 equations, 50 figures, 2 tables.

Figures (50)

  • Figure 1: Overview of datasets. Panel (a): Ground truth solution for each dataset, used as 'true data' for ML learning. Panel (b): Dynamical space of true data, where each point represents a data snapshot. The coordinates $d$ and $\theta$ are dynamical indices that describe the dynamical properties of each state. The mean values of the indices are highlighted with a red circle and text. Panel (c): ML forecast solution, accompanied by standard forecast errors, namely MSE (and RMSE for the weather dataset). Panel (d): Dynamical space of ML forecasts. Each forecast state is plotted at its corresponding $d$ and $\theta$, colored by the forecast error. The average dynamical indices of the forecasts are marked with a red triangle and text.
  • Figure 1: Forecast error as a function of lead time. Mean squared error (MSE) is shown for varying forecast lead times. Error bars denote the standard deviation computed from three independent runs, each initialized with different random model parameters.
  • Figure 2: Relationship between forecast error and dynamical indices (1-step lead time; direct forecasts). Each panel represents one dataset (where $m$ is the input length used). The left and middle columns show mean MSE (and RMSE for the weather dataset) vs. quantiles of $d$ (left) and $\theta$ (middle), with forecasts grouped into 10 bins. The right column shows the $d$–$\theta$ space of each forecast colored by the MSE (and RMSE for the weather dataset) forecast error, alongside the average true and predicted indices. At the top of each plot in the right column, we report the Wasserstein Distance (WD), which measures differences between these $(d,\theta)$ distributions; a smaller WD indicates a closer match.
  • Figure 2: Forecast errors and dynamical space for recursive runs on the KS dataset. Panel (a): Mean squared error (MSE) as a function of Lyapunov Time (LT). Shaded regions represent the standard deviation computed across forecasts initialized from 2000 distinct initial states. Panel (b): Distribution of trajectories in the $d$–$\theta$ dynamical space at forecast times 0.1 LT, 1.0 LT, 2.0 LT, and 3.0 LT. Axes represent dynamical indices $d$ (horizontal) and $\theta$ (vertical), with consistent ranges across all subplots, as shown in the top-left panel. Mean values of $d$ and $\theta$ indices and Wasserstein Distance (WD) between predicted and true distributions are annotated within each subplot.
  • Figure 3: DID for the Lorenz, KS, KF, and weather datasets. In each subplot, the $x$-axis represents $\mathrm{DID}_d$ and the $y$-axis represents $\mathrm{DID}_\theta$. The percentage of points falling into each quadrant is displayed at the corresponding corner. Points are colored according to their MSE.
  • ...and 45 more figures
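
The quantile grouping described in the first Figure 2 caption (mean forecast error vs. quantiles of $d$ or $\theta$, with forecasts grouped into 10 bins) can be sketched as follows. This is an illustrative reading of the caption rather than the authors' code; the function name `mean_error_by_quantile_bin` and its arguments are hypothetical.

```python
import numpy as np

def mean_error_by_quantile_bin(index_values, errors, n_bins=10):
    """Mean forecast error inside each quantile bin of a dynamical index.

    index_values: per-forecast value of d or theta (1-D array).
    errors: per-forecast error, e.g. MSE (1-D array of the same length).
    """
    index_values = np.asarray(index_values, dtype=float)
    errors = np.asarray(errors, dtype=float)
    # Bin edges at equally spaced quantiles, so bins hold roughly equal counts.
    edges = np.quantile(index_values, np.linspace(0.0, 1.0, n_bins + 1))
    # Use interior edges only, so every value lands in a bin 0 .. n_bins - 1.
    bin_id = np.searchsorted(edges[1:-1], index_values, side="right")
    return np.array([errors[bin_id == b].mean() for b in range(n_bins)])
```

A bin mean that increases from the lowest to the highest bin would correspond to the trend reported in the abstract, where larger errors tend to occur at higher $d$ and higher $\theta$. With many tied index values some bins can be empty, in which case their mean is NaN.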