A Weak Penalty Neural ODE for Learning Chaotic Dynamics from Noisy Time Series

Xuyang Li, John Harlim, Dibyajyoti Chakraborty, Romit Maulik

TL;DR

This work addresses the difficulty of learning deterministic chaotic dynamics from noisy observations by introducing WP-NODE, a weak-penalty augmentation to Neural ODEs (NODEs). WP-NODE blends a weak-form residual loss over temporal subdomains with a standard strong-form trajectory loss, enabling robust learning that preserves long-term invariant statistics while maintaining accurate short-term predictions. Across chaotic systems (Lorenz-63, Lorenz-96, Kuramoto–Sivashinsky) and a real-world ERA5 climate dataset, WP-NODE achieves superior predictive horizons and faithful invariant measures, with solver-agnostic stability and considerable training efficiency (notably an ~8.6× per-epoch speedup over full strong-form NODEs). The approach offers a practical pathway for robust data-driven modeling of complex dynamical systems, including climate applications, under realistic noise conditions.

Abstract

Accurate forecasting of complex high-dimensional dynamical systems from observational data is essential for many applications across science and engineering. A key challenge, however, is that real-world measurements are often corrupted by noise, which severely degrades the performance of data-driven models. In chaotic dynamical systems in particular, where small errors amplify rapidly, it is challenging to identify a data-driven model from noisy data that achieves short-term accuracy while preserving long-term invariant properties. In this paper, we propose the use of the weak formulation as a complementary approach to the classical strong formulation of data-driven time-series forecasting models. Specifically, we focus on the neural ordinary differential equation (NODE) architecture. Unlike the standard strong formulation, which relies on the discretization of the NODE followed by optimization, the weak formulation constrains the model using a set of integrated residuals over temporal subdomains. While such a formulation alone yields an effective NODE model, we find that the performance of a NODE can be further enhanced by employing the weak formulation as a penalty alongside classical strong-formulation-based learning. Through numerical demonstrations, we show that our proposed training strategy, which we call the Weak-Penalty NODE (WP-NODE), achieves state-of-the-art forecasting accuracy and exceptional robustness across benchmark chaotic dynamical systems and a real-world climate dataset.

Paper Structure

This paper contains 10 sections, 29 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: Forecasting of the Lorenz-63 system using WP-NODE under 5% training data noise. a. Time-series comparison between predicted and ground-truth trajectories for all three state variables. Rows show the best (VPT = 5.77) and worst (VPT = 0.63) cases (x-axis in Lyapunov time). Despite low VPT in the second case, predictions remain accurate for about 4 Lyapunov times. b. Phase-space attractors reconstructed from the predicted (orange) and true (blue) trajectories.
  • Figure 1: Invariant measure comparisons of the learned Lorenz-63 system across different methods under varying noise conditions.
  • Figure 2: Invariant measure of the predicted Lorenz-63 system (over $100\,\mathrm{s}$) for the X component beyond the training region, across different noise levels. The proposed method maintains strong agreement with the ground truth at all noise levels, preserving long-term statistical properties. Other methods deteriorate significantly under moderate to high noise.
  • Figure 2: Comprehensive ablation study of the learned L63 system. a. The number of hidden layers, each with 200 neurons. b. The integration domain size $M$. c. The training signal length in seconds. d. The batch size during training. e. The number of subdomains $K$. f. The polynomial order $p$ of the test function. g. The number of rollouts used in WP-NODE for pointwise residual minimization. h. The regularization coefficient $\lambda$ that balances the losses from the weak and strong formulations.
  • Figure 3: Performance comparison of WP-NODE and baseline models on the KS system under 5% data noise. DeepSkip is not included because the original work analyzed a different configuration and would require task-specific fine-tuning. a. Short-time prediction comparison of the learned KS system under 5% observation noise. b. Joint probability density for the KS system under 5% observation noise. Both the WP-NODE and strong NODE closely reproduce the reference distribution, with WP-NODE showing slightly improved alignment in the core region. The weak NODE, however, exhibits notable distortion in the invariant measure.
  • ...and 7 more figures