Integrating Score-Based Generative Modeling and Neural ODEs for Accurate Representation of Multiscale Chaotic Dynamics

Giulio Del Felice, Ludovico Theo Giorgini

TL;DR

The paper tackles predictive modeling of multiscale dynamical systems with slow statistically regular behavior and fast chaotic forcing. It introduces a hybrid framework where slow dynamics are governed by a Langevin equation whose drift is constructed from a learned score function via KGMM, ensuring preservation of the invariant measure, while fast chaotic forcing is captured by a Neural ODE trained on delay-embedded residuals, providing accurate short-horizon forecasts. It extends to cyclo-stationary forcing through state-space augmentation of clock variables, enabling autonomous learning in augmented space. Validation on Lorenz-63–driven bistable and tristable systems shows that the approach achieves faithful long-term statistics and reliable early-warning forecasts of rare transitions with lead times approaching the Lyapunov time, outperforming direct slow-variable models or Gaussian-noise closures. This data-driven methodology offers a scalable path to predictive modeling of complex multiscale phenomena where both stationary statistics and transient dynamics are essential, with potential applications in climate tipping points, turbulence, and other nonlinear systems.
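The two preprocessing ideas mentioned above, delay embedding and clock-variable augmentation, can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names, the embedding dimension, the lag, and the choice of a cosine/sine pair as clock variables are all assumptions made for the example.

```python
import math


def delay_embed(series, dim, lag):
    """Build delay vectors [s[i], s[i - lag], ..., s[i - (dim - 1) * lag]].

    `dim` and `lag` are hypothetical hyperparameters for this sketch.
    """
    start = (dim - 1) * lag
    return [
        [series[i - k * lag] for k in range(dim)]
        for i in range(start, len(series))
    ]


def clock_variables(t, period):
    """Map time t to a point on the unit circle with the forcing period.

    Using (cos, sin) is an assumption; it makes the periodic time
    dependence part of the state, so the augmented system is autonomous.
    """
    phase = 2.0 * math.pi * t / period
    return (math.cos(phase), math.sin(phase))


def augment_with_clock(states, times, period):
    """Append the clock variables to each state vector."""
    return [
        list(state) + list(clock_variables(t, period))
        for state, t in zip(states, times)
    ]


# Usage: embed a short scalar series, then augment it with a clock of period 10.
series = [0.0, 1.0, 2.0, 3.0, 4.0]
embedded = delay_embed(series, dim=3, lag=1)
augmented = augment_with_clock(embedded, times=[2.0, 3.0, 4.0], period=10.0)
```

Because the clock variables repeat exactly once per forcing period, a model trained on the augmented vectors sees the same input at times `t` and `t + period`, which is what allows a time-dependent system to be learned as an autonomous one.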

Abstract

Multiscale dynamical systems characterized by interacting fast and slow processes are ubiquitous across scientific domains, from climate dynamics to fluid mechanics. Accurate modeling of such systems requires capturing both the long-term statistical properties governed by slow variables and the short-term transient dynamics driven by fast chaotic processes. We present a hybrid data-driven framework that integrates score-based generative modeling with Neural Ordinary Differential Equations (NODEs) to construct reduced-order models (ROMs) capable of reproducing both regimes. The slow dynamics are represented by a Langevin equation whose drift is informed by a score function learned via the K-means Gaussian Mixture Model (KGMM) method, ensuring faithful reproduction of the system's invariant measure. The fast chaotic forcing is modeled by a NODE trained on delay-embedded residuals extracted from observed trajectories, replacing conventional Gaussian noise approximations. We validate this approach on a hierarchy of prototypical metastable systems driven by Lorenz 63 dynamics, including bistable potentials with additive and multiplicative forcing, and tristable non-autonomous systems with cycloperiodic components. Our results demonstrate that the hybrid framework maintains statistical consistency over long time horizons while accurately forecasting rare critical transitions between metastable states with lead times approaching the Lyapunov time of the chaotic driver. This work establishes a principled methodology for combining statistical closure techniques with explicit surrogate models of fast dynamics, offering a pathway toward predictive modeling of complex multiscale phenomena where both long-term statistics and short-term transients are essential.
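To make the link between a score function and the invariant measure concrete, the sketch below integrates the standard overdamped Langevin equation $dx = D\,s(x)\,dt + \sqrt{2D}\,dW$ with the Euler-Maruyama scheme. For this equation the stationary density $p$ satisfies $s(x) = \partial_x \log p(x)$. Several things here are assumptions for illustration only: the analytic double-well score stands in for a KGMM-learned score, the Gaussian white noise is the conventional closure rather than the NODE forcing, and the constant diffusion `diffusion`, step size, and function names are hypothetical. The paper's exact drift construction is not given in this section and may differ.

```python
import math
import random


def score_double_well(x):
    """Analytic score d/dx log p(x) for p(x) proportional to exp(-(x**2 - 1)**2).

    Stand-in for a learned score; chosen only because it is bistable.
    """
    return -4.0 * x * (x * x - 1.0)


def simulate_langevin(score, x0, dt, n_steps, diffusion=1.0, seed=0):
    """Euler-Maruyama integration of dx = D * s(x) dt + sqrt(2 D) dW."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * diffusion * dt)
    x = x0
    path = [x]
    for _ in range(n_steps):
        x = x + diffusion * score(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Usage: a long trajectory started in the right-hand well.
path = simulate_langevin(score_double_well, x0=1.0, dt=0.01, n_steps=20000)
```

A histogram of `path` should approximate the bimodal target density, with occasional hops between the wells near $x = \pm 1$. In the hybrid framework described above, the Gaussian increment would be replaced by a surrogate of the fast chaotic forcing.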

Paper Structure

This paper contains 20 sections, 54 equations, 13 figures.

Figures (13)

  • Figure 1: Reduced-order model with Gaussian noise for system \ref{eq:lorenz63}. Top row: comparison between the score function estimated by the KGMM network (red) and the analytical expression $s(x)$ (green) for $\sigma_G = 0.05$, together with the stationary PDF showing three curves: the ROM reconstruction from the estimated score (red), the empirical distribution from observations of the full system (blue), and the analytical invariant measure assuming Gaussian white noise forcing (green). Bottom row: comparison between the ACF of the trajectory obtained by integrating the effective Langevin dynamics (red) and that of the observed trajectory (blue), and between trajectories generated by the effective Langevin model (red) and the observed trajectory from the original system defined by Eq. \ref{eq:lorenz63} (blue) over an extended time horizon.
  • Figure 2: Performance of the NODE for system \ref{eq:lorenz63}. Upper row: statistical comparison between the trajectory generated by the trained NODE and the ground-truth trajectory obtained by integrating the Lorenz 63 subsystem; the panels show the probability density function (left) and autocorrelation function (right). Central rows: trajectories predicted by the NODE (red) vs. ground truth (blue) from different initial conditions, with short and intermediate predictive horizons; deviations grow beyond the Lyapunov time. Bottom row: RMSE versus predictive horizon, averaged over 200 trajectories with the shaded band indicating one standard deviation across trajectories.
  • Figure 3: Top row: three representative transitions with the transition time highlighted (dashed line). Middle row: model-ensemble forecasts for three transitions and five lags (left to right). Bottom row: ensemble RMSE as a function of the forecast horizon.
  • Figure 4: Forecasts for the same three transitions reported in Figure \ref{fig:full_forecast_overview} obtained using a NODE trained directly on the slow variable. The initial condition is chosen at lag $= 0.5$ before the transition.
  • Figure 5: Reduced-order model with Gaussian noise for system \ref{eq:lorenz63_multiplicative}. Top row: comparison between the stationary PDF reconstructed from the reduced-order model (red) and the empirical distribution of the original dataset (blue), alongside the KGMM-estimated score function and the analytic effective score for system \ref{eq:lorenz63_multiplicative}. Bottom row: comparison between autocorrelation functions of $x(t)$ obtained from the reduced-order model (red) and the original dataset (blue), together with trajectories from the stochastic reduced-order model (red) and the original system (blue) initialized identically.
  • ...and 8 more figures