Invariant Measures for Data-Driven Dynamical System Identification: Analysis and Application
Jonah Botvinick-Greenhouse
TL;DR
The paper identifies dynamical systems from data by matching observed physical invariant measures rather than pointwise trajectory data, which makes the approach robust to noise, chaos, and slow sampling. It poses identification as a PDE-constrained optimization problem that uses stationary Fokker–Planck solutions as surrogates and is solved with gradient-based optimization. Scalability comes from a data-adaptive, Galerkin-based approximation of the Perron–Frobenius operator (PFO) with Monte Carlo integration. A key theoretical advance is the use of time-delay coordinates (Takens embedding) to make the identification unique, supported by proofs linking equality of delay measures to topological conjugacy and by practical demonstrations with multiple observables. Numerical experiments on synthetic systems and high-dimensional models (e.g., Lorenz-96 and Hall-effect thruster data) show accurate velocity recovery, scalable PFO approximations, and reliable uncertainty quantification.
Abstract
We propose a novel approach for performing dynamical system identification, based upon the comparison of simulated and observed physical invariant measures. While standard methods adopt a Lagrangian perspective by directly treating time-trajectories as inference data, we take on an Eulerian perspective and instead seek models fitting the observed global time-invariant statistics. With this change in perspective, we gain robustness against pervasive challenges in system identification including noise, chaos, and slow sampling. In the first half of this paper, we pose the system identification task as a partial differential equation (PDE) constrained optimization problem, in which synthetic stationary solutions of the Fokker-Planck equation, obtained as fixed points of a finite-volume discretization, are compared to physical invariant measures extracted from observed trajectory data. In the latter half of the paper, we improve upon this approach in two crucial directions. First, we develop a Galerkin-inspired modification to the finite-volume surrogate model, based on data-adaptive unstructured meshes and Monte-Carlo integration, enabling the approach to efficiently scale to high-dimensional problems. Second, we leverage Takens' seminal time-delay embedding theory to introduce a critical data-dependent coordinate transformation which can guarantee unique system identifiability from the invariant measure alone. This contribution resolves a major challenge of system identification through invariant measures, as systems exhibiting distinct transient behaviors may still share the same time-invariant statistics in their state-coordinates. Throughout, we present comprehensive numerical tests which highlight the effectiveness of our approach on a variety of challenging system identification tasks.
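The identifiability issue raised in the abstract can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a circle rotation as the test system, estimates invariant measures with plain histograms, and uses hypothetical helper names (`occupation_measure`, `delay_embed`, `circle_rotation`). Two rotations with different speeds have nearly identical occupation measures in the state coordinate, while their measures in time-delay coordinates differ.

```python
import numpy as np

def occupation_measure(samples, bins, bounds):
    """Normalized histogram of the samples: a crude empirical invariant measure."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:
        samples = samples[:, None]
    hist, _ = np.histogramdd(samples, bins=bins, range=[bounds] * samples.shape[1])
    return hist / hist.sum()

def delay_embed(x, dim, lag):
    """Rows are delay vectors (x_k, x_{k+lag}, ..., x_{k+(dim-1)*lag})."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * lag
    return np.column_stack([x[i * lag : i * lag + n] for i in range(dim)])

def circle_rotation(omega, dt=0.1, n=50_000):
    """Toy system (an assumption of this sketch): theta_k = omega * dt * k mod 2*pi."""
    k = np.arange(n)
    return (omega * dt * k) % (2 * np.pi)

bounds = (0.0, 2 * np.pi)
theta_a = circle_rotation(omega=1.0)
theta_b = circle_rotation(omega=np.sqrt(2.0))

# State-coordinate measures: both are close to uniform on the circle.
mu_a = occupation_measure(theta_a, bins=20, bounds=bounds)
mu_b = occupation_measure(theta_b, bins=20, bounds=bounds)
state_gap = 0.5 * np.abs(mu_a - mu_b).sum()  # total variation distance

# Delay-coordinate measures: supported on different lines theta' = theta + omega * lag * dt.
nu_a = occupation_measure(delay_embed(theta_a, dim=2, lag=20), bins=20, bounds=bounds)
nu_b = occupation_measure(delay_embed(theta_b, dim=2, lag=20), bins=20, bounds=bounds)
delay_gap = 0.5 * np.abs(nu_a - nu_b).sum()

print(f"TV distance in state coordinates: {state_gap:.3f}")
print(f"TV distance in delay coordinates: {delay_gap:.3f}")
```

With these settings the state-coordinate distance should come out small and the delay-coordinate distance large, which is the sense in which delay coordinates can separate systems that share the same state-space statistics. The bin count, lag, and sampling step are arbitrary choices for the illustration.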
