Identifying Stochastic Dynamics from Non-Sequential Data (DyNoSeD)
Zhixin Lu, Łukasz Kuśmierz, Stefan Mihalas
TL;DR
DyNoSeD is a framework for identifying stochastic dynamics from non-sequential data by minimizing Fokker-Planck (FP) residuals. It provides two complementary routes: a local score-based method suited to densely sampled but region-restricted data, and a global kernel Stein discrepancy method that works with broadly distributed, sparse samples. Both routes reduce to a linear system when the dynamics are affine in the unknown parameters. The work gives an explicit identifiability condition and a Gram-based sensitivity analysis, plus gradient-based extensions for non-affine dynamics, and demonstrates them on a stochastic Lorenz system and a nonlinear gene-regulatory network. A dynamics-to-density complement shows that the FP residual can also train a density estimator from known dynamics without data. Together, these results connect data, density, and stochastic dynamics in a two-route, FP-grounded system-identification framework for non-sequential measurements.
Abstract
Inferring stochastic dynamics from data is central across the sciences, yet in many applications only unordered, non-sequential measurements are available, often restricted to limited regions of state space, so standard time-series methods do not apply. We introduce DyNoSeD, a first-principles framework that identifies unknown dynamical parameters from such non-sequential data by minimizing Fokker-Planck residuals. We develop two complementary routes: a local route that handles region-restricted data via locally estimated scores, and a global route that fits dynamics from globally sampled data using a kernel Stein discrepancy without explicit density or score estimation. When the dynamics are affine in the unknown parameters, we prove a necessary and sufficient condition for the existence and uniqueness of the inferred parameters and derive a sensitivity analysis that identifies which parameters are tightly constrained by the data and which remain effectively free under over-parameterization. For the general non-affine case, both routes define differentiable losses amenable to gradient-based optimization. As demonstrations, we recover (i) the three parameters of a stochastic Lorenz system from non-sequential data (region-restricted data for the local route and full steady-state data for the global route) and (ii) a 3×7 interaction matrix of a nonlinear gene-regulatory network derived from a published B-cell differentiation model, using only unordered steady-state samples and applying the global route. Finally, we show that the same Fokker-Planck residual viewpoint supports a "dynamics-to-density" complement that trains a normalized density estimator directly from known dynamics without any observations. Overall, DyNoSeD provides two first-principles routes for system identification from non-sequential data, grounded in the Fokker-Planck equation, that link data, density, and stochastic dynamics.
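To illustrate why dynamics that are affine in the unknown parameters lead to a linear system, here is a minimal toy sketch. It is not the authors' implementation and does not use a kernel Stein discrepancy. It relies only on the standard fact that, for a stationary density p of dx = f(x) dt + sqrt(2D) dW, the generator satisfies E_p[f g' + D g''] = 0 for smooth, decaying test functions g. The one-dimensional Ornstein-Uhlenbeck example, the Gaussian test functions, the known diffusion constant D, and all names (`fit_affine_drift`, `centers`, `h`) are assumptions made for illustration.

```python
import numpy as np

def fit_affine_drift(x, D, centers, h):
    """Fit f(x) = theta[0] + theta[1] * x from unordered steady-state samples.

    Hypothetical helper: imposes the stationarity condition
    mean(f(x) g_j'(x) + D g_j''(x)) = 0 for Gaussian bumps g_j.
    """
    diff = x[:, None] - centers[None, :]            # shape (N, J)
    g = np.exp(-0.5 * (diff / h) ** 2)              # test functions g_j(x)
    dg = -diff / h**2 * g                           # first derivatives g_j'
    d2g = (diff**2 / h**4 - 1.0 / h**2) * g         # second derivatives g_j''
    phi = np.stack([np.ones_like(x), x], axis=1)    # drift basis [1, x], shape (N, 2)
    A = dg.T @ phi / len(x)                         # A[j, k] = mean(phi_k * g_j')
    b = -D * d2g.mean(axis=0)                       # b[j] = -D * mean(g_j'')
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)   # least-squares solve of A theta = b
    return theta, A

# Synthetic data: the stationary law of dx = -a x dt + sqrt(2 D) dW is N(0, D / a).
# The samples are i.i.d. draws, so they carry no temporal ordering.
rng = np.random.default_rng(0)
D, a = 0.5, 1.5
x = rng.normal(0.0, np.sqrt(D / a), size=20000)

theta, A = fit_affine_drift(x, D, centers=np.linspace(-1.0, 1.0, 9), h=0.5)
print("estimated [offset, slope]:", theta)
print("rank of A:", np.linalg.matrix_rank(A))
```

In this toy setting the least-squares solution is unique exactly when `A` has full column rank; how this relates to the identifiability condition proved in the paper is not specified in the abstract. The diffusion constant is fixed here because steady-state samples alone determine drift and diffusion only up to a common time scale.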
