Identifying Stochastic Dynamics from Non-Sequential Data (DyNoSeD)

Zhixin Lu, Łukasz Kuśmierz, Stefan Mihalas

TL;DR

DyNoSeD offers a principled Fokker-Planck (FP) residual framework for identifying stochastic dynamics from non-sequential data. It provides two complementary routes: a local score-based method suited to dense sampling restricted to subregions, and a global kernel Stein discrepancy method that works with broadly distributed, sparse samples. Both routes yield a linear system when the dynamics are affine in the parameters. The work delivers an explicit identifiability condition and a Gram-based sensitivity analysis, plus gradient-based extensions for non-affine dynamics, demonstrated on a stochastic Lorenz system and a nonlinear gene-regulatory network. A dynamics-to-density complement shows the FP residual can also train a density estimator from known dynamics without data. Together, these results connect data, density, and stochastic dynamics in a two-route, FP-grounded system-identification framework for non-sequential measurements.

Abstract

Inferring stochastic dynamics from data is central across the sciences, yet in many applications only unordered, non-sequential measurements are available, often restricted to limited regions of state space, so standard time-series methods do not apply. We introduce DyNoSeD, a first-principles framework that identifies unknown dynamical parameters from such non-sequential data by minimizing Fokker-Planck residuals. We develop two complementary routes: a local route that handles region-restricted data via locally estimated scores, and a global route that fits dynamics from globally sampled data using a kernel Stein discrepancy without explicit density or score estimation. When the dynamics are affine in the unknown parameters, we prove a necessary-and-sufficient condition for the existence and uniqueness of the inferred parameters and derive a sensitivity analysis that identifies which parameters are tightly constrained by the data and which remain effectively free under over-parameterization. For the general non-affine case, both routes define differentiable losses amenable to gradient-based optimization. As demonstrations, we recover (i) the three parameters of a stochastic Lorenz system from non-sequential data (region-restricted data for the local route and full steady-state data for the global route) and (ii) a $3\times 7$ interaction matrix of a nonlinear gene-regulatory network derived from a published B-cell differentiation model, using only unordered steady-state samples and applying the global route. Finally, we show that the same Fokker-Planck residual viewpoint supports a "dynamics-to-density" complement that trains a normalized density estimator directly from known dynamics without any observations. Overall, DyNoSeD provides two first-principles routes for system identification from non-sequential data, grounded in the Fokker-Planck equation, that link data, density, and stochastic dynamics.
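
To make the local route concrete, here is a minimal sketch on a one-dimensional Ornstein-Uhlenbeck process. It is an illustration under stated assumptions, not the paper's implementation: the diffusion constant D is taken as known and constant, the drift is f(x) = theta * g(x) with g(x) = -x, and a simple Gaussian fit stands in for the locally estimated score. With score s = p'/p, dividing the steady-state Fokker-Planck equation by p gives theta * (g' + g s) = D (s' + s^2) at every probe point, which is linear in theta. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth for this sketch: dx = -theta_true * x dt + sqrt(2 D) dW.
theta_true, D = 2.0, 0.5
# Unordered steady-state samples: the stationary law is N(0, D / theta_true).
x = rng.normal(0.0, np.sqrt(D / theta_true), size=20_000)

# Stand-in score estimate (assumption): Gaussian fit, s(x) = -(x - mu) / var.
mu, var = x.mean(), x.var()
probes = np.linspace(-0.8, 0.8, 9)      # probe points inside the sampled region
s = -(probes - mu) / var
ds = np.full_like(probes, -1.0 / var)   # derivative of the fitted score

# Drift is affine in theta: f(x) = theta * g(x) with g(x) = -x and g'(x) = -1.
g, dg = -probes, -np.ones_like(probes)

# Steady-state FP residual at each probe: theta * (g' + g s) - D (s' + s^2) = 0.
A = (dg + g * s)[:, None]               # shape (M, 1)
b = D * (ds + s**2)                     # shape (M,)

# Least-squares solution of the linear system A theta = b.
theta_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(float(theta_hat[0]))
```

The estimate uses no temporal ordering of the samples; only the steady-state score at the probe points enters the linear system.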

Paper Structure

This paper contains 25 sections, 1 theorem, 87 equations, 6 figures.

Key Result

Theorem 1

Let $\boldsymbol{A}\in\mathbb R^{M\times n}$ and $\mathbf{b}\in\mathbb R^M$ be the matrix and vector obtained from either the local score route or the global Stein route, under exact scores (local) or infinite data (global). Then there exists a parameter vector $\boldsymbol{\theta}$ whose dynamics s

Figures (6)

  • Figure 1: A framework linking non-sequential data, steady-state distributions, and stochastic dynamics via Fokker-Planck residuals (FPRs). Data$\to$Dynamics (score-based; blue): infer dynamical parameters from unordered data---even with sampling restricted to subregions---using locally estimated scores at probe points; we provide a linear identifiability condition and first-order uncertainty analysis for affine-in-parameter priors. Data$\to$Dynamics (kernel Stein discrepancy; red): infer parameters directly from broadly distributed steady-state samples without estimating densities or scores, via a kernel Stein discrepancy derived from the same FPRs; we provide a linear identifiability condition for affine-in-parameter priors. Dynamics$\to$Density (gray): as a side demonstration, we use the same FPRs to infer the steady-state density directly from known dynamics.
  • Figure 2: Ill-posedness without a constraining prior (Ornstein--Uhlenbeck example). Ground-truth drift $M_{\mathrm{true}}$ (left) and its steady density (center) admit alternative drifts with the same steady density when divergence-free probability currents are allowed. A naive norm penalty would select a flux-free diagonal $M_{\mathrm{alt}}$ that matches the density but yields incorrect dynamics. Restricting the unknowns via an informative prior (e.g., only $M_{12}$ free) restores identifiability and recovers the true flow. Vector fields are overlaid with level sets of the steady density.
  • Figure 3: Lorenz SDE: local vs. global identification from non-sequential data. Left column: steady-state samples on the Lorenz attractor for different sample sizes $N$, illustrating locally dense patches (top, middle) versus a globally sparse cloud (bottom, $N=300$). Right column: recovered parameters $(\sigma,\rho,\beta)$ (solid lines: mean; shaded bands: standard deviation; dashed lines: ground truth). Top and middle rows: local score-based route as a function of kernel temperature $\mathcal{T}$; estimates are accurate only when each local region is well populated. Bottom row: global KSD route as a function of sample size $N$; all three parameters are recovered accurately even with a few hundred globally sampled points.
  • Figure 4: Nonlinear gene-regulatory network: parameter recovery and freeness. Top row: true (left) and inferred (right) $3\times 7$ interaction matrices for a nonlinear B-cell differentiation SDE, learned from unordered steady-state samples via the global KSD route. Bottom left: steady-state clouds in the $(p,b,r)$ subspace for the true (blue) and learned (orange) dynamics, which are visually indistinguishable. Bottom right: normalized parameter freeness derived from the diagonal of the inverse regularized Gram matrix $\boldsymbol{H}_\lambda^{-1}$; darker entries indicate directions that are less constrained by the data. The single badly recovered interaction coincides with a high-freeness (weakly constrained) entry.
  • Figure 5: Dynamics$\to$density via Fokker–Planck residual minimization. Left: schematic of a two-dimensional SDE whose drift has a stable limit cycle. Top right: training loss of the FP-residual objective (route 1) when fitting a neural score model $\mathbf{s}_\psi(\mathbf{x})$. Bottom right: learned stationary density $q_\psi$, which recovers the ring-shaped true density without using any observed data.
  • ...and 1 more figure
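
The idea behind the global route, fitting parameters from steady-state samples without estimating a density or score, can be illustrated with a simplified stand-in. The sketch below is not the paper's kernel Stein discrepancy estimator: it swaps the RKHS discrepancy for a handful of Gaussian test functions h_j and uses the stationarity identity E[f(x) h'(x) + D h''(x)] = 0, which holds for any smooth, decaying test function under the stationary law. The one-dimensional Ornstein-Uhlenbeck model, the known constant D, the centers, and the width are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ground truth: dx = -theta_true * x dt + sqrt(2 D) dW.
theta_true, D = 1.5, 0.5
x = rng.normal(0.0, np.sqrt(D / theta_true), size=50_000)  # unordered samples

centers = np.linspace(-1.0, 1.0, 5)  # test-function centers (assumed)
ell = 0.5                            # test-function width (assumed)

# Gaussian test functions h_j(x) = exp(-(x - c_j)^2 / (2 ell^2)) and derivatives.
diff = x[:, None] - centers[None, :]             # shape (N, J)
h = np.exp(-diff**2 / (2 * ell**2))
dh = -diff / ell**2 * h
d2h = (diff**2 / ell**4 - 1.0 / ell**2) * h

# Stationarity: E[f(x) h'(x) + D h''(x)] = 0 with f(x) = theta * g(x), g(x) = -x.
g = -x[:, None]
A = (g * dh).mean(axis=0)[:, None]   # sample average of g h_j', shape (J, 1)
b = -D * d2h.mean(axis=0)            # sample average of -D h_j'', shape (J,)

# Least-squares solution of the J moment conditions A theta = b.
theta_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(float(theta_hat[0]))
```

Derivatives fall on the test functions rather than on the data distribution, which is why no score estimate is needed and why broadly spread, sparse samples can suffice.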

Theorems & Definitions (1)

  • Theorem 1: Identification in the affine-in-parameter case
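
As a rough illustration of what identifiability means for the linear system A theta = b, the sketch below builds A and b for a two-dimensional Ornstein-Uhlenbeck drift with a decay parameter theta1 and a rotation parameter theta2, then checks the rank of A and the diagonal of a regularized inverse Gram matrix. It relies only on the standard linear-algebra fact that a unique least-squares solution requires full column rank; it is not a restatement of Theorem 1. The drift family, the exact analytic score, the ridge weight lam, and the use of the unnormalized diagonal as a "freeness" proxy are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 2D drift: f(x) = -theta1 * x + theta2 * J x, isotropic diffusion D.
theta1_true, D = 1.0, 0.5
J = np.array([[0.0, -1.0], [1.0, 0.0]])    # J x = (-x2, x1), a rotation field

X = rng.normal(0.0, np.sqrt(D / theta1_true), size=(50, 2))  # probe points

# Exact steady-state score of N(0, (D / theta1) I); it does not depend on theta2.
S = -(theta1_true / D) * X
div_S = np.full(len(X), -2.0 * theta1_true / D)

# Basis fields g_k evaluated at the probes, and their divergences.
G = [-X, X @ J.T]                          # g1(x) = -x, g2(x) = J x
div_G = [np.full(len(X), -2.0), np.zeros(len(X))]

# Row i: sum_k theta_k (div g_k + g_k . s) = D (div s + |s|^2) at probe x_i.
A = np.stack([dg + np.sum(g * S, axis=1) for g, dg in zip(G, div_G)], axis=1)
b = D * (div_S + np.sum(S**2, axis=1))

rank = np.linalg.matrix_rank(A, tol=1e-10)  # numerical column rank of A

lam = 1e-3                                  # ridge weight (assumed)
H_inv = np.linalg.inv(A.T @ A + lam * np.eye(A.shape[1]))
theta_hat = H_inv @ A.T @ b                 # ridge-regularized estimate
freeness = np.diag(H_inv)                   # large entry = weakly constrained

print(rank, theta_hat, freeness)
```

In this construction the rotation field is both divergence-free and orthogonal to the score, so its column of A vanishes up to rounding and A is rank deficient. The ridge penalty then returns a rotation parameter near zero whatever the true rotation is, while the corresponding diagonal entry of the inverse Gram matrix is several orders of magnitude larger than the entry for the decay parameter.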