On Higher Order Drift and Diffusion Estimates for Stochastic SINDy

Mathias Wanner; Igor Mezić

TL;DR

This work enhances data-driven identification of stochastic dynamics by introducing higher-order drift and diffusion estimates within the SINDy framework. By leveraging Ito–Taylor expansions and linear multistep-like schemes, the authors derive second-order forward-difference and trapezoidal methods, plus general drift-drift-diffusion combinations, to tighten bias and variance at a fixed sampling rate. They quantify explicit error bounds in terms of the time-step $\Delta t$ and trajectory length $T$, and demonstrate substantial accuracy gains across canonical stochastic systems, often with reduced data requirements. The methodology remains robust to measurement noise through stochastic force inference and instrumental-variable techniques, broadening the practical applicability of stochastic SINDy for SDE identification. Overall, higher-order stochastic SINDy methods significantly improve the feasibility and reliability of learning drift and diffusion from data.
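To make the fixed-sampling-rate bias concrete, here is a minimal sketch comparing a first-order forward difference with the textbook three-point second-order forward stencil as drift estimates. It is an illustration under stated assumptions, not the authors' implementation: the test system (an Ornstein-Uhlenbeck process with drift $-\theta x$), the parameter values, the two-term library $[1, x]$, and the helper names `sample_ou_exact` and `drift_slope` are all hypothetical, and the paper's exact schemes may differ in detail.

```python
import numpy as np

def sample_ou_exact(theta=1.0, sigma=0.5, dt=0.2, n_steps=200_000, seed=1):
    """Sample dX = -theta*X dt + sigma dW exactly at spacing dt (assumed test system)."""
    rng = np.random.default_rng(seed)
    decay = np.exp(-theta * dt)
    # Standard deviation of the exact one-step transition of the OU process.
    step_std = sigma * np.sqrt((1.0 - decay**2) / (2.0 * theta))
    xi = rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = 0.0
    for n in range(n_steps):
        x[n + 1] = decay * x[n] + step_std * xi[n]
    return x

def drift_slope(x, dt, order):
    """Regress a finite-difference drift estimate on the library [1, x]; return the x coefficient."""
    if order == 1:
        # First-order forward difference: (X_{n+1} - X_n) / dt
        target = (x[1:-1] - x[:-2]) / dt
    elif order == 2:
        # Three-point forward stencil: (-3 X_n + 4 X_{n+1} - X_{n+2}) / (2 dt)
        target = (-3.0 * x[:-2] + 4.0 * x[1:-1] - x[2:]) / (2.0 * dt)
    else:
        raise ValueError("order must be 1 or 2")
    Theta = np.column_stack([np.ones(len(target)), x[:-2]])
    coef = np.linalg.lstsq(Theta, target, rcond=None)[0]
    return coef[1]

if __name__ == "__main__":
    x = sample_ou_exact()
    print("first-order slope :", drift_slope(x, 0.2, order=1))
    print("second-order slope:", drift_slope(x, 0.2, order=2))
```

For this linear test system the large-sample limits can be worked out by hand: the first-order slope tends to $(e^{-\theta\Delta t}-1)/\Delta t \approx -0.906$ and the second-order slope to $(-3+4e^{-\theta\Delta t}-e^{-2\theta\Delta t})/(2\Delta t) \approx -0.989$ for $\theta=1$, $\Delta t=0.2$, against a true value of $-1$. The higher-order stencil lowers the bias without changing $\Delta t$, at the cost of a noisier regression target.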

Abstract

The Sparse Identification of Nonlinear Dynamics (SINDy) algorithm can be applied to stochastic differential equations to estimate the drift and the diffusion function using data from a realization of the SDE. The SINDy algorithm requires sample data from each of these functions, which is typically estimated numerically from the data of the state. We analyze the performance of the previously proposed estimates for the drift and diffusion function to give bounds on the error for finite data. However, since this algorithm only converges as both the sampling frequency and the length of trajectory go to infinity, obtaining approximations within a certain tolerance may be infeasible. To combat this, we develop estimates with higher orders of accuracy for use in the SINDy framework. For a given sampling frequency, these estimates give more accurate approximations of the drift and diffusion functions, making SINDy a far more feasible system identification method.
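The baseline pipeline described in the abstract can be sketched as follows: form first-order numerical estimates of the drift and of the squared diffusion from state increments, then regress each onto a function library with a sparsity-promoting solver. This is a hedged sketch rather than the paper's code. The Euler-Maruyama test data, the polynomial library $[1, x, x^2]$, the thresholds, and the names `simulate_ou`, `library`, `stlsq`, and `first_order_sindy` are assumptions made for illustration, and the paper may normalize the diffusion function differently.

```python
import numpy as np

def simulate_ou(theta=1.0, sigma=0.5, dt=0.01, n_steps=400_000, seed=0):
    """Euler-Maruyama path of dX = -theta*X dt + sigma dW (assumed test system)."""
    rng = np.random.default_rng(seed)
    dw = rng.standard_normal(n_steps) * np.sqrt(dt)
    x = np.empty(n_steps + 1)
    x[0] = 0.0
    for n in range(n_steps):
        x[n + 1] = x[n] - theta * x[n] * dt + sigma * dw[n]
    return x

def library(x):
    """Hypothetical polynomial library: columns 1, x, x^2."""
    return np.column_stack([np.ones_like(x), x, x**2])

def stlsq(Theta, y, threshold, n_iter=10):
    """Sequentially thresholded least squares, the usual SINDy sparse solver."""
    coef = np.linalg.lstsq(Theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        big = ~small
        if not big.any():
            break
        # Refit using only the library terms that survived thresholding.
        coef[big] = np.linalg.lstsq(Theta[:, big], y, rcond=None)[0]
    return coef

def first_order_sindy(x, dt, drift_threshold=0.3, diffusion_threshold=0.05):
    """First-order estimates: drift ~ dX/dt, squared diffusion ~ dX^2/dt."""
    dx = np.diff(x)
    Theta = library(x[:-1])
    drift_coef = stlsq(Theta, dx / dt, drift_threshold)
    diffusion_coef = stlsq(Theta, dx**2 / dt, diffusion_threshold)
    return drift_coef, diffusion_coef

if __name__ == "__main__":
    x = simulate_ou()
    drift_coef, diffusion_coef = first_order_sindy(x, dt=0.01)
    print("drift coefficients     [1, x, x^2]:", drift_coef)
    print("diffusion coefficients [1, x, x^2]:", diffusion_coef)
```

With these settings the drift should be identified as a single term close to $-x$ and the squared diffusion as a constant close to $\sigma^2 = 0.25$. Separate thresholds are used because the two regressions have coefficients on different scales; in practice both would need tuning.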

Paper Structure (37 sections, 7 theorems, 158 equations, 8 figures, 1 table)

Introduction
Sparse Identification of Nonlinear Dynamics (SINDy)
Overview
Approximating $f(x)$
Sparse Solutions
Review of SDEs
Ergodicity
Ito-Taylor Expansion
Weak Expansion
Strong Expansions
SINDy for Stochastic Systems
Numerical Analysis of Stochastic SINDy
Drift
Diffusion
Higher Order Methods
...and 22 more sections

Key Result

Theorem 1

Let $X_t$ be an ergodic drift-diffusion process generated by the SDE (eq:Ito). Consider the optimization problems (eq:MuLS) and (eq:SigLS) using data from a trajectory of length $T$ sampled with frequency $\Delta t$. Suppose the components of $\theta$ are linearly independ

Figures (8)

Figure 1: (Left) The mean error in the estimation of the drift coefficients for the double well system (eq:DoubleWell) is plotted as a function of $\Delta t$. The error is approximated using 1,000 trajectories of length $T=20,000$. (Center, Right) The variance for each method is plotted against the sampling period, $\Delta t$, and the trajectory length, $T$. The trajectory length is fixed at $T=20,000$ for the center plot, while the sampling period is fixed at $\Delta t=0.004=4\times 10^{-3}$ for the rightmost plot.
Figure 1: (Left) The mean error in the estimation of the drift coefficients for the Van der Pol system (eq:VanDerPol) with measurement noise is plotted as a function of $\Delta t$. The error is approximated using 1,000 trajectories of length $T=1,000$. (Center, Right) The variance for each method is plotted against the sampling period, $\Delta t$, and the trajectory length, $T$. The trajectory length is fixed at $T=1,000$ for the center plot, while the sampling period is fixed at $\Delta t=0.008=8\times 10^{-3}$ for the rightmost plot.
Figure 2: (Left) The mean error in the estimation of the diffusion coefficients for the double well system (eq:DoubleWell) is plotted as a function of $\Delta t$. The error is approximated using 1,000 trajectories of length $T=20,000$. (Center, Right) The variance for each method is plotted against the sampling period, $\Delta t$, and the trajectory length, $T$. The trajectory length is fixed at $T=20,000$ for the center plot, while the sampling period is fixed at $\Delta t=0.004=4\times 10^{-3}$ for the rightmost plot.
Figure 2: (Left) The mean error in the estimation of the diffusion coefficients for the Van der Pol system (eq:VanDerPol) with measurement noise is plotted as a function of $\Delta t$. The error is approximated using 1,000 trajectories of length $T=1,000$. (Center, Right) The variance for each method is plotted against the sampling period, $\Delta t$, and the trajectory length, $T$. The trajectory length is fixed at $T=1,000$ for the center plot, while the sampling period is fixed at $\Delta t=0.008=8\times 10^{-3}$ for the rightmost plot.
Figure 3: (Left) The mean error in the estimation of the drift coefficients for the Van der Pol system (eq:VanDerPol) is plotted as a function of $\Delta t$. The error is approximated using 1,000 trajectories of length $T=1,000$. (Center, Right) The variance for each method is plotted against the sampling period, $\Delta t$, and the trajectory length, $T$. The trajectory length is fixed at $T=1,000$ for the center plot, while the sampling period is fixed at $\Delta t=0.008=8\times 10^{-3}$ for the rightmost plot.
...and 3 more figures

Theorems & Definitions (15)

Remark 1
Theorem 1
Theorem 1
Proof 1
Theorem 2
Proof 2
Theorem 1
Remark 2
Remark 3
Theorem 2
...and 5 more
