Noise-Aware System Identification for High-Dimensional Stochastic Dynamics
Ziheng Guo, Igor Cialenco, Ming Zhong
TL;DR
The paper tackles learning high-dimensional stochastic dynamics by jointly identifying the drift $\mathbf{f}$ and the state-dependent diffusion $\Sigma(\mathbf{x})$ from trajectory data. It introduces a noise-aware, two-stage approach: first estimate the diffusion via quadratic variation, then recover the drift through a likelihood-based loss derived from Girsanov's theorem, with a convergence theorem ensuring consistency and asymptotic normality. The diffusion is constrained to be symmetric positive definite (SPD) using a Cholesky parameterization, which enables scalable deep-learning representations of $\mathbf{f}$ and $\Sigma$; a discretized, trajectory-based loss accommodates fragmented data. Validation on interacting particle systems and stochastic partial differential equations demonstrates accurate reconstruction of both drift and noise, including colored and multiplicative noise, with errors reported in metrics such as the $L^2_\rho$ error and Wasserstein distances. Overall, the framework advances data-driven modeling of complex stochastic environments by inferring both the dynamics and the noise structure from observed trajectories, with theoretical guarantees and scalable implementations.
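To make the two-stage idea concrete, here is a minimal NumPy sketch on a toy problem. It is not the authors' implementation: it assumes a constant diffusion matrix and a linear drift $\mathbf{f}(\mathbf{x}) = A\mathbf{x}$ in place of the state-dependent $\Sigma(\mathbf{x})$ and neural-network representations described above, and all function names (`simulate_linear_sde`, `estimate_diffusion_qv`, `girsanov_loss`, `fit_linear_drift`) are hypothetical. Stage 1 estimates $\Sigma$ from the quadratic variation of the increments; stage 2 minimizes the discretized Girsanov negative log-likelihood $\sum_k \tfrac{1}{2}\mathbf{f}_k^\top \Sigma^{-1} \mathbf{f}_k\,\Delta t - \mathbf{f}_k^\top \Sigma^{-1} \Delta\mathbf{X}_k$, which for a linear drift has a closed-form minimizer.

```python
import numpy as np

def simulate_linear_sde(A, L, x0, dt, n_steps, rng):
    """Euler-Maruyama path of dX = A X dt + L dW (toy stand-in for observed data)."""
    X = np.empty((n_steps + 1, len(x0)))
    X[0] = x0
    dW = rng.normal(scale=np.sqrt(dt), size=(n_steps, len(x0)))
    for k in range(n_steps):
        X[k + 1] = X[k] + A @ X[k] * dt + L @ dW[k]
    return X

def estimate_diffusion_qv(X, dt):
    """Stage 1: quadratic-variation estimate of Sigma = L L^T (assumed constant)."""
    dX = np.diff(X, axis=0)
    return dX.T @ dX / (len(dX) * dt)

def girsanov_loss(A, X, dt, Sigma_inv):
    """Discretized negative log-likelihood for the assumed linear drift f(x) = A x."""
    dX = np.diff(X, axis=0)
    F = X[:-1] @ A.T                      # drift evaluated at each left endpoint
    FS = F @ Sigma_inv
    return np.sum(0.5 * np.sum(FS * F, axis=1) * dt - np.sum(FS * dX, axis=1))

def fit_linear_drift(X, dt):
    """Stage 2: closed-form minimizer of girsanov_loss over A (linear drift only)."""
    dX = np.diff(X, axis=0)
    Xk = X[:-1]
    G = Xk.T @ Xk * dt                    # sum_k x_k x_k^T dt
    B = dX.T @ Xk                         # sum_k dX_k x_k^T
    return np.linalg.solve(G, B.T).T      # A = B G^{-1}, G is symmetric

# Synthetic ground truth (illustrative values, not from the paper).
rng = np.random.default_rng(0)
A_true = np.array([[-1.0, 0.5], [0.0, -1.0]])
L_true = np.array([[0.5, 0.0], [0.2, 0.3]])   # lower-triangular noise factor
dt = 0.01
X = simulate_linear_sde(A_true, L_true, np.zeros(2), dt, 100_000, rng)

Sigma_hat = estimate_diffusion_qv(X, dt)
L_hat = np.linalg.cholesky(Sigma_hat)     # Cholesky factor; exists only if Sigma_hat is SPD
Sigma_inv = np.linalg.inv(Sigma_hat)
A_hat = fit_linear_drift(X, dt)

print("Sigma_hat:\n", Sigma_hat)
print("A_hat:\n", A_hat)
print("loss at A_hat:", girsanov_loss(A_hat, X, dt, Sigma_inv))
```

In this linear, constant-noise case the drift estimate does not depend on $\Sigma$, so the two stages decouple completely. With a state-dependent diffusion or a nonlinear drift, the weighting by $\Sigma^{-1}$ matters and the loss would be minimized numerically, for example by gradient descent over network parameters with $\Sigma$ built from a learned Cholesky factor.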
Abstract
Stochastic dynamical systems are ubiquitous in physics, biology, and engineering, where both deterministic drifts and random fluctuations govern system behavior. Learning these dynamics from data is particularly challenging in high-dimensional settings with complex, correlated, or state-dependent noise. We introduce a noise-aware system identification framework that jointly recovers the deterministic drift and the full noise structure directly from trajectory data, without requiring prior assumptions on the noise model. Our method accommodates a broad class of stochastic dynamics, including those with colored and multiplicative noise, scales efficiently to high-dimensional systems, and accurately reconstructs the underlying dynamics. Numerical experiments on diverse systems validate the approach and highlight its potential for data-driven modeling in complex stochastic environments.
