Weak Collocation Regression method: fast reveal hidden stochastic dynamics from high-dimensional aggregate data
Liwei Lu, Zhijun Zeng, Yan Jiang, Yi Zhu, Pipi Hu
TL;DR
This work tackles the inference of hidden stochastic dynamics from aggregate data that lack individual trajectories. It introduces Weak Collocation Regression (WCR), which combines the weak form of the Fokker-Planck equation with Gaussian kernel collocation and sparse regression. By transferring spatial derivatives onto Gaussian test functions, WCR converts the density evolution into a set of data-driven relations in time only, which are discretized with a linear multistep method and solved by dictionary-based regression for the drift $\boldsymbol{\mu}$ and diffusion $D$. The method learns quickly and scales from 1D to 20D problems, including variable-dependent diffusion and coupled drift; it is robust to missing data and noise and runs in seconds to minutes. These capabilities enable accurate recovery of stochastic dynamics from high-dimensional aggregate data, offering a practical tool for scientific discovery in settings where trajectories are unavailable. The results highlight WCR’s potential for real-time or large-scale analyses and point to possible enhancements via active sampling, neural test functions, and informed basis selection.
Abstract
Revealing hidden dynamics from stochastic data is a challenging problem because randomness takes part in the evolution of the data. The problem becomes exceedingly complex when, as in many scenarios, the trajectories of the stochastic data are absent. Here we present an approach that effectively models the dynamics of stochastic data without trajectories, based on the weak form of the Fokker-Planck (FP) equation, which governs the evolution of the density function of a process driven by Brownian motion. Taking a collocation of Gaussian functions as the test functions in the weak form of the FP equation, we transfer the derivatives to the Gaussian functions and thus approximate the weak form by expectations computed as sample averages over the data. With a dictionary representation of the unknown terms, a linear system is built and then solved by regression, revealing the unknown dynamics of the data. Hence, we name the method Weak Collocation Regression (WCR) after its three key components: weak form, collocation of Gaussian kernels, and regression. Numerical experiments show that our method is flexible and fast: it reveals the dynamics within seconds in multi-dimensional problems and extends easily to high-dimensional data, such as 20 dimensions. WCR also correctly identifies the hidden dynamics of complex tasks with variable-dependent diffusion and coupled drift, and its performance is robust, achieving high accuracy when noise is added.
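To make the three components concrete, here is a minimal one-dimensional sketch of the pipeline; it is an illustration under stated assumptions, not the authors' implementation. It uses the weak-form identity $\frac{d}{dt}\mathbb{E}[\psi(X_t)] = \mathbb{E}[\mu(X_t)\,\psi'(X_t)] + \mathbb{E}[D(X_t)\,\psi''(X_t)]$ for a smooth test function $\psi$, with $D = \sigma^2/2$ assumed as the diffusion convention. Everything specific is chosen for the example: the Ornstein-Uhlenbeck test problem $dX = -X\,dt + dW$, snapshots drawn independently from its exact Gaussian marginals (so no particle is followed in time), the trapezoidal rule standing in for the linear multistep scheme, a monomial dictionary, and sequentially thresholded least squares as the sparse regression. The function names are hypothetical.

```python
import numpy as np


def gaussian_tests(x, centers, s):
    """Gaussian kernels and their first two derivatives, shape (n_centers, n_samples)."""
    z = x[None, :] - centers[:, None]
    psi = np.exp(-z**2 / (2.0 * s**2))
    dpsi = -z / s**2 * psi
    d2psi = (z**2 / s**4 - 1.0 / s**2) * psi
    return psi, dpsi, d2psi


def build_system(snapshots, times, centers, s, drift_powers, diff_powers):
    """Assemble A @ coef = b from the weak form using only per-snapshot sample means."""
    means, terms = [], []
    for x in snapshots:
        psi, dpsi, d2psi = gaussian_tests(x, centers, s)
        means.append(psi.mean(axis=1))                                # E[psi_j]
        cols = [(x**k * dpsi).mean(axis=1) for k in drift_powers]     # E[x^k psi_j']
        cols += [(x**k * d2psi).mean(axis=1) for k in diff_powers]    # E[x^k psi_j'']
        terms.append(np.stack(cols, axis=1))
    means, terms = np.array(means), np.array(terms)
    dt = np.diff(times)[:, None]
    # Left side: finite difference in time; right side: trapezoidal average (assumed scheme).
    b = ((means[1:] - means[:-1]) / dt).reshape(-1)
    A = (0.5 * (terms[1:] + terms[:-1])).reshape(-1, terms.shape[-1])
    return A, b


def stlsq(A, b, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares (assumed choice of sparse regression)."""
    coef = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(n_iter):
        keep = np.abs(coef) >= threshold
        coef[~keep] = 0.0
        if not keep.any():
            break
        coef[keep] = np.linalg.lstsq(A[:, keep], b, rcond=None)[0]
    return coef


# Synthetic aggregate data: independent samples from the exact OU marginals
# X_t ~ N(m0 e^{-t}, v0 e^{-2t} + (1 - e^{-2t}) / 2) at each observation time.
rng = np.random.default_rng(0)
times = np.linspace(0.0, 1.0, 11)
m0, v0 = 1.5, 0.09
snapshots = []
for t in times:
    mean_t = m0 * np.exp(-t)
    var_t = v0 * np.exp(-2.0 * t) + 0.5 * (1.0 - np.exp(-2.0 * t))
    snapshots.append(rng.normal(mean_t, np.sqrt(var_t), size=100_000))

centers = np.linspace(-1.0, 2.5, 15)   # collocation points of the Gaussian kernels
A, b = build_system(snapshots, times, centers, s=0.5,
                    drift_powers=[0, 1, 2], diff_powers=[0])
coef = stlsq(A, b)
print("drift coefficients for 1, x, x^2:", coef[:3])
print("diffusion coefficient D:", coef[3])
```

Under these settings the fit should land near the true values (a coefficient close to $-1$ on $x$ and $D$ close to $0.5$), up to sampling and time-discretization error. Note that the snapshots must be non-stationary: once the density reaches equilibrium, the drift and diffusion terms cancel in the weak form and only their ratio can be identified.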
