Weak Collocation Regression method: fast reveal hidden stochastic dynamics from high-dimensional aggregate data

Liwei Lu, Zhijun Zeng, Yan Jiang, Yi Zhu, Pipi Hu

TL;DR

This work tackles inferring hidden stochastic dynamics from aggregate data lacking individual trajectories by introducing Weak Collocation Regression (WCR), which blends the weak form of the Fokker-Planck equation with Gaussian kernel collocation and sparse regression. By transferring spatial derivatives to Gaussian test functions, WCR converts density evolution into a set of data-driven, one-dimensional temporal relations that are solved with linear multistep discretization and a dictionary-based regression for drift $\boldsymbol{\mu}$ and diffusion $D$. The method achieves fast, scalable learning across 1D to 20D problems, including variable diffusion and coupled drift, with strong robustness to missing data and noise and favorable computational profiles (seconds to minutes). These capabilities enable accurate recovery of stochastic dynamics from high-dimensional aggregate data, offering a practical tool for scientific discovery in settings where trajectories are unavailable. The results highlight WCR’s potential for real-time or large-scale analyses and point to avenues for enhancement via active sampling, neural-test-function extensions, and informed basis selection.
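
The sparse-regression step can be illustrated with a minimal sketch. It uses sequentially thresholded least squares, a common sparse solver that may differ from the routine used in the paper. The function name `stlsq`, the threshold value, and the synthetic matrix `A` (a random stand-in for the WCR linear system) are assumptions made for illustration only.

```python
import numpy as np

def stlsq(A, b, threshold=0.05, max_iter=10):
    """Sequentially thresholded least squares: solve A @ x ~ b, zeroing small entries."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(x) < threshold
        x[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        # Refit only the dictionary terms that survived the threshold.
        x[keep] = np.linalg.lstsq(A[:, keep], b, rcond=None)[0]
    return x

# Synthetic stand-in for the WCR system (hypothetical, noise-free).
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 6))
# Assumed dictionary order: drift terms [1, x, x^2, x^3], then diffusion terms [1, x].
# The coefficients encode drift x - x^3 and D = 0.5 (assuming D = sigma^2 / 2, sigma = 1).
x_true = np.array([0.0, 1.0, 0.0, -1.0, 0.5, 0.0])
b = A @ x_true
x_hat = stlsq(A, b)
print(x_hat)
```

Thresholding removes dictionary terms whose fitted coefficients are negligible, so the recovered drift and diffusion contain only the active basis functions.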

Abstract

Revealing hidden dynamics from stochastic data is challenging because randomness takes part in the evolution of the data. The problem becomes considerably harder when the trajectories of individual samples are unavailable, as happens in many scenarios. Here we present an approach for modeling the dynamics of stochastic data without trajectories, based on the weak form of the Fokker-Planck (FP) equation, which governs the evolution of the probability density of a process driven by Brownian motion. Taking collocated Gaussian functions as the test functions in the weak form of the FP equation, we transfer the derivatives onto the Gaussian functions and thus approximate the weak form by sample averages (empirical expectations) over the data. With a dictionary representation of the unknown terms, a linear system is built and then solved by regression, revealing the unknown dynamics of the data. We name the method Weak Collocation Regression (WCR) after its three key components: weak form, collocation of Gaussian kernels, and regression. Numerical experiments show that the method is flexible and fast: it reveals the dynamics within seconds in multi-dimensional problems and extends readily to high-dimensional data, such as 20 dimensions. WCR also correctly identifies the hidden dynamics in complex tasks with variable-dependent diffusion and coupled drift, and its performance is robust, achieving high accuracy when noise is added to the data.
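
The three components named in the abstract can be put together in a small one-dimensional sketch. It is a toy illustration under stated assumptions, not the authors' implementation. The test problem (an Ornstein-Uhlenbeck process with drift $-x$ and $\sigma = 1$), the dictionary (drift $a_0 + a_1 x$, constant diffusion $D$ with $D = \sigma^2/2$), the kernel width `gamma`, the kernel centers, and the trapezoidal rule as the linear multistep scheme are all choices made here for illustration. Plain least squares is used because this tiny dictionary has no inactive terms to prune.

```python
import numpy as np

def simulate_ou_snapshots(n_samples, times, theta=1.0, sigma=1.0, seed=0):
    """Exact OU sampling: dX = -theta*X dt + sigma dW, X_0 ~ N(1.5, 0.5^2) (assumed)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(1.5, 0.5, size=n_samples)
    snapshots = [x.copy()]
    for h in np.diff(times):
        decay = np.exp(-theta * h)
        std = np.sqrt(sigma**2 / (2 * theta) * (1 - decay**2))
        x = decay * x + std * rng.normal(size=n_samples)
        # Shuffle so no sample index links one snapshot to the next.
        snapshots.append(rng.permutation(x))
    return snapshots

def wcr_system(snapshots, times, centers, gamma=1.0):
    """Assemble A @ [a0, a1, D] = b from the weak form and the trapezoidal rule."""
    y, feats = [], []
    for x in snapshots:
        diff = x[:, None] - centers[None, :]                 # shape (N, M)
        psi = np.exp(-diff**2 / (2 * gamma**2))              # Gaussian test functions
        dpsi = -diff / gamma**2 * psi                        # first derivative psi'
        d2psi = (diff**2 / gamma**4 - 1 / gamma**2) * psi    # second derivative psi''
        y.append(psi.mean(axis=0))                           # estimate of E[psi(X_t)]
        feats.append(np.stack([
            dpsi.mean(axis=0),                               # E[1 * psi']
            (x[:, None] * dpsi).mean(axis=0),                # E[x * psi']
            d2psi.mean(axis=0),                              # E[psi'']
        ], axis=1))                                          # shape (M, 3)
    y, feats = np.array(y), np.array(feats)                  # (L, M) and (L, M, 3)
    dt = np.diff(times)[:, None, None]
    # Trapezoidal rule: y(t_{l+1}) - y(t_l) ~ dt/2 * (rhs(t_{l+1}) + rhs(t_l)).
    A = 0.5 * dt * (feats[1:] + feats[:-1])
    b = y[1:] - y[:-1]
    return A.reshape(-1, 3), b.reshape(-1)

times = np.linspace(0.0, 1.0, 11)
snapshots = simulate_ou_snapshots(50_000, times)
centers = np.linspace(-2.0, 3.0, 20)
A, b = wcr_system(snapshots, times, centers)
a0, a1, D = np.linalg.lstsq(A, b, rcond=None)[0]
print(f"drift ~ {a0:.3f} + {a1:.3f} x, diffusion D ~ {D:.3f}")
```

Only per-snapshot averages enter the linear system, so the pairing of samples across snapshots is never used. For this toy problem the fitted values should land near $a_0 = 0$, $a_1 = -1$, and $D = 0.5$, up to sampling error.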

Paper Structure

This paper contains 17 sections, 3 theorems, 76 equations, 7 figures, 12 tables, 2 algorithms.

Key Result

Lemma 1

Suppose $\{X_t\}$ solves the SDE. Then, by the Itô integral, the probability density function $p(\boldsymbol{x},t)$ of the random variable $X_t$ satisfies the following $d$-dimensional Fokker-Planck equation, where $\boldsymbol{x}\in\mathbb{R}^d$, $t\in[0,T]\subset\mathbb{R}$, and $p = p(\boldsymbol{x},t)\in\mathbb{R}$ is the probability density function with $\int_{\mathbb{R}^d} p(\boldsymbol{x},t)\,d\boldsymbol{x}=1$, …
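
For orientation, the standard textbook form of the Fokker-Planck equation for an Itô SDE $dX_t = \boldsymbol{\mu}\,dt + \boldsymbol{\sigma}\,dW_t$ is sketched here, together with the weak form obtained by integrating against a smooth, rapidly decaying test function $\psi$. The convention $D = \tfrac{1}{2}\boldsymbol{\sigma}\boldsymbol{\sigma}^{T}$, the symbol $\psi$, and the sample count $N$ are assumptions, since the lemma statement above is truncated; the paper's own notation may differ.

$$
\frac{\partial p}{\partial t}
= -\sum_{i=1}^{d} \frac{\partial}{\partial x_i}\bigl[\mu_i\, p\bigr]
+ \sum_{i,j=1}^{d} \frac{\partial^2}{\partial x_i\,\partial x_j}\bigl[D_{ij}\, p\bigr],
\qquad D = \tfrac{1}{2}\boldsymbol{\sigma}\boldsymbol{\sigma}^{T}.
$$

Multiplying by $\psi$, integrating over $\mathbb{R}^d$, and integrating by parts (boundary terms vanish because $\psi$ decays) moves every spatial derivative onto $\psi$:

$$
\frac{d}{dt}\,\mathbb{E}\bigl[\psi(X_t)\bigr]
= \mathbb{E}\Bigl[\sum_{i=1}^{d} \mu_i\,\frac{\partial \psi}{\partial x_i}
+ \sum_{i,j=1}^{d} D_{ij}\,\frac{\partial^2 \psi}{\partial x_i\,\partial x_j}\Bigr]
\approx \frac{1}{N}\sum_{n=1}^{N}\Bigl[\sum_{i=1}^{d} \mu_i\,\frac{\partial \psi}{\partial x_i}
+ \sum_{i,j=1}^{d} D_{ij}\,\frac{\partial^2 \psi}{\partial x_i\,\partial x_j}\Bigr]\Big|_{\boldsymbol{x} = X_t^{(n)}}.
$$

The right-hand side needs only the samples $X_t^{(n)}$ of each snapshot, not their trajectories, which is why the weak form suits aggregate data.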

Figures (7)

  • Figure 1: Diagram of the weak collocation regression method. The aggregate data set $\mathbb{X}$ in panel (b) is the collection of $L$ snapshots of samples at times $t_1, t_2, \ldots, t_L$ from one unknown stochastic process. We model this process by the stochastic differential equations in panel (a) with unknown drift $\boldsymbol{\mu}(X_t, t)$ and diffusion $\boldsymbol{\sigma}(X_t,t)$ terms. Gaussian kernels are sampled in panel (c); for each kernel, the weak form in panel (d) gives an algebraic relation between the unknown terms and the data set. Using linear multistep methods (LMMs) and the basis expansion of the unknown terms, a linear system is built for each kernel, and these systems are combined into one large system over all collocation kernels. Finally, sparse linear regression yields sparse representations of the drift and diffusion terms, revealing the hidden dynamics.
  • Figure 2: Results for the 1D cubic polynomial problem compared with Chen's work (state of the art). The unknown drift and diffusion terms are revealed from 10000 samples of $X_t$ at different time snapshots: (a) observations at $t = 0.1, 0.3, 0.5$; (b) observations at $t=0.2, 0.5, 1$. The samples are generated by the given SDE with drift term $\mu_{\text{true}} = x-x^3$ and diffusion term $\sigma=1$. The true and inferred drifts are denoted by $\mu_{\text{true}}$, $\mu_{\text{chen}}$ (state of the art) and $\mu_{\text{wcr}}$ (ours).
  • Figure 3: Graphs of the drift term $-2xe^{-x^2}$ revealed by the WCR method with polynomial bases of order 3 to 9.
  • Figure 4: The learned drift terms of the 3D and 4D problems. The unknown drift and diffusion terms are revealed from $100{,}000$ samples per snapshot at times $t = 0.1, 0.3, 0.5, 0.7, 1$ for (a) the 3-dimensional problem; (b) the 4-dimensional problem, where the samples are generated by the given SDE with drift term $\mu_{\text{true}} = x-x^3$ and diffusion term $\sigma=1$ in each dimension. The results are denoted by $\mu_{\text{true}}$ and $\mu_{i,\text{wcr}}$ for the $i$-th dimension.
  • Figure 5: Results for the 10D problem with different numbers of samples per time snapshot and of Gaussian kernels. The unknown drift and diffusion terms are revealed from (a) 10000 samples of $X_t$ using 1000 Gaussian kernels; (b) 100000 samples of $X_t$ using 1000 Gaussian kernels; (c) 10000 samples of $X_t$ using 10000 Gaussian kernels; (d) 100000 samples of $X_t$ using 10000 Gaussian kernels, at time snapshots $t = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1$ of the 10-dimensional problem, where the samples are generated by the given SDE with drift term $\mu_{\text{true}} = x-x^3$ and diffusion term $\sigma=1$ in each dimension. The inference results are denoted by $\mu_i$ for the learned drift in the $i$-th dimension.
  • ...and 2 more figures
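
The figure captions above describe aggregate data generated from an SDE with drift $x - x^3$ and $\sigma = 1$. A minimal sketch of how such snapshots could be produced with the Euler-Maruyama scheme follows. The initial distribution $X_0 \sim N(0, 1)$, the step size `dt`, the seed, and the function name are assumptions, since the listing does not state how the paper's data were simulated.

```python
import numpy as np

def euler_maruyama_snapshots(n_samples, snapshot_times, dt=1e-3, sigma=1.0, seed=0):
    """Simulate dX = (X - X^3) dt + sigma dW and keep unpaired snapshots."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=n_samples)   # assumed initial distribution
    t, snapshots = 0.0, {}
    for t_snap in sorted(snapshot_times):
        n_steps = int(round((t_snap - t) / dt))
        for _ in range(n_steps):
            drift = x - x**3
            x = x + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=n_samples)
        t = t_snap
        # Shuffle to discard the trajectory pairing, leaving aggregate data only.
        snapshots[t_snap] = rng.permutation(x)
    return snapshots

# Snapshot times and sample count follow the Figure 2(a) caption.
data = euler_maruyama_snapshots(10_000, [0.1, 0.3, 0.5])
for t_snap, samples in data.items():
    print(t_snap, samples.shape, round(float(samples.std()), 3))
```

Each entry of `data` is an unordered cloud of samples at one time, which is the only input a trajectory-free method like WCR requires.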

Theorems & Definitions (7)

  • Lemma 1
  • Example 1
  • Theorem 1
  • Remark 1
  • Remark 2
  • Theorem 1
  • proof