Learning the Infinitesimal Generator of Stochastic Diffusion Processes
Vladimir R. Kostic, Karim Lounici, Helene Halconruy, Timothee Devergne, Massimiliano Pontil
TL;DR
The paper tackles the problem of data-driven learning of the infinitesimal generator $L$ for stochastic diffusion processes, addressing the unboundedness of $L$ by reframing learning around the resolvent $(\mu I - L)^{-1}$ within an energy space. It introduces an energy-based risk in RKHS, leverages the embedding $Z_\mu$ into the energy space $\mathcal{W}^{\mu}_\pi(\mathcal{X})$, and derives two estimators, KRR and RRR, to learn a finite-rank approximation of the resolvent. Theoretical contributions include the first spectral learning bounds for generator learning, decomposition of errors into regularization, rank-reduction, and variance terms with rates depending on regularity and embedding properties, and minimax-like guarantees under suitable conditions. Empirically, the approach yields accurate eigenpairs for diffusion generators, avoids spurious eigenvalues, and outperforms transfer-operator-based methods on challenging examples such as Langevin dynamics and CIR, while highlighting scalability challenges and directions for broader SDEs.
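To see why the resolvent is a convenient learning target, a short worked identity may help. It is a sketch under assumptions not stated in the summary: $L$ is taken to be self-adjoint with nonpositive spectrum on $L^2_\pi(\mathcal{X})$, $\mu > 0$ is a fixed shift, and $(\lambda, f)$ is an eigenpair of $L$. The symbol $\nu$ is introduced here only for illustration.

$$
L f = \lambda f
\;\Longrightarrow\;
(\mu I - L) f = (\mu - \lambda) f
\;\Longrightarrow\;
(\mu I - L)^{-1} f = \nu f,
\qquad \nu = \frac{1}{\mu - \lambda}.
$$

Under these assumptions $\lambda \le 0$, so $0 < \nu \le 1/\mu$: the resolvent is bounded even though $L$ is not, and it shares its eigenfunctions with $L$. An eigenvalue $\nu$ estimated for the resolvent maps back to the generator through $\lambda = \mu - 1/\nu$.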
Abstract
We address data-driven learning of the infinitesimal generator of stochastic diffusion processes, essential for understanding numerical simulations of natural and physical systems. The unbounded nature of the generator poses significant challenges, rendering conventional analysis techniques for Hilbert-Schmidt operators ineffective. To overcome this, we introduce a novel framework based on the energy functional for these stochastic processes. Our approach integrates physical priors through an energy-based risk metric in both full and partial knowledge settings. We evaluate the statistical performance of a reduced-rank estimator in reproducing kernel Hilbert spaces (RKHS) in the partial knowledge setting. Notably, our approach provides learning bounds independent of the state space dimension and ensures non-spurious spectral estimation. Additionally, we elucidate how the distortion between the intrinsic energy-induced metric of the stochastic diffusion and the RKHS metric used for generator estimation impacts the spectral learning bounds.
