Nonparametric estimation of homogenized invariant measures from multiscale data via Hermite expansion
Jaroslav I. Borodavka, Max Hirsch, Sebastian Krumscheid, Andrea Zanoni
TL;DR
This work develops a nonparametric spectral estimator for the invariant density $\rho$ of the homogenized one-dimensional Langevin diffusion, using a truncated Fourier expansion in Hermite functions whose coefficients are time averages along a single multiscale trajectory. A rigorous convergence analysis combines a Gaussian-mixture extension for general potentials with an $\varepsilon$-dependent mean ergodic theorem to control the stochastic error, showing $\mathbb{E}\|\widehat{\rho}_N^{T,\varepsilon}-\rho\|_{L^2}^2 \to 0$ as $\varepsilon\to0$ under prescribed growth of the number of modes $N(\varepsilon)$ and the observation time $T(\varepsilon)$. Numerically, the estimator is robust to model misspecification, accurately recovers the homogenized invariant density, and enables practical wavelength (i.e., scale) inference from spectral data; it also extends naturally to two dimensions via tensor-product Hermite bases. The results advance nonparametric homogenization by enabling invariant-density estimation from multiscale data without explicit knowledge of the homogenized model, and they point to future work on central limit theorems, higher-dimensional extensions, and drift/potential estimation.
Abstract
We consider the problem of density estimation in the context of multiscale Langevin diffusion processes, where a single-scale homogenized surrogate model can be derived. In particular, our aim is to learn the density of the invariant measure of the homogenized dynamics from a continuous-time trajectory generated by the full multiscale system. We propose a spectral method based on a truncated Fourier expansion with Hermite functions as the orthonormal basis. The Fourier coefficients are computed directly from the data owing to the ergodic theorem. We prove that the resulting density estimator is robust and converges to the invariant density of the homogenized model as the scale separation parameter vanishes, provided the time horizon and the number of Fourier modes are suitably chosen in relation to the multiscale parameter. The accuracy and reliability of this methodology are further demonstrated through a series of numerical experiments.
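The estimator described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It evaluates orthonormal Hermite functions through the standard three-term recurrence, approximates each Fourier coefficient by the sample mean of $\psi_n$ along a discretized trajectory (a stand-in for the continuous time average), and sums the truncated expansion. The multiscale model in `simulate_multiscale` (quadratic confining potential with a sinusoidal fast perturbation), the Euler-Maruyama discretization, and all function names and parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def hermite_functions(x, n_modes):
    """Return [psi_0(x), ..., psi_{n_modes-1}(x)], the orthonormal Hermite functions."""
    psi = [math.pi ** -0.25 * math.exp(-0.5 * x * x)]
    if n_modes > 1:
        psi.append(math.sqrt(2.0) * x * psi[0])
    # Three-term recurrence, numerically stable for moderate n.
    for n in range(1, n_modes - 1):
        psi.append(math.sqrt(2.0 / (n + 1)) * x * psi[n]
                   - math.sqrt(n / (n + 1)) * psi[n - 1])
    return psi


def time_average_coefficients(path, n_modes):
    """Approximate (1/T) * int_0^T psi_n(X_t) dt by the sample mean along the path."""
    sums = [0.0] * n_modes
    for x in path:
        for n, value in enumerate(hermite_functions(x, n_modes)):
            sums[n] += value
    return [s / len(path) for s in sums]


def density_estimate(coeffs, x):
    """Evaluate the truncated expansion sum_n coeffs[n] * psi_n(x)."""
    basis = hermite_functions(x, len(coeffs))
    return sum(c * p for c, p in zip(coeffs, basis))


def simulate_multiscale(eps, sigma, t_final, dt, seed=0):
    """Euler-Maruyama path of an ASSUMED illustrative model (not taken from the paper):
    dX = -X dt - (1/eps) sin(X/eps) dt + sqrt(2 sigma) dW,  X_0 = 0."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * sigma * dt)
    x, path = 0.0, []
    for _ in range(round(t_final / dt)):
        drift = -x - math.sin(x / eps) / eps
        x += drift * dt + noise_scale * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


if __name__ == "__main__":
    # Hypothetical parameter choices; dt is kept well below eps**2 for stability.
    path = simulate_multiscale(eps=0.1, sigma=1.0, t_final=50.0, dt=1e-3)
    coeffs = time_average_coefficients(path, n_modes=8)
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x = {x:+.1f}  estimate = {density_estimate(coeffs, x):.4f}")
```

In this sketch the number of modes and the time horizon are fixed by hand. The abstract's convergence result instead requires both to be chosen in relation to the scale separation parameter, and the sketch makes no attempt to reproduce that coupling.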
