Divergence-kernel method for linear responses of densities and generative models
Angxiu Ni
TL;DR
This work derives a divergence-kernel formula for the linear response of random dynamical systems: a pathwise expression for the parameter-derivative of the density that handles multiplicative noise and does not require hyperbolicity. The formula covers both discrete-time and continuous-time settings and leads to a Monte-Carlo algorithm, using only forward processes, that estimates how marginal densities change with parameters. Building on this, the work introduces DK-SDE, a generative-model framework in which the model is a parameterized SDE trained on the KL divergence between the data distribution and the SDE marginal. Gradients are computed through forward covectors rather than backpropagation, which reduces memory cost and allows the diffusion term, not only the drift, to be parameterized and learned. The work demonstrates accurate linear responses on 1D and 40D Lorenz-96 systems and shows the viability of DK-SDE on several low-to-medium dimensional generative tasks, highlighting memory efficiency and applicability to multiplicative-noise models. The framework offers a practical route to likelihood-based training of diffusion-like models that can incorporate prior structure and scale to higher dimensions.
Abstract
We derive the divergence-kernel formula for the linear response of random dynamical systems. Specifically, the pathwise expression is for the parameter-derivative of the marginal or stationary density, not of an averaged observable. Our formula works for multiplicative and parameterized noise over any period of time; it does not require hyperbolicity. Then we derive a Monte-Carlo algorithm for linear responses. We develop a new framework of generative models, DK-SDE, where the model is a parameterized SDE, that (1) directly uses the KL divergence between the empirical data distribution and the marginal density of the SDE as the training objective, and (2) accommodates parameterizations in both drift and diffusion over a long time span, allowing prior structural knowledge to be incorporated explicitly. The optimization is done by gradient descent, enabled by the divergence-kernel method, which involves only forward processes and therefore substantially reduces memory cost. We demonstrate the new model on a 20-dimensional Lorenz system.
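To make the link between the linear response of a density and the KL training objective concrete, the following is a short sketch of the standard identity involved. It is not the divergence-kernel formula itself, which the abstract does not state. The notation is assumed for illustration: $\theta$ is the model parameter, $p_\theta^T$ is the marginal density of the SDE at time $T$, $q$ is the data density, and $x_1, \dots, x_N$ are data samples. The direction $\mathrm{KL}(q \,\|\, p_\theta^T)$ is also an assumption, as is the regularity needed to differentiate under the integral.

```latex
% Assumed notation: q = data density, p_\theta^T = SDE marginal at time T.
\mathrm{KL}\left(q \,\|\, p_\theta^T\right)
  = \int q(x) \log \frac{q(x)}{p_\theta^T(x)} \, dx .

% q does not depend on \theta, so only the \log p_\theta^T term contributes:
\frac{\partial}{\partial \theta} \mathrm{KL}\left(q \,\|\, p_\theta^T\right)
  = - \int q(x) \, \frac{\partial_\theta p_\theta^T(x)}{p_\theta^T(x)} \, dx
  = - \mathbb{E}_{x \sim q}\left[ \partial_\theta \log p_\theta^T(x) \right]
  \approx - \frac{1}{N} \sum_{i=1}^{N} \partial_\theta \log p_\theta^T(x_i) .
```

Under these assumptions, the gradient needed for gradient descent is an average, over the data points, of the parameter-derivative of the log marginal density. This is why a pointwise expression for the linear response of the density, rather than of an averaged observable, is the quantity the training procedure requires.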
