Divergence-kernel method for linear responses of densities and generative models

Angxiu Ni

TL;DR

This work derives a divergence-kernel formula for the linear response of random dynamical systems, yielding a pathwise expression for the parameter-derivative of the density that handles multiplicative noise and does not require hyperbolicity. It unifies discrete-time and continuous-time settings and enables a forward-process Monte-Carlo algorithm to estimate how marginal densities change with parameters. Building on this, the author introduces DK-SDE, a parametric SDE generative-model framework trained via the KL divergence between the data and the SDE marginal, with gradients computed through forward covectors rather than backpropagation, which reduces memory cost and allows the diffusion term to be parameterized and learned. The paper demonstrates accurate linear responses on 1D and 40D Lorenz-96 systems and shows the viability of DK-SDE on several low-to-medium dimensional generative tasks, highlighting memory efficiency and applicability to multiplicative-noise models. The proposed framework offers a practical route for likelihood-based training of diffusion-like models while incorporating prior structure and remaining scalable to higher dimensions.

Abstract

We derive the divergence-kernel formula for the linear response of random dynamical systems. Specifically, the pathwise expression is for the parameter-derivative of the marginal or stationary density, not of an averaged observable. Our formula works for multiplicative and parameterized noise over any period of time; it does not require hyperbolicity. We then derive a Monte-Carlo algorithm for linear responses. We develop a new framework of generative models, DK-SDE, where the model is a parameterized SDE, that (1) directly uses the KL divergence between the empirical data distribution and the marginal density of the SDE as the training objective, and (2) accommodates parametrizations in both drift and diffusion over a long time span, allowing prior structural knowledge to be incorporated explicitly. The optimization is done by gradient descent, enabled by the divergence-kernel method, which involves only forward processes and therefore substantially reduces memory cost. We demonstrate the new model on a 20-dimensional Lorenz system.
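
To see why a linear response of the density is exactly what KL-based training needs, the following short derivation may help. It is a sketch under stated assumptions, not a formula quoted from the paper: it assumes the objective is $\mathrm{KL}(p_{\mathrm{data}} \,\|\, h_T^\gamma)$ in that order, writes $h_T^\gamma$ for the marginal density of the SDE at time $T$ under parameter $\gamma$, writes $\delta := \partial/\partial\gamma$, and uses $K$ (notation introduced here) for the number of data samples $y_k$. Since the entropy of the data does not depend on $\gamma$, only the cross term contributes:

$$
\delta\, \mathrm{KL}\left(p_{\mathrm{data}} \,\|\, h_T^\gamma\right)
= -\int p_{\mathrm{data}}(y)\, \delta \log h_T^\gamma(y)\, dy
\approx -\frac{1}{K} \sum_{k=1}^{K} \delta \log h_T^\gamma(y_k).
$$

Under these assumptions, a gradient-descent step only requires the linear response $\delta \log h_T^\gamma$ evaluated at the data points, which is the quantity the divergence-kernel method is described as estimating from forward processes.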

Paper Structure

This paper contains 26 sections, 6 theorems, 75 equations, 6 figures.

Key Result

lemma 1

where $\delta x^\gamma_0 =- g^{-1}_{b_0*} \delta g^\gamma_{b_0}(x_0).$

Figures (6)

  • Figure 1: Divergence-kernel method for the linear responses and scores of the 1-dimensional SDE with multiplicative noise, $T=1$. The dots are $\log h_T$. Each short line is a linear response or score computed by the divergence-kernel algorithm, averaged over paths whose terminal states $x_T$ fall into the same subinterval. Left: linear responses $\delta \log h_T$. Right: scores $\nabla \log h_T$.
  • Figure 2: Left: plot of $x^0_t, x^1_t$ from a typical orbit of time length $1.5$. Right: linear response computed by divergence-kernel method.
  • Figure 3: Histograms of data $\{y_k\}$ and samples $\{x_{T,l}\}$. Left to right: plots at $\gamma_0$, $\gamma_{2}$, and $\gamma_{10}$.
  • Figure 4: History of $|\gamma_n-\gamma_{true}|^2/N_\gamma$ for the 5D DK-SDE model.
  • Figure 5: Convergence history for the 20D DK-SDE model. From left to right: sweep over learning rate $\eta$, $N_{\mathrm{neighbor}}$, and number of samples $L$.
  • ...and 1 more figure

Theorems & Definitions (15)

  • lemma 1: Divergence formula for one-step linear response
  • proof
  • Definition 1
  • lemma 2: N-step forward divergence-kernel formula for score
  • theorem 2: N-step divergence-kernel formula for linear responses
  • proof
  • proof: Derivation
  • Remark
  • proof: Derivation
  • proof: Derivation
  • ...and 5 more