Sampling via Gradient Flows in the Space of Probability Measures

Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M Stuart

TL;DR

This work develops a gradient-flow framework for sampling probability distributions known only up to normalization, showing that the KL divergence is the unique energy functional yielding normalization-constant–independent flows among continuously differentiable f-divergences. It analyzes invariance properties, proving the Fisher-Rao gradient flow achieves uniform exponential convergence due to its diffeomorphism invariance, and introduces affine-invariant Wasserstein and Stein gradient flows to improve numerical performance on anisotropic targets. It further develops Gaussian approximations via metric projection and moment closures, establishes their equivalence under a condition, and derives explicit Gaussian-flow dynamics with convergence analyses across Gaussian, log-concave, and general posteriors. The methods are demonstrated on a Bayesian Darcy flow inverse problem, where affine-invariant and Gaussian-approximate schemes show strong performance and practical scalability, while also illuminating limitations for non-Gaussian, curved-manifold targets.

Abstract

Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study, and develop efficient algorithms based on, Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.
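The normalization-independence claim can be illustrated with a minimal sketch, which is not the paper's algorithm. It relies on the standard fact that Langevin dynamics is a particle realization of the Wasserstein gradient flow of the KL divergence, and that this dynamics needs only the score $\nabla_\theta \log \rho_{\rm post}$. The 1D Gaussian target, the scale factor `c`, the step size, and all function names below are assumptions chosen for illustration.

```python
import math
import random

M, SIGMA = 2.0, 0.5  # hypothetical 1D Gaussian target N(M, SIGMA^2)

def log_unnormalized(theta, c):
    """Log of c * exp(-(theta - M)^2 / (2 SIGMA^2)); c is an arbitrary positive scale."""
    return math.log(c) - (theta - M) ** 2 / (2.0 * SIGMA ** 2)

def score(theta, c, eps=1e-5):
    """Central finite-difference estimate of d/dtheta log(c * density)."""
    return (log_unnormalized(theta + eps, c) - log_unnormalized(theta - eps, c)) / (2.0 * eps)

def langevin(c, n_particles=500, n_steps=400, h=0.01, seed=0):
    """Euler-Maruyama discretization of d(theta) = score dt + sqrt(2) dW."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    for _ in range(n_steps):
        particles = [
            th + h * score(th, c) + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
            for th in particles
        ]
    return particles

def mean_var(xs):
    """Sample mean and (population-normalized) variance of a list of floats."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v

# Same random seed, two very different normalization constants.
run_a = langevin(c=1.0)
run_b = langevin(c=1.0e6)
max_gap = max(abs(a - b) for a, b in zip(run_a, run_b))
mean_a, var_a = mean_var(run_a)
print(f"max particle gap between c=1 and c=1e6: {max_gap:.2e}")
print(f"sample mean {mean_a:.3f} (target {M}), sample variance {var_a:.3f} (target {SIGMA ** 2})")
```

Because `log(c)` is constant in `theta`, it cancels in the score, so the two runs should agree up to floating-point roundoff. The sample moments only approximate the target, since the time discretization and the finite ensemble both introduce error.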

Paper Structure (60 sections, 16 theorems, 191 equations, 12 figures)

This paper contains 60 sections, 16 theorems, 191 equations, 12 figures.

Key Result

Proposition 2.2

Assume that $\rho_{\rm post}$ is $\alpha$-strongly logconcave: $\log \rho_{\rm post} \in C^2(\mathbb{R}^{N_{\theta}})$ and […]. Then, for all $t\geq 0$, it holds that […]

Figures (12)

  • Figure 1: Gaussian posterior case: convergence of different gradient flows in terms of the $L^2$ error of $\mathbb{E}[\theta_t]$, the relative Frobenius norm error of the covariance $\frac{\lVert \mathrm{Cov}[\theta_t] - \mathrm{Cov}[\theta_{{\rm true}}]\rVert_F}{\lVert \mathrm{Cov}[\theta_{\rm true}]\rVert_F}$, and the error of $\mathbb{E}[\cos(\omega^T \theta_t + b)]$.
  • Figure 2: Logconcave posterior case: convergence of different gradient flows in terms of the $L^2$ error of $\mathbb{E}[\theta_t]$, the relative Frobenius norm error of the covariance $\frac{\lVert \mathrm{Cov}[\theta_t] - \mathrm{Cov}[\theta_{{\rm true}}]\rVert_F}{\lVert \mathrm{Cov}[\theta_{\rm true}]\rVert_F}$, and the error of $\mathbb{E}[\cos(\omega^T \theta_t + b)]$.
  • Figure 3: General posterior case: particles obtained by different gradient flows at $t=15$. Grey lines represent the contour of the true posterior.
  • Figure 4: General posterior case: convergence of different gradient flows in terms of the $L^2$ error of $\mathbb{E}[\theta_t]$, the relative Frobenius norm error of the covariance $\frac{\lVert \mathrm{Cov}[\theta_t] - \mathrm{Cov}[\theta_{{\rm true}}]\rVert_F}{\lVert \mathrm{Cov}[\theta_{\rm true}]\rVert_F}$, and the error of $\mathbb{E}[\cos(\omega^T \theta_t + b)]$.
  • Figure 5: Gaussian posterior case: convergence of different dynamics in terms of the $L^2$ error of $\mathbb{E}[\theta_t]$, the relative Frobenius norm error of the covariance $\frac{\lVert \mathrm{Cov}[\theta_t] - \mathrm{Cov}[\theta_{{\rm true}}]\rVert_F}{\lVert \mathrm{Cov}[\theta_{\rm true}]\rVert_F}$, and the error of $\mathbb{E}[\cos(\omega^T \theta_t + b)]$.
  • ...and 7 more figures
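The three error measures named in the captions above can be computed from a particle ensemble roughly as follows. This is a sketch under stated assumptions, not the authors' evaluation code: the "$L^2$ error" of the mean is read as the Euclidean norm of the mean difference, the covariance uses $1/J$ normalization over $J$ particles, and all function names and the toy ensemble are hypothetical.

```python
import math

def mean_vec(samples):
    """Empirical mean of a list of equal-length tuples."""
    n, d = len(samples), len(samples[0])
    return [sum(s[i] for s in samples) / n for i in range(d)]

def cov_mat(samples):
    """Empirical covariance with 1/n normalization (an assumption)."""
    n, d = len(samples), len(samples[0])
    m = mean_vec(samples)
    return [
        [sum((s[i] - m[i]) * (s[j] - m[j]) for s in samples) / n for j in range(d)]
        for i in range(d)
    ]

def frobenius(a):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in a for x in row))

def mean_error(samples, true_mean):
    """Euclidean norm of E[theta_t] minus the reference mean."""
    m = mean_vec(samples)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(m, true_mean)))

def relative_cov_error(samples, true_cov):
    """||Cov[theta_t] - Cov_true||_F / ||Cov_true||_F."""
    c = cov_mat(samples)
    d = len(true_cov)
    diff = [[c[i][j] - true_cov[i][j] for j in range(d)] for i in range(d)]
    return frobenius(diff) / frobenius(true_cov)

def cos_error(samples, omega, b, true_value):
    """|E[cos(omega^T theta_t + b)] - reference value|."""
    est = sum(
        math.cos(sum(w * t for w, t in zip(omega, s)) + b) for s in samples
    ) / len(samples)
    return abs(est - true_value)

# Toy ensemble (hypothetical): four particles on the corners of a square.
particles = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
err_mean = mean_error(particles, true_mean=(1.0, 1.0))
err_cov = relative_cov_error(particles, true_cov=[[2.0, 0.0], [0.0, 2.0]])
err_cos = cos_error(particles, omega=(math.pi / 2.0, 0.0), b=0.0, true_value=0.0)
print(err_mean, err_cov, err_cos)
```

In practice the reference mean, covariance, and cosine expectation would come from a trusted solution of the sampling problem; the values passed here are placeholders for the toy ensemble only.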

Theorems & Definitions (38)

  • Example 2.1
  • Proposition 2.2
  • Theorem 3.1
  • Remark 3.2
  • Theorem 4.1
  • Definition 4.2: Affine Invariant Gradient Flow
  • Definition 4.3: Affine Invariant Mean-Field Dynamics
  • Remark 4.5
  • Theorem 4.6
  • Theorem 4.7
  • ...and 28 more