Radon--Wasserstein Gradient Flows for Interacting-Particle Sampling in High Dimensions

Elias Hess-Childs, Dejan Slepčev, Lantian Xu

TL;DR

The paper introduces Radon--Wasserstein (RW) and Regularized Radon--Wasserstein (RRW) gradient flows for KL divergence, enabling scalable high-dimensional sampling by basing velocities on one-dimensional projections. It develops Kernel-Density RW (KDRW) and RRW flows and their interacting-particle discretizations, with FFT-based accelerations that yield per-step costs of $O(nd)$. The authors establish well-posedness, stability, mean-field convergence, SGD convergence, and long-time convergence for RRW, and provide extensive experiments on Gaussian and Rosenbrock ('banana') targets, showing accurate sampling in high dimensions and favorable quantization relative to i.i.d. samples and SVGD. The framework offers a dimension-robust alternative to traditional MCMC/variational approaches, combining theoretical guarantees with practical efficiency for high-dimensional interacting-particle sampling.

Abstract

Gradient flows of the Kullback--Leibler (KL) divergence, such as the Fokker--Planck equation and Stein Variational Gradient Descent, evolve a distribution toward a target density known only up to a normalizing constant. We introduce new gradient flows of the KL divergence with a remarkable combination of properties: they admit accurate interacting-particle approximations in high dimensions, and the per-step cost scales linearly in both the number of particles and the dimension. These gradient flows are based on new transportation-based Riemannian geometries on the space of probability measures: the Radon--Wasserstein geometry and the related Regularized Radon--Wasserstein (RRW) geometry. We define these geometries using the Radon transform so that the gradient-flow velocities depend only on one-dimensional projections. This yields interacting-particle-based algorithms whose per-step cost follows from efficient Fast Fourier Transform-based evaluation of the required 1D convolutions. We additionally provide numerical experiments that study the performance of the proposed algorithms and compare convergence behavior and quantization. Finally, we prove some theoretical results including well-posedness of the flows and long-time convergence guarantees for the RRW flow.
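The abstract attributes the linear per-step cost to FFT-based evaluation of 1D convolutions of projected particles. As a rough illustration of that idea only, and not the paper's actual routine, the sketch below projects particles onto one direction, bins the projections on a uniform grid, and smooths the bins with a Gaussian kernel through an FFT convolution. The function name `projected_kde_fft`, the Gaussian kernel, and the parameters `bandwidth`, `grid_size`, and `pad` are assumptions made for this example.

```python
import numpy as np

def projected_kde_fft(x, theta, bandwidth=0.3, grid_size=256, pad=4.0):
    """Binned Gaussian KDE of the projections <x_i, theta> on a uniform grid.

    Hypothetical helper for illustration; not taken from the paper.
    Returns the grid centers and the density values at those centers.
    """
    s = x @ theta                                   # (n,) one-dimensional projections
    lo = s.min() - pad * bandwidth                  # pad so kernel tails stay on the grid
    hi = s.max() + pad * bandwidth
    edges = np.linspace(lo, hi, grid_size + 1)
    h = edges[1] - edges[0]                         # grid spacing
    centers = 0.5 * (edges[:-1] + edges[1:])

    counts, _ = np.histogram(s, bins=edges)
    weights = counts / len(s)                       # probability mass per bin

    # Gaussian kernel sampled at every possible offset between two grid centers.
    offsets = h * np.arange(-(grid_size - 1), grid_size)
    kernel = np.exp(-0.5 * (offsets / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))

    # Linear convolution via zero-padded real FFTs: O(G log G) instead of O(G^2).
    m = len(weights) + len(kernel) - 1
    full = np.fft.irfft(np.fft.rfft(weights, m) * np.fft.rfft(kernel, m), m)

    # Entry p + (grid_size - 1) of the full convolution is the density at center p.
    density = full[grid_size - 1 : 2 * grid_size - 1]
    return centers, density

# Usage: 500 particles in dimension 20, projected onto a random unit direction.
rng = np.random.default_rng(0)
particles = rng.standard_normal((500, 20))
direction = rng.standard_normal(20)
direction /= np.linalg.norm(direction)
centers, density = projected_kde_fft(particles, direction)
```

In this sketch the projection costs $O(nd)$ and the binning $O(n)$ per direction, while the convolution cost depends only on the grid size. How the paper turns such 1D quantities into particle velocities is not covered here.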

Paper Structure

This paper contains 35 sections, 20 theorems, 164 equations, 7 figures, 1 table, and 1 algorithm.

Key Result

Proposition 2.3

If $f\in L^1(\mathbb{R}^d)$ and $g\in L^\infty(\mathbb{D}^d)$, then

Figures (7)

  • Figure 1.1: Trajectories of the Regularized Radon--Wasserstein (RRW) gradient flow of the KL divergence \ref{eq:RRWgf} for a $2$-dimensional Rosenbrock "banana" target distribution with potential $U(x)=\frac{1}{2}\left(x_1^2 + 10(x_2 - 0.4x_1^2)^2\right)$. Trajectories were generated using Algorithm \ref{algo:skeleton} with Routine \ref{rout:KDRW_fft} and $n=50$ particles. The initial particles were sampled i.i.d. from a centered Gaussian with covariance matrix $0.8I_2$.
  • Figure 4.1: Samples from a banana distribution with potential \ref{eq:banana_potential} generated by both KDRW and KDRW_fft with different bandwidths $b$ and $n=1024$ particles. Both methods were initialized with the same i.i.d. sample from a standard Gaussian.
  • Figure 4.2: $\mathrm{MMD}^2$ error for samples generated with different bandwidths $b$ and $n=1024$ particles. The target distribution is a standard Gaussian in the top row, and a Rosenbrock "banana" distribution with potential \ref{eq:banana_potential} in the bottom row.
  • Figure 4.3: Time per step (seconds) for RRW_fft on CPU vs. GPU for different particle numbers $n$ and dimensions $d$.
  • Figure 4.4: $\mathrm{MMD}^2$ error versus algorithmic time for a Rosenbrock "banana" target distribution with potential \ref{eq:banana_potential} for different particle numbers $n$ and dimensions $d$.
  • ...and 2 more figures
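
The caption of Figure 1.1 gives the 2-dimensional banana potential explicitly. Since the target density is proportional to $e^{-U}$ and known only up to a normalizing constant, a sampler typically needs $U$ and its gradient rather than the normalized density. A minimal sketch of both follows, assuming the 2-dimensional form from that caption; the function names are hypothetical, and the higher-dimensional potential used in the later figures is not reproduced here.

```python
def banana_potential(x1, x2):
    """U(x) = 1/2 * (x1^2 + 10 * (x2 - 0.4 * x1^2)^2), as in the Figure 1.1 caption."""
    r = x2 - 0.4 * x1 ** 2
    return 0.5 * (x1 ** 2 + 10.0 * r ** 2)

def banana_grad(x1, x2):
    """Gradient of U, derived by hand with the chain rule."""
    r = x2 - 0.4 * x1 ** 2           # residual along the curved ridge
    dU_dx1 = x1 - 8.0 * x1 * r       # x1 + 10 * r * d(r)/d(x1), with d(r)/d(x1) = -0.8 * x1
    dU_dx2 = 10.0 * r
    return dU_dx1, dU_dx2
```

The potential is minimized at the origin, and the valley $x_2 = 0.4x_1^2$ is what produces the curved "banana" shape of the samples.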

Theorems & Definitions (47)

  • Definition 2.1
  • Definition 2.2
  • Proposition 2.3
  • Definition 2.4
  • Definition 2.5
  • Remark 2.6
  • Remark 2.7
  • Definition 2.8
  • Definition 2.9
  • Remark 2.10
  • ...and 37 more