Convergence of Noise-Free Sampling Algorithms with Regularized Wasserstein Proximals

Fuqun Han, Stanley Osher, Wuchen Li

TL;DR

The paper develops and analyzes BRWP, a deterministic, semi-implicit method for sampling from strongly log-concave distributions. BRWP discretizes the probability flow ODE using a kernel derived from the regularized Wasserstein proximal operator. The paper proves second-order weak accuracy of the kernel, uniform local regularity of the BRWP iterates, and a per-step contraction in KL divergence, which together yield explicit step-size and mixing-time bounds. It also discusses practical score-approximation strategies and reports faster convergence and reduced bias relative to ULA and proximal Langevin in numerical experiments. Overall, BRWP offers a stable, efficient alternative to traditional Langevin-type samplers, with rigorous convergence guarantees under the stated regularity assumptions.
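
As a generic illustration of how a per-step KL contraction turns into a mixing-time bound, consider the sketch below. The contraction rate $\gamma\in(0,1)$, the per-step error $\delta\ge 0$, and the tolerance $\varepsilon$ are placeholder symbols introduced here for illustration; they are not the paper's constants, which depend on the step size and on the regularity of $V$.

```latex
% Assumed per-step inequality (gamma, delta are hypothetical placeholders):
\mathrm{KL}(\rho_{k+1}\,\|\,\pi) \le (1-\gamma)\,\mathrm{KL}(\rho_k\,\|\,\pi) + \delta .
% Unrolling k times and summing the geometric series:
\mathrm{KL}(\rho_k\,\|\,\pi)
  \le (1-\gamma)^k\,\mathrm{KL}(\rho_0\,\|\,\pi) + \delta\sum_{j=0}^{k-1}(1-\gamma)^j
  \le (1-\gamma)^k\,\mathrm{KL}(\rho_0\,\|\,\pi) + \frac{\delta}{\gamma} .
% Since 1-\gamma \le e^{-\gamma}, the first term is at most \varepsilon once
k \ge \frac{1}{\gamma}\,\log\frac{\mathrm{KL}(\rho_0\,\|\,\pi)}{\varepsilon} .
```

The residual $\delta/\gamma$ plays the role of a discretization bias floor in this generic argument: the iteration count controls only the geometrically decaying term.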

Abstract

In this work, we investigate the convergence properties of the backward regularized Wasserstein proximal (BRWP) method for sampling a target distribution. The BRWP approach can be viewed as a semi-implicit time discretization of a probability flow ODE driven by the score function of a density that satisfies the Fokker-Planck equation of the overdamped Langevin dynamics. Specifically, the evolution of the density, and hence of the score function, is approximated via a kernel representation derived from the regularized Wasserstein proximal operator. By applying the dual formulation and a localized Taylor series to obtain the asymptotic expansion of this kernel formula, we establish guaranteed convergence in Kullback-Leibler divergence of the BRWP method towards a strongly log-concave target distribution. Our analysis also identifies the optimal and maximum step sizes for convergence. Furthermore, we demonstrate that the deterministic and semi-implicit BRWP scheme outperforms many classical Langevin Monte Carlo methods, such as the Unadjusted Langevin Algorithm (ULA), by offering faster convergence and reduced bias. Numerical experiments further validate the convergence analysis of the BRWP method.
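
The contrast between a noisy Langevin step and a noise-free probability flow step can be sketched on a toy problem. The example below is a minimal illustration under strong assumptions, not the BRWP scheme itself: the target is one-dimensional with $V(x) = x^2/2$ and diffusion constant $\beta = 1$, the score $\nabla\log\rho_t$ is replaced by the score of a Gaussian fitted to the particles (a stand-in for the paper's kernel formula), and the time stepping is explicit Euler rather than semi-implicit. The function names `ula_step`, `flow_step`, and `run` are hypothetical.

```python
import numpy as np

def ula_step(x, h, rng, beta=1.0):
    """One Unadjusted Langevin step for V(x) = x^2 / 2, so grad V(x) = x."""
    noise = rng.standard_normal(x.shape)
    return x - h * x + np.sqrt(2.0 * beta * h) * noise

def flow_step(x, h, beta=1.0):
    """One explicit Euler step of dx/dt = -grad V(x) - beta * grad log rho_t(x).

    Assumption: rho_t is approximated by a Gaussian fitted to the particles,
    so its score is -(x - mean) / variance. This replaces the BRWP kernel.
    """
    m, s2 = x.mean(), x.var()
    score = -(x - m) / s2
    return x + h * (-x - beta * score)

def run(n=20000, h=0.5, steps=100, seed=0):
    """Evolve the same initial particles with both schemes; return variances."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(0.0, np.sqrt(2.0), size=n)  # initial density N(0, 2)
    x_ula, x_flow = x0.copy(), x0.copy()
    for _ in range(steps):
        x_ula = ula_step(x_ula, h, rng)
        x_flow = flow_step(x_flow, h)
    return x_ula.var(), x_flow.var()

if __name__ == "__main__":
    v_ula, v_flow = run()
    print(f"target variance 1.0 | ULA: {v_ula:.3f} | deterministic flow: {v_flow:.6f}")
```

For this quadratic potential the ULA chain has stationary variance $\beta/(1-h/2)$, which exceeds the target variance $\beta$ for any $h>0$, whereas the fitted-Gaussian flow has the target variance as an exact fixed point. The deliberately large step $h = 0.5$ makes the ULA bias visible; it says nothing about the step sizes the paper's analysis recommends for BRWP.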

Paper Structure

This paper contains 32 sections, 31 theorems, 430 equations, 4 figures, 1 table, and 2 algorithms.

Key Result

Theorem 1

(Informal, see Theorem thm:PV-weak) For a fixed test function $\varphi\in C^{2,1}(U)$ and step size $h>0$, suppose $\rho_0$ satisfies the Fokker-Planck equation at time $t_0$. Then we have where the constant depends on $V$ through its derivatives up to order $3$ and on the local domain $U$.

Figures (4)

  • Figure 1: Evolution of the density function (blue) in the first dimension, computed with \ref{rho_T_BRWP} for different step sizes $h$. The initial density is $\mathcal{N}(0,2)$. The target density is a Gaussian mixture (red).
  • Figure 2: Evolution of the density function (blue) in the first dimension, computed with \ref{rho_T_BRWP}. The target density involves a mixture of $L_1$ and $L_{1/2}$ norms (red).
  • Figure 3: Histogram of 500 particles after 50 iterations in the first dimension for a Gaussian mixture distribution with $h = 0.02$.
  • Figure 4: Histogram of $500$ particles in the first dimension for the mixture of Gaussian and Laplace distributions with $h = 0.02$.

Theorems & Definitions (31)

  • Theorem 1
  • Theorem 2
  • Lemma 3: Local weak $O(h^2)$ expansion
  • Theorem 4: Global weak $O(h^2)$ expansion with unbounded supports
  • Theorem 5: Weak score oracle with propagated base error
  • Lemma 6: Lifting lemma for localized resolvent in $C^{3,1}$
  • Theorem 7: Uniform local $C^{3,1}$ bound for the BRWP iterates
  • Theorem 8: Weak score-oracle
  • Lemma 9: Weak expansion of the BRWP density update
  • Lemma 10: KL contraction per step
  • ...and 21 more