Convergence of Noise-Free Sampling Algorithms with Regularized Wasserstein Proximals

Fuqun Han; Stanley Osher; Wuchen Li

TL;DR

The paper develops and analyzes BRWP, a deterministic, semi-implicit method for sampling from strongly log-concave distributions. BRWP discretizes the probability flow ODE using a kernel derived from the regularized Wasserstein proximal operator. The paper proves second-order weak accuracy of the kernel approximation, uniform local regularity of the BRWP iterates, and a per-step contraction of the KL divergence, which together yield explicit step-size and mixing-time bounds. It also discusses practical score-approximation strategies and reports, in numerical experiments, faster convergence and reduced bias compared with ULA and proximal Langevin methods. Overall, BRWP offers a stable and efficient alternative to traditional Langevin-type samplers, with rigorous convergence guarantees under the stated regularity assumptions.

Abstract

In this work, we investigate the convergence properties of the backward regularized Wasserstein proximal (BRWP) method for sampling a target distribution. The BRWP approach can be viewed as a semi-implicit time discretization of a probability flow ODE driven by the score function of a density that satisfies the Fokker-Planck equation of the overdamped Langevin dynamics. Specifically, the evolution of the density, and hence of the score function, is approximated via a kernel representation derived from the regularized Wasserstein proximal operator. By applying the dual formulation and a localized Taylor series to obtain the asymptotic expansion of this kernel formula, we establish guaranteed convergence of the BRWP method in Kullback-Leibler divergence towards a strongly log-concave target distribution. Our analysis also identifies the optimal and maximum step sizes for convergence. Furthermore, we demonstrate that the deterministic and semi-implicit BRWP scheme outperforms many classical Langevin Monte Carlo methods, such as the Unadjusted Langevin Algorithm (ULA), by offering faster convergence and reduced bias. Numerical experiments further validate the convergence analysis of the BRWP method.
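The semi-implicit idea described in the abstract can be illustrated with a minimal toy sketch. It is not the BRWP kernel formula: it assumes a one-dimensional target $\mathcal{N}(0,1)$, i.e. $V(x)=x^2/2$ with unit diffusion, and replaces the kernel representation by a Gaussian fit to the particles, for which the Fokker-Planck evolution over one step is known in closed form. The function names, particle count, and step size are hypothetical choices made for illustration only. Each step moves the particles along $-\nabla V - \nabla \log \rho_{k+1}$, with the score taken from the density advanced by one step.

```python
import numpy as np


def semi_implicit_flow_step(x, h):
    """One deterministic step for the toy target N(0, 1), i.e. V(x) = x**2 / 2."""
    # Assumption: the current particle density is summarized by a Gaussian fit.
    m, v = x.mean(), x.var()
    # Exact Fokker-Planck (Ornstein-Uhlenbeck) evolution of that Gaussian over time h.
    m_next = m * np.exp(-h)
    v_next = 1.0 + (v - 1.0) * np.exp(-2.0 * h)
    # Score of the advanced density, evaluated at the current particle positions.
    score_next = -(x - m_next) / v_next
    grad_V = x
    # Noise-free update: no Brownian increment is added.
    return x - h * (grad_V + score_next)


def run(n_particles=500, n_steps=300, h=0.1, seed=0):
    """Push particles drawn from N(3, 2) towards the toy target."""
    rng = np.random.default_rng(seed)
    x = 3.0 + np.sqrt(2.0) * rng.standard_normal(n_particles)
    for _ in range(n_steps):
        x = semi_implicit_flow_step(x, h)
    return x


def ula_stationary_variance(h):
    """Stationary variance of ULA, x <- (1 - h) x + sqrt(2 h) xi, for the same target."""
    return 1.0 / (1.0 - h / 2.0)


if __name__ == "__main__":
    particles = run()
    print("toy flow mean, variance:", particles.mean(), particles.var())
    print("ULA stationary variance at h = 0.1:", ula_stationary_variance(0.1))
```

In this Gaussian toy setting the deterministic update has the target mean and variance as a fixed point for any step size, whereas ULA with the same step size settles at a variance of $1/(1-h/2)$. This only illustrates the kind of bias reduction the abstract refers to and does not reproduce the paper's analysis or experiments.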

Paper Structure (32 sections, 31 theorems, 430 equations, 4 figures, 1 table, 2 algorithms)

Introduction
Review on Probability Flow ODE with Score Function, Regularized Wasserstein Proximal Operator, and BRWP Algorithm
Sampling problem
Discrete time approximation
Regularized Wasserstein proximal operator and kernel formula
BRWP algorithm
Kernel Approximation and Regularity of the BRWP Update
Weak second-order kernel approximation of the Fokker-Planck flow
Uniform local regularity of the BRWP iterates
Weak error of the kernel-based score oracle
Convergence Analysis of the BRWP Update in KL Divergence
Weak one-step expansion of the BRWP density
One-step decay of the KL divergence
Convergence of KL divergence and the mixing time
Estimation of the Score Function and Practical Considerations
...and 17 more sections

Key Result

Theorem 1

(Informal, see Theorem thm:PV-weak) For a fixed test function $\varphi\in C^{2,1}(U)$ and step size $h>0$, suppose $\rho_0$ satisfies the Fokker-Planck equation at time $t_0$. Then we have [...], where the constant depends on $V$ through its derivatives up to order $3$ and on the local domain $U$.

Figures (4)

Figure 1: Evolution of the density function (blue) under \ref{rho_T_BRWP} for different step sizes $h$, shown for the first dimension. The initial density is $\mathcal{N}(0,2)$. The target density is a Gaussian mixture (red).
Figure 2: Evolution of the density function (blue) under \ref{rho_T_BRWP}, shown for the first dimension. The target density involves a mixture of $L_1$ and $L_{1/2}$ norms (red).
Figure 3: Histogram of 500 particles after 50 iterations in the first dimension for a Gaussian mixture distribution with $h = 0.02$.
Figure 4: Histogram of $500$ particles in the first dimension for the mixture of Gaussian and Laplace distributions with $h = 0.02$.

Theorems & Definitions (31)

Theorem 1
Theorem 2
Lemma 3: Local weak $O(h^2)$ expansion
Theorem 4: Global weak $O(h^2)$ expansion with unbounded supports
Theorem 5: Weak score oracle with propagated base error
Lemma 6: Lifting lemma for localized resolvent in $C^{3,1}$
Theorem 7: Uniform local $C^{3,1}$ bound for the BRWP iterates
Theorem 8: Weak score-oracle
Lemma 9: Weak expansion of the BRWP density update
Lemma 10: KL contraction per step
...and 21 more
