Alternating Diffusion for Proximal Sampling with Zeroth Order Queries

Hirohane Takagi, Atsushi Nitanda

Abstract

This work introduces a new approximate proximal sampler that operates solely with zeroth-order information of the potential function. Prior theoretical analyses have revealed that proximal sampling corresponds to alternating forward and backward iterations of the heat flow. The backward step was originally implemented by rejection sampling, whereas we directly simulate the dynamics. Unlike diffusion-based sampling methods that estimate scores via learned models or by invoking auxiliary samplers, our method treats the intermediate particle distribution as a Gaussian mixture, thereby yielding a Monte Carlo score estimator from directly samplable distributions. Theoretically, when the score estimation error is sufficiently controlled, our method inherits the exponential convergence of proximal sampling under isoperimetric conditions on the target distribution. In practice, the algorithm avoids rejection sampling, permits flexible step sizes, and runs with a deterministic runtime budget. Numerical experiments demonstrate that our approach converges rapidly to the target distribution, driven by interactions among multiple particles and by exploiting parallel computation.
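The alternating forward and backward structure mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's zeroth-order algorithm: it assumes a one-dimensional Gaussian target $\pi^X = N(0, \sigma^2)$, for which the backward step has a closed form, so neither rejection sampling nor score estimation is needed. The function name `proximal_sampler_gaussian` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

def proximal_sampler_gaussian(x0, sigma2, h, n_iters, rng):
    """Ideal proximal sampler for the toy target N(0, sigma2) (illustrative assumption).

    Joint density: pi(x, y) ∝ exp(-x^2 / (2 * sigma2) - (x - y)^2 / (2 * h)).
    """
    x = np.asarray(x0, dtype=float).copy()
    # For a Gaussian target, x given y is Gaussian with these parameters.
    shrink = sigma2 / (sigma2 + h)        # posterior mean is shrink * y
    post_var = sigma2 * h / (sigma2 + h)  # posterior variance
    for _ in range(n_iters):
        # Forward step: add Gaussian noise of variance h (heat flow for time h).
        y = x + np.sqrt(h) * rng.standard_normal(x.shape)
        # Backward step: sample x given y exactly (closed form in this toy case only).
        x = shrink * y + np.sqrt(post_var) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
# Start all particles far from the target mean to make the convergence visible.
particles = proximal_sampler_gaussian(
    np.full(20000, 5.0), sigma2=2.0, h=1.0, n_iters=50, rng=rng
)
print(particles.mean(), particles.var())  # should be close to 0 and sigma2 = 2
```

For a general potential the backward step has no closed form, which is where the rejection-sampling implementation or the direct simulation described in the abstract comes in.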

Paper Structure

This paper contains 32 sections, 9 theorems, 85 equations, 6 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Assume that $\pi^X$ satisfies LSI with constant $C_\text{LSI}$. For any $h > 0$ and any initial distribution $\rho_0^X$, the $k$-th iterate $\rho_k^X$ of the proximal sampler with step size $h$ satisfies, for $q \ge 1$,

Figures (6)

  • Figure 1: Illustration of the ideal proximal sampling (left) and our approximation (right). Heat flow and reverse dynamics are defined between $\pi^X$ and $\pi^Y$, but applied to intermediate $\rho$. Although these do not reach their targets in one step, the ideal version attains exponential convergence. Compared to the rejection sampling-based implementation of proximal samplers, our approach allows for larger step sizes (i.e., stronger convolution), which reduces the number of iterations needed to reach the target distribution.
  • Figure 2: Convergence of estimated KL divergence, averaged over 10 random seeds with shaded areas indicating variances. Our method (orange) outperforms both the proximal sampler with RGO (blue) and an ablated variant of our algorithm without particle interactions (green). It achieves the same accuracy as RGO in about $10\times$ fewer iterations ($100\times$ faster when accounting for thinning).
  • Figure 3: One-dimensional marginals of $\pi^X$ along the third coordinate. The red curve is the ground truth. Our method at around 100 iterations (left, orange) already matches the RGO-based sampler at $\sim$1000 iterations (left, blue), and with only 200--300 iterations (right, orange) it closely aligns with the reference obtained from a sufficiently long run following liang2023a (right, blue).
  • Figure 4: Empirical distributions at $k=3,10,200$ iterations with seed $=0$ on the two-tori domain. In-and-Out (top row) finds $T_1$, which overlaps with the initial standard Gaussian distribution, but fails to reach $T_2$. Our method (bottom row) generates some particles outside the domain but gradually drives particles toward $T_2$, demonstrating its ability to explore both components.
  • Figure 5: Convergence of KL divergence for different step sizes $h$, with all other parameters set as Ours in Table \ref{tab:exp-settings-1}. Each curve shows the mean over 10 random seeds, with shaded areas indicating variances. Larger step sizes lead to faster convergence.
  • ...and 1 more figure

Theorems & Definitions (15)

  • Theorem 1: pmlr-v178-chen22c, Theorem 3
  • Lemma 1: pmlr-v178-chen22c, Appendix A.4
  • Lemma 2: Discretization-only one-step bound
  • Proposition 1: Main one-step bound with split errors
  • Remark 1
  • Lemma 3
  • proof
  • Corollary 1
  • proof
  • Proposition 2: One-step bound for the diffusion-approximated proximal sampler without score estimation error
  • ...and 5 more