Semi-Implicit Functional Gradient Flow for Efficient Sampling

Shiyue Zhang, Ziheng Cheng, Cheng Zhang

TL;DR

This work tackles efficient Bayesian sampling via particle-based variational inference by introducing Semi-Implicit Functional Gradient Flow (SIFG), which perturbs particles with Gaussian noise to promote exploration while estimating the Wasserstein gradient through denoising score matching. The method yields a stochastic gradient-flow reformulation that connects to a semi-implicit variational distribution, enabling end-to-end neural-network-based score estimation and scalable sampling in high dimensions. The authors provide optimization and statistical guarantees for SIFG, derive neural-network ERM generalization bounds under bounded-moment and sub-Gaussian assumptions, and introduce Ada-SIFG to automatically adapt the noise magnitude during sampling. Empirical results across Gaussian mixtures, heavy-tailed distributions, ICA, and Bayesian neural networks show that SIFG and especially Ada-SIFG offer improved exploration, faster convergence, and competitive accuracy compared to established ParVI methods, highlighting practical impact for scalable, nonparametric Bayesian inference.
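The denoising-score-matching step can be made concrete with a minimal NumPy sketch. It rests on simplifying assumptions that are ours, not the paper's:

- the particles are one-dimensional;
- a linear score model s(x) = a·x + b stands in for the neural network;
- the helper name `fit_linear_dsm_score` is hypothetical.

For perturbed points x = z + σε, regressing s(x) onto the target -ε/σ recovers the score of the Gaussian-perturbed particle distribution. In this toy case the recovery is exact because that distribution is itself Gaussian.

```python
import numpy as np


def fit_linear_dsm_score(z, sigma, n_noise, rng):
    """Fit a hypothetical linear score model s(x) = a * x + b by denoising
    score matching (a stand-in for a neural network, not the paper's model).

    For x = z + sigma * eps, the DSM regression target is -eps / sigma; the
    least-squares fit approximates the score of the perturbed distribution.
    """
    eps = rng.standard_normal((z.shape[0], n_noise))
    x = (z[:, None] + sigma * eps).ravel()
    target = (-eps / sigma).ravel()
    # Ordinary least squares on the features [x, 1].
    design = np.stack([x, np.ones_like(x)], axis=1)
    (a, b), *_ = np.linalg.lstsq(design, target, rcond=None)
    return a, b


rng = np.random.default_rng(0)
particles = rng.normal(loc=2.0, scale=0.5, size=2000)
a, b = fit_linear_dsm_score(particles, sigma=0.5, n_noise=50, rng=rng)
# The perturbed distribution is N(2, 0.25 + 0.25) = N(2, 0.5),
# whose exact score is -(x - 2) / 0.5 = -2 * x + 4.
print(a, b)
```

With a neural network in place of the linear model, the same regression target applies, but the fit is no longer exact.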

Abstract

Particle-based variational inference methods (ParVIs) use nonparametric variational families represented by particles to approximate the target distribution according to the kernelized Wasserstein gradient flow for the Kullback-Leibler (KL) divergence. Although functional gradient flows have been introduced to expand the kernel space for better flexibility, the deterministic updating mechanism may limit exploration and require expensive repetitive runs for new samples. In this paper, we propose Semi-Implicit Functional Gradient flow (SIFG), a functional gradient ParVI method that uses perturbed particles with Gaussian noise as the approximation family. We show that the corresponding functional gradient flow, which can be estimated via denoising score matching with neural networks, exhibits strong theoretical convergence guarantees due to a higher-order smoothness brought to the approximation family via Gaussian perturbation. In addition, we present an adaptive version of our method that automatically selects the appropriate noise magnitude during sampling, striking a good balance between exploration efficiency and approximation accuracy. Extensive experiments on both simulated and real-world datasets demonstrate the effectiveness and efficiency of the proposed framework.
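The overall loop can be illustrated by a toy particle update in the spirit of SIFG. This is a sketch under stated assumptions, not the authors' implementation.

The assumptions are:

- a one-dimensional standard normal target π;
- a fixed noise level σ, so the adaptive rule of Ada-SIFG is not reproduced;
- a linear least-squares score model in place of the neural network;
- hypothetical helper names throughout.

The velocity moves each particle z along the negative of the Wasserstein gradient stated in Theorem 3.1 below, with k(x, z) = N(x; z, σ²). That velocity is v(z) = E[∇log π(x) − s(x)] over x = z + σε, estimated by Monte Carlo.

For this target, the particles should settle at variance 1 − σ². Their Gaussian perturbations then have variance ≈ 1, which matches π. Fresh samples come from re-perturbing the particles, with no extra updates.

```python
import numpy as np


def grad_log_target(x):
    # Illustrative target pi = N(0, 1), so grad log pi(x) = -x (assumption).
    return -x


def fit_linear_dsm_score(z, sigma, n_noise, rng):
    # Hypothetical linear score model s(x) = a * x + b fitted by denoising
    # score matching; a stand-in for the paper's neural network.
    eps = rng.standard_normal((z.shape[0], n_noise))
    x = (z[:, None] + sigma * eps).ravel()
    target = (-eps / sigma).ravel()
    design = np.stack([x, np.ones_like(x)], axis=1)
    (a, b), *_ = np.linalg.lstsq(design, target, rcond=None)
    return a, b


def sifg_style_step(z, sigma, step_size, n_noise, rng):
    """One update z <- z + step_size * v(z), where v(z) is a Monte Carlo
    estimate of E_eps[grad log pi(z + sigma*eps) - s(z + sigma*eps)]."""
    a, b = fit_linear_dsm_score(z, sigma, n_noise, rng)  # refit the score
    eps = rng.standard_normal((z.shape[0], n_noise))
    x = z[:, None] + sigma * eps
    velocity = (grad_log_target(x) - (a * x + b)).mean(axis=1)
    return z + step_size * velocity


rng = np.random.default_rng(0)
sigma = 0.5
particles = rng.normal(loc=3.0, scale=2.0, size=1000)
for _ in range(300):
    particles = sifg_style_step(particles, sigma, step_size=0.1, n_noise=20, rng=rng)

# Fresh approximate samples: perturb the particles with new Gaussian noise.
samples = (particles[:, None] + sigma * rng.standard_normal((1000, 10))).ravel()
print(particles.var(), samples.mean(), samples.var())
```

The score is refitted at every step because the particle distribution, and hence its perturbed score, changes as the particles move.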

Paper Structure

This paper contains 39 sections, 29 theorems, 120 equations, 5 figures, 1 table, and 2 algorithms.

Key Result

Theorem 3.1

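If the gradient of the transition kernel is skew-symmetric, i.e., $\nabla_x k(x,z)=-\nabla_z k(x,z)$, then the Wasserstein gradient flow of the energy functional $\hat{\mathcal{F}}$ is $\partial_t \mu_t=\nabla\cdot\big(\mu_t\,\nabla_{W_2}\hat{\mathcal{F}}(\mu_t)\big)=-\nabla\cdot\big(\mu_t\,\mathcal{K}\nabla\log \frac{\pi}{\mathcal{K}\mu_t}\big)$. The corresponding Wasserstein gradient is $\nabla_{W_2}\hat{\mathcal{F}}(\mu_t)(z)=-\mathbb{E}_{x\sim k(\cdot,z)} \nabla\log \frac{\pi(x)}{\hat{\mu}_t(x)}=-\mathcal{K}\nabla\log \frac{\pi}{\mathcal{K}\mu_t}$, where $\hat{\mu}_t=\mathcal{K}\mu_t$.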
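A short derivation shows where the gradient formula comes from. It relies on three assumptions, which are our reading and are not stated in this summary:

- $\hat{\mathcal{F}}(\mu)=\mathrm{KL}(\hat{\mu}\,\|\,\pi)$ with $\hat{\mu}(x)=\int k(x,z)\,\mu(z)\,dz$;
- $k(\cdot,z)$ integrates to one for every $z$;
- boundary terms vanish in the integration by parts.

Under these assumptions, skew-symmetry lets the $z$-gradient be moved onto $x$:

```latex
\begin{aligned}
\frac{\delta \hat{\mathcal{F}}}{\delta \mu}(z)
  &= \int k(x,z)\Big(\log\frac{\hat{\mu}(x)}{\pi(x)} + 1\Big)\,dx, \\
\nabla_{W_2}\hat{\mathcal{F}}(\mu)(z)
  = \nabla_z \frac{\delta \hat{\mathcal{F}}}{\delta \mu}(z)
  &= \int \nabla_z k(x,z)\,\log\frac{\hat{\mu}(x)}{\pi(x)}\,dx
     \qquad \Big(\textstyle\int \nabla_z k(x,z)\,dx = \nabla_z 1 = 0\Big) \\
  &= -\int \nabla_x k(x,z)\,\log\frac{\hat{\mu}(x)}{\pi(x)}\,dx
     \qquad (\text{skew-symmetry}) \\
  &= \int k(x,z)\,\nabla_x \log\frac{\hat{\mu}(x)}{\pi(x)}\,dx
     \qquad (\text{integration by parts}) \\
  &= -\mathbb{E}_{x\sim k(\cdot,z)}\,\nabla \log\frac{\pi(x)}{\hat{\mu}(x)}.
\end{aligned}

% Gaussian perturbation satisfies the condition:
k(x,z) = \mathcal{N}(x;\, z,\, \sigma^2 I)
\;\Rightarrow\;
\nabla_x k(x,z) = -\frac{x-z}{\sigma^2}\,k(x,z) = -\nabla_z k(x,z).
```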

Figures (5)

  • Figure 1: Comparison of the sampled particles from different methods at the 2000th iteration (sufficient for convergence of all methods) against the ground truth samples on a 2D Gaussian mixture model.
  • Figure 2: The trajectories of particle movements during the first 2000 iterations for different methods. The red dot marks the initial location, the orange dot the location at iteration 1600, and the black dot the location at iteration 2000. We randomly selected 50 particles for illustration.
  • Figure 3: KL divergence of different methods versus the number of iterations. $\textbf{Left:}$ Gaussian mixture distribution. $\textbf{Right:}$ Monomial gamma distribution.
  • Figure 4: $\textbf{Left and Middle:}$ Amari distances of different methods on MEG dataset. On the left is the experiment for 10 particles and 50 random repetitions. In the middle is the experiment for 100 particles and 5 random repetitions. $\textbf{Right:}$ Test RMSE for BNN on Boston dataset. The number in parentheses specifies the initial value of $\sigma$.
  • Figure 5: The fifth moments of SIFG for BNN experiments on the six datasets.

Theorems & Definitions (45)

  • Theorem 3.1
  • Lemma 4.1
  • Proposition 4.6
  • Theorem 4.7
  • Theorem 4.8
  • Theorem 4.9
  • Corollary 4.10
  • Theorem 4.12
  • Theorem 4.13
  • Corollary 4.14
  • ...and 35 more