Semi-Implicit Functional Gradient Flow for Efficient Sampling
Shiyue Zhang, Ziheng Cheng, Cheng Zhang
TL;DR
This work addresses efficient Bayesian sampling with particle-based variational inference (ParVI). It introduces Semi-Implicit Functional Gradient Flow (SIFG), which perturbs particles with Gaussian noise to promote exploration and estimates the Wasserstein gradient through denoising score matching. The resulting stochastic gradient flow corresponds to a semi-implicit variational distribution, so the score can be estimated end to end with a neural network and the sampler scales to high dimensions. The authors provide optimization and statistical guarantees for SIFG, derive generalization bounds for neural-network empirical risk minimization (ERM) under bounded-moment and sub-Gaussian assumptions, and introduce Ada-SIFG, which adapts the noise magnitude automatically during sampling. Experiments on Gaussian mixtures, heavy-tailed distributions, independent component analysis (ICA), and Bayesian neural networks show that SIFG, and especially Ada-SIFG, explores better, converges faster, and achieves competitive accuracy relative to established ParVI methods.
Abstract
Particle-based variational inference methods (ParVIs) use nonparametric variational families represented by particles to approximate the target distribution according to the kernelized Wasserstein gradient flow for the Kullback-Leibler (KL) divergence. Although functional gradient flows have been introduced to expand the kernel space for better flexibility, the deterministic updating mechanism may limit exploration and require expensive repeated runs for new samples. In this paper, we propose Semi-Implicit Functional Gradient Flow (SIFG), a functional gradient ParVI method that uses perturbed particles with Gaussian noise as the approximation family. We show that the corresponding functional gradient flow, which can be estimated via denoising score matching with neural networks, enjoys strong theoretical convergence guarantees because the Gaussian perturbation gives the approximation family higher-order smoothness. In addition, we present an adaptive version of our method that automatically selects the appropriate noise magnitude during sampling, striking a good balance between exploration efficiency and approximation accuracy. Extensive experiments on both simulated and real-world datasets demonstrate the effectiveness and efficiency of the proposed framework.
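
To make the denoising score matching step concrete, here is a minimal sketch. It is not the authors' implementation: a one-parameter linear score model stands in for the neural network, and the function name `dsm_linear_score`, the particle distribution, and the noise level `sigma` are assumptions chosen for illustration. Particles drawn from a standard Gaussian are perturbed with Gaussian noise, and the model is regressed onto the denoising target `-eps / sigma`. In this toy case the score of the perturbed distribution is known in closed form, so the fit can be checked.

```python
import numpy as np


def dsm_linear_score(particles, sigma, rng):
    """Fit score(x) ~ a * x on Gaussian-perturbed particles via denoising score matching.

    Illustrative stand-in for a neural score network (assumption, not from the paper).
    """
    eps = rng.standard_normal(particles.shape)
    perturbed = particles + sigma * eps
    # Denoising target: the score of N(perturbed | particle, sigma^2 I)
    # with respect to the perturbed point.
    target = -eps / sigma
    # Closed-form least-squares slope minimizing mean |a * perturbed - target|^2.
    return np.sum(perturbed * target) / np.sum(perturbed * perturbed)


rng = np.random.default_rng(0)
particles = rng.standard_normal((200_000, 1))  # particles ~ N(0, 1), assumed for the demo
sigma = 0.5                                    # noise magnitude, assumed for the demo

slope = dsm_linear_score(particles, sigma, rng)
# The perturbed particles follow N(0, 1 + sigma^2), whose score is -x / (1 + sigma^2).
exact = -1.0 / (1.0 + sigma**2)
print(slope, exact)
```

The regression only ever sees the noise `eps`, never the density of the particles, yet its minimizer is the score of the perturbed distribution. This is what lets the score be estimated from samples alone.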
