Kernel Semi-Implicit Variational Inference
Ziheng Cheng, Longlin Yu, Tianyu Xie, Shiyue Zhang, Cheng Zhang
TL;DR
Kernel Semi-Implicit Variational Inference (KSIVI) tackles the intractable densities of semi-implicit variational families by deriving an explicit RKHS-based solution to the inner optimization problem of SIVI-SM, which turns training into minimizing the kernel Stein discrepancy $\mathrm{KSD}(q_\phi \,\|\, p)$. By leveraging the hierarchical form $q_\phi(x)=\int q_\phi(x|z)\, q(z)\, dz$ and a kernel trick, KSIVI avoids inner-loop optimization while keeping a computable objective with unbiased Monte Carlo gradient estimates. The authors prove a bound on the gradient variance and establish convergence to a stationary point under mild smoothness and moment assumptions. Experiments cover toy distributions, Bayesian logistic regression, conditioned diffusion processes, and Bayesian neural networks. Empirically, KSIVI performs competitively with or better than SIVI-SM, with improved stability and less hyperparameter tuning, which makes it practically relevant for scalable Bayesian inference in complex models.
Abstract
Semi-implicit variational inference (SIVI) extends traditional variational families with semi-implicit distributions defined in a hierarchical manner. Because the densities of semi-implicit distributions are intractable, classical SIVI often resorts to surrogates of the evidence lower bound (ELBO), which introduce bias into training. A recent advancement in SIVI, named SIVI-SM, uses an alternative score matching objective made tractable via a minimax formulation, at the cost of an additional lower-level optimization. In this paper, we propose kernel SIVI (KSIVI), a variant of SIVI-SM that eliminates the need for lower-level optimization through kernel tricks. Specifically, we show that when optimizing over a reproducing kernel Hilbert space (RKHS), the lower-level problem has an explicit solution. As a result, the upper-level objective becomes the kernel Stein discrepancy (KSD), which is readily computable for stochastic gradient descent thanks to the hierarchical structure of semi-implicit variational distributions. We derive an upper bound on the variance of the Monte Carlo gradient estimators of the KSD objective, which allows us to establish novel convergence guarantees for KSIVI. We demonstrate the effectiveness and efficiency of KSIVI on both synthetic distributions and a variety of real-data Bayesian inference tasks.
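To make the claim that the KSD objective is "readily computable" more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the score-difference form of the squared KSD, $\mathbb{E}\,[k(x,x')\,\langle s_p(x)-s_{q_\phi(\cdot|z)}(x),\, s_p(x')-s_{q_\phi(\cdot|z')}(x')\rangle]$, in which the intractable marginal score is replaced by the tractable conditional score; this form is an assumption of the sketch and is not quoted from the abstract. Further assumptions made only for illustration: a standard Gaussian target, a Gaussian conditional layer $q(x|z)=\mathcal{N}(Wz+b,\sigma^2 I)$ with $z\sim\mathcal{N}(0,I)$, an RBF kernel, and two independent sample batches. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel matrix k(x_i, y_j) for the rows of x and y."""
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def target_score(x):
    """Score of the assumed target p = N(0, I): grad_x log p(x) = -x."""
    return -x


def sample_semi_implicit(rng, n, W, b, sigma):
    """Draw x ~ q(x|z) = N(W z + b, sigma^2 I) with z ~ N(0, I).

    Also returns the conditional score grad_x log q(x|z) = -(x - mu(z)) / sigma^2,
    which equals -eps / sigma under the reparameterization x = mu(z) + sigma * eps.
    """
    z = rng.standard_normal((n, W.shape[1]))
    eps = rng.standard_normal((n, W.shape[0]))
    x = z @ W.T + b + sigma * eps
    cond_score = -eps / sigma
    return x, cond_score


def ksd_sq_estimate(x1, f1, x2, f2, bandwidth=1.0):
    """Monte Carlo estimate of the squared KSD from two independent batches.

    f1 and f2 hold the score differences s_p(x) - s_{q(.|z)}(x) per sample.
    """
    k = rbf_kernel(x1, x2, bandwidth)
    return np.mean(k * (f1 @ f2.T))


def demo(seed=0, n=256, d=2):
    """Evaluate the estimate for an arbitrary (hypothetical) choice of W and b."""
    rng = np.random.default_rng(seed)
    W = 0.5 * np.eye(d)
    b = np.ones(d)
    x1, c1 = sample_semi_implicit(rng, n, W, b, sigma=1.0)
    x2, c2 = sample_semi_implicit(rng, n, W, b, sigma=1.0)
    return ksd_sq_estimate(x1, target_score(x1) - c1, x2, target_score(x2) - c2)


if __name__ == "__main__":
    print("KSD^2 estimate:", demo())
```

The estimate needs only samples, the target score, and the conditional score, so no inner optimization loop appears. In a training setting one would presumably differentiate such an estimate with respect to the variational parameters using an automatic differentiation framework; that step is omitted here.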
