Variational Bayesian Inference with Stochastic Search

John Paisley, David Blei, Michael Jordan

TL;DR

This paper addresses mean-field variational Bayesian (MFVB) inference when some expectations of the log joint likelihood are not available in closed form. It introduces stochastic search variational Bayes (SSVB), which optimizes the variational lower bound $\mathcal{L}$ directly using unbiased Monte Carlo gradients with control variates for variance reduction, rather than replacing the intractable terms with a further lower bound. The authors demonstrate the method on two non-conjugate models, logistic regression and a finite approximation to the HDP, and show that suitable control variates, including a second-order Taylor (delta method) expansion, substantially reduce gradient variance so that fewer samples are needed per iteration. The framework extends MFVB inference to non-conjugate settings and can reuse existing lower bounds as control variates.

Abstract

Mean-field variational inference is a method for approximate Bayesian posterior inference. It approximates a full posterior distribution with a factorized set of distributions by maximizing a lower bound on the marginal likelihood. This requires the ability to integrate a sum of terms in the log joint likelihood using this factorized distribution. Often not all integrals are in closed form, which is typically handled by using a lower bound. We present an alternative algorithm based on stochastic optimization that allows for direct optimization of the variational lower bound. This method uses control variates to reduce the variance of the stochastic search gradient, in which existing lower bounds can play an important role. We demonstrate the approach on two non-conjugate models: logistic regression and an approximation to the HDP.
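The control-variate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single Gaussian factor $q(\theta)=\mathcal{N}(\mu,\sigma^{2})$, estimates only the gradient of $\mathbb{E}_{q}[\ln \sigma(\theta)]$ with respect to $\mu$, and uses a second-order Taylor expansion about the current mean as the control variate, with the expansion point held fixed. The function names `log_sigmoid` and `grad_mu_terms` and the fixed scaling `a` are hypothetical choices for this example.

```python
import numpy as np


def log_sigmoid(x):
    """Numerically stable ln sigma(x) = -ln(1 + exp(-x))."""
    return -np.logaddexp(0.0, -x)


def grad_mu_terms(mu, var, n_samples, rng, a=0.0):
    """Per-sample terms whose average estimates d/dmu E_q[ln sigma(theta)].

    q is N(mu, var). With a = 0 this is the plain score-function estimator;
    with a != 0 a second-order Taylor control variate g is subtracted and
    its exactly known gradient is added back, which keeps the estimator
    unbiased for any fixed a.
    """
    theta = rng.normal(mu, np.sqrt(var), size=n_samples)
    score = (theta - mu) / var            # d/dmu ln q(theta)
    f = log_sigmoid(theta)

    # Taylor expansion of ln sigma around the current mean. The expansion
    # point is treated as a constant (an assumption of this sketch).
    s = 1.0 / (1.0 + np.exp(-mu))
    f1 = 1.0 - s                          # first derivative of ln sigma at mu
    f2 = -s * (1.0 - s)                   # second derivative of ln sigma at mu
    d = theta - mu
    g = log_sigmoid(mu) + f1 * d + 0.5 * f2 * d**2

    grad_E_g = f1                         # d/dmu E_q[g], evaluated at the expansion point
    return (f - a * g) * score + a * grad_E_g


rng = np.random.default_rng(0)
plain = grad_mu_terms(3.0, 3.0, 20_000, rng, a=0.0)
with_cv = grad_mu_terms(3.0, 3.0, 20_000, rng, a=1.0)
print("plain:           mean %.4f  var %.4f" % (plain.mean(), plain.var()))
print("control variate: mean %.4f  var %.4f" % (with_cv.mean(), with_cv.var()))
```

Both estimators target the same gradient, so their sample means should agree up to Monte Carlo error, while the per-sample variance should be lower with the control variate. The values $\mu=3$, $\sigma^{2}=3$ and $a=1$ are taken from the left panel of Figure 1; the scaling `a` could instead be estimated from samples.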

Paper Structure

This paper contains 11 sections, 23 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Approximation error between $\ln \sigma(\theta)$ and the two control variates considered. The mean and variance of $q$ used in these examples are (left) $\mu=3, \sigma^{2}=3$ and (right) $\mu=-5, \sigma^{2}=1$. We show 100 samples from these $q$ distributions, at which points the functions would be evaluated for the stochastic gradient (for $a=1$). The Taylor expansion is closer to the true function in the region of interest as defined by $q$. The benefit of this is that fewer samples will be necessary to approximate the gradient.
  • Figure 2: (left) The intractable function in the HDP. (right) The difference after introducing a control variate and setting $a=1$. Since $-\ln \Gamma(\beta \theta)-\ln \beta \theta=-\ln \Gamma(\beta \theta+1)$, the right figure is the left figure shifted by one unit to the left and truncated at zero. The very large variance near zero (where most values of $\beta \theta$ will lie) has been significantly reduced. For larger values of $\beta \theta$, we use a first-order Taylor approximation at $\beta \mathbb{E}_{q} \theta$ of the nearly linear function.
  • Figure 3: Experimental results for variational logistic regression. We compare the variance reduction obtained by the two control variates under consideration. (top row) The number of samples per iteration, setting $\epsilon=0.1$ in Algorithm 1. The yellow and black lines represent the estimated number that would be required without variance reduction according to each control variate. As expected, these curves overlap. (middle row) The variance reduction factor of Eq. (11). The selected control variates significantly reduce the variance. The second-order Taylor control variate is significantly better than the lower bound. (bottom row) The optimal scaling factor estimated from samples.
  • Figure 4: Average number of samples per iteration for the two equivalent gradient approximations, $\nabla_{c} \ln q$ vs. $\nabla_{c} \ln q_{k}^{\prime}$, where $q$ is the Dirichlet and $q_{k}^{\prime}$ the beta distribution. Sampling is further reduced (see text for discussion).
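
The identity quoted in the Figure 2 caption follows from $\Gamma(x+1)=x\,\Gamma(x)$. The short check below is only an illustration and is not taken from the paper; `x` stands in for $\beta\theta$, and the function names are hypothetical. It shows that $-\ln \Gamma(x)$ diverges as $x \to 0$, whereas the version with the $-\ln x$ control variate stays bounded.

```python
import math


def intractable_term(x):
    """-ln Gamma(x): diverges to minus infinity as x approaches 0 from above."""
    return -math.lgamma(x)


def with_control_variate(x):
    """-ln Gamma(x) - ln x, which equals -ln Gamma(x + 1)."""
    return -math.lgamma(x) - math.log(x)


for x in (1e-3, 1e-2, 0.1, 1.0, 5.0):
    shifted = -math.lgamma(x + 1.0)
    print("x=%-6g  -lnG(x)=%10.4f  -lnG(x)-ln(x)=%10.4f  -lnG(x+1)=%10.4f"
          % (x, intractable_term(x), with_control_variate(x), shifted))
```

The last two printed columns should agree to floating-point precision, which is the sense in which the right panel is the left panel shifted by one unit.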