
Convergence of projected stochastic natural gradient variational inference for various step size and sample or batch size schedules

Thomas Guilmeau, Hadrien Hendrikx, Florence Forbes

Abstract

Stochastic natural gradient variational inference (NGVI) is a popular and efficient algorithm for Bayesian inference. Despite empirical success, the convergence of this method is still not fully understood. In this work, we define and study a projected stochastic NGVI when variational distributions form an exponential family. Stochasticity arises when either gradients are intractable expectations or large sums. We prove new non-asymptotic convergence results for combinations of constant or decreasing step sizes and constant or increasing sample/batch sizes. When all hyperparameters are fixed, NGVI is shown to converge geometrically to a neighborhood of the optimum, while we establish convergence to the optimum with rates of the form $\mathcal{O}\left(\frac{1}{T^\rho}\right)$, possibly with $\rho \geq 1$, for all other combinations of step size and sample/batch size schedules. These rates apply when the target posterior distribution is close in some sense to the considered exponential family. Our theoretical results extend existing NGVI and stochastic optimization results and provide more flexibility to adjust, in a principled way, step sizes and sample/batch sizes in order to meet speed, resources, or accuracy constraints.
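The interplay of step-size and sample-size schedules described above can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a 1D Gaussian target whose natural parameters are the fixed point of the iteration, and it models the stochastic natural-gradient estimate directly as the exact direction plus Monte Carlo noise of scale $1/\sqrt{N_t}$. The projection step keeps the iterate in the valid natural-parameter domain (negative second coordinate, i.e. positive variance). All names (`ngvi`, `project`, the schedules) are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: a 1D Gaussian N(mu_star, s2_star). Its natural parameters
# (mu/s2, -1/(2 s2)) play the role of the optimum lambda_star.
mu_star, s2_star = 2.0, 0.5
lam_star = np.array([mu_star / s2_star, -1.0 / (2.0 * s2_star)])

def project(lam, eps=1e-3):
    # Projection onto the valid natural-parameter domain: the second
    # coordinate must stay strictly negative (positive variance).
    return np.array([lam[0], min(lam[1], -eps)])

def ngvi(T=200, eta0=0.5, decrease_eta=True, grow_batch=True):
    lam = np.array([0.0, -0.5])  # initial variational parameters
    for t in range(T):
        # Step-size schedule: constant, or decreasing like eta0 / (t + 1).
        eta = eta0 / (t + 1) if decrease_eta else eta0
        # Sample-size schedule: constant, or increasing like N_t = t + 1.
        N = (t + 1) if grow_batch else 1
        # Stand-in for the stochastic natural gradient: exact direction
        # (lam_star - lam) plus Monte Carlo noise of scale 1/sqrt(N).
        noise = rng.normal(scale=1.0 / np.sqrt(N), size=2)
        lam = project(lam + eta * (lam_star - lam + noise))
    return lam

lam_T = ngvi()
print(np.linalg.norm(lam_T - lam_star))  # small: iterate near the optimum
```

With a constant step size and a constant sample size, the noise term keeps the iterate in a neighborhood of `lam_star`; combining a decreasing step size with an increasing sample size drives it to the optimum, mirroring the schedule trade-offs the abstract describes.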

Paper Structure

This paper contains 52 sections, 17 theorems, 85 equations, 7 figures, 1 algorithm.

Key Result

Proposition 1

Suppose that $\mathop{\mathrm{int}}\limits \mathop{\mathrm{dom}}\limits A \neq \emptyset$. Then, […]. Furthermore, if $\mathcal{Q}$ is minimal and steep, then […].

Figures (7)

  • Figure 1: Mean Bregman divergence between current and optimal parameters, over $100$ runs, for different NGVI schedules in the Prop. \ref{prop:varianceBonnetPrice} (a,b) and Prop. \ref{prop:boundVarianceSubsampling} (c,d) settings.
  • Figure 2: Logistic regression: Average ELBO over $50$ runs, for different NGVI schedules.
  • Figure 3: Mean Bregman divergence between current and optimal parameters, over $100$ runs, for NGVI schedules with constant $\eta$ and constant sample size $N$
  • Figure 4: Mean Bregman divergence between current and optimal parameters, over $100$ runs, for NGVI schedules with constant $\eta$ and increasing sample size $N_t = (t+1)^\gamma$
  • Figure 5: Student linear regression: Average ELBO over $50$ runs, for different NGVI schedules, comparing for each schedule the algorithm with and without a projection step
  • ...and 2 more figures

Theorems & Definitions (41)

  • Definition 1
  • Proposition 1
  • Definition 2
  • Definition 3
  • Proposition 2
  • Definition 4
  • Definition 5
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • ...and 31 more