Learning with Importance Weighted Variational Inference: Asymptotics for Gradient Estimators of the VR-IWAE Bound

Kamélia Daudel, François Roueff

TL;DR

Two analyses of the reparameterized and doubly-reparameterized gradient estimators of the VR-IWAE bound are provided. They reveal the advantages and limitations of these gradient estimators and make it possible to compare the ELBO, IWAE and VR bounds methodologies.

Abstract

Several popular variational bounds involving importance weighting ideas have been proposed to generalize and improve on the Evidence Lower BOund (ELBO) in the context of maximum likelihood optimization, such as the Importance Weighted Auto-Encoder (IWAE) and the Variational Rényi (VR) bounds. The methodology to learn the parameters of interest using these bounds typically amounts to running gradient-based variational inference algorithms that incorporate the reparameterization trick. However, the way the choice of the variational bound impacts the outcome of variational inference algorithms can be unclear. Recently, the VR-IWAE bound was introduced as a variational bound that unifies the ELBO, IWAE and VR bounds methodologies. In this paper, we provide two analyses for the reparameterized and doubly-reparameterized gradient estimators of the VR-IWAE bound, which reveal the advantages and limitations of these gradient estimators while enabling us to compare the ELBO, IWAE and VR bounds methodologies. Our work advances the understanding of importance weighted variational inference methods, and we illustrate our theoretical findings empirically.
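
To make the unification mentioned in the abstract concrete, the sketch below evaluates a single Monte Carlo estimate of the VR-IWAE bound from $N$ log importance weights. It assumes the form of the bound used in the VR-IWAE literature, $\frac{1}{1-\alpha}\,\mathbb E\big[\log \frac1N \sum_{j=1}^N w_j^{1-\alpha}\big]$ with $\alpha \in [0,1)$, which is not restated on this page. The function names `log_mean_exp` and `vr_iwae_estimate` are hypothetical, and treating $\alpha = 1$ as the ELBO limit is a convenience of this sketch.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def vr_iwae_estimate(log_weights, alpha):
    """Single Monte Carlo estimate of the VR-IWAE bound (assumed form).

    log_weights: log importance weights log w_j, j = 1..N.
    alpha = 0 gives the IWAE estimate; alpha = 1 is handled as the
    limiting case and returns the ELBO estimate (mean log-weight).
    """
    if alpha == 1.0:
        return sum(log_weights) / len(log_weights)
    scaled = [(1.0 - alpha) * lw for lw in log_weights]
    return log_mean_exp(scaled) / (1.0 - alpha)


# Illustrative log-weights (made up for this example, not from the paper).
example_log_weights = [0.0, 1.0, -2.0]
for a in (0.0, 0.5, 1.0):
    print(a, vr_iwae_estimate(example_log_weights, a))
```

For a fixed set of weights, the estimate is the logarithm of a power mean with exponent $1-\alpha$, so it decreases from the IWAE value to the ELBO value as $\alpha$ goes from 0 to 1.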

Paper Structure

This paper contains 41 sections, 20 theorems, 135 equations, 7 figures.

Key Result

Theorem 1

Assume hyp:inverseGrad, hyp:momentGrad with $h>2$, hyp:momentGradREP with $h'>1$ and hyp:hypZeroREP. If $h'<2$ assume moreover that $2/h+1/h'<1$. Then, as $N\to\infty$,

Figures (7)

  • Figure 1: Plotted is $\mathrm{SNR}[\gradREP[M,N,d][\phi_k]]$ computed over 2000 Monte Carlo samples for the Gaussian example described in \ref{num:GaussEx} as a function of $N$, for varying values of $(\alpha,d, \epsilon)$ and a random coordinate $\phi_k$. The solid lines correspond to $\mathrm{SNR}[\gradREP[M,N,d][\phi_k]]$, while the dashed lines correspond to predictions of the form \ref{eq:predictThmtwo}.
  • Figure 2: Plotted is $\mathbb E(\gradDREP[M,N,d][\phi_k])$ computed over 2000 Monte Carlo samples for the Gaussian example described in \ref{num:GaussEx} as a function of $N$, for varying values of $(\alpha,d)$ and a random coordinate $\phi_k$. The solid lines correspond to $\mathbb E(\gradREP[M,N,d][\phi_k])$, while the dashed lines correspond to predictions of the form $y = \epsilon \alpha$.
  • Figure 3: Plotted is $\mathrm{SNR}[\gradREP[M,N,d][\tilde{b}_k]]$ computed over 2000 Monte Carlo samples for the Linear Gaussian example described in \ref{num:LinGaussEx} as a function of $N$, for varying values of $(\alpha,d, \epsilon)$ and a randomly selected datapoint $x$. The solid lines correspond to $\mathrm{SNR}[\gradREP[M,N,d][\tilde{b}_k]]$, while the dashed lines correspond to predictions of the form \ref{eq:LeadOrderSNRone}.
  • Figure 4: Plotted is $\mathrm{SNR}[\gradDREP[M,N,d][\tilde{b}_k]]$ computed over 2000 Monte Carlo samples for the Linear Gaussian example described in \ref{num:LinGaussEx} as a function of $N$, for varying values of $(\alpha,d, \epsilon)$ and a randomly selected datapoint $x$. The solid lines correspond to $\mathrm{SNR}[\gradDREP[M,N,d][\tilde{b}_k]]$, while the dashed lines correspond to predictions of the form \ref{eq:LeadOrderSNRone}.
  • Figure 5: Plotted are $\mathrm{SNR}[\gradREP[M,N,d][\phi_k][0]]$ and $\mathrm{SNR}[\gradDREP[M,N,d][\phi_k][0]]$ computed over 2000 Monte Carlo samples for the Gaussian example described in \ref{num:GaussEx} as a function of $N$ and with $\epsilon = 0.2$. The solid lines correspond to the SNRs, while the dashed lines correspond to predictions of the form \ref{eq:predictSNRGaussExOne} and \ref{eq:predictSNRGaussExTwo}.
  • ...and 2 more figures
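
The captions above report signal-to-noise ratios (SNR) estimated from 2000 Monte Carlo samples. A minimal sketch of such a computation follows, assuming the usual definition $\mathrm{SNR} = |\mathbb E[g]| / \sqrt{\mathrm{Var}[g]}$ for a scalar gradient coordinate $g$. The setting is a hypothetical one-dimensional toy (target $\mathcal N(0,1)$, proposal $\mathcal N(\mu,1)$), not the Gaussian or Linear Gaussian examples of the paper, and the names `reparam_grad_mu` and `empirical_snr` are illustrative. Writing $z_j = \mu + \varepsilon_j$ gives $\log w_j = -\mu^2/2 - \mu\varepsilon_j$, so the reparameterized gradient of one VR-IWAE estimate is $-\sum_j \hat w_j z_j$ with $\hat w_j \propto w_j^{1-\alpha}$.

```python
import math
import random
import statistics


def reparam_grad_mu(mu, alpha, n_particles, rng):
    """Reparameterized gradient w.r.t. mu of one VR-IWAE estimate.

    Hypothetical toy: target N(0, 1), proposal N(mu, 1), z_j = mu + eps_j,
    so log w_j = -mu^2 / 2 - mu * eps_j and d(log w_j)/d(mu) = -z_j.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    z = [mu + e for e in eps]
    scaled = [(1.0 - alpha) * (-0.5 * mu * mu - mu * e) for e in eps]
    m = max(scaled)  # subtract the max before exponentiating for stability
    unnorm = [math.exp(s - m) for s in scaled]
    total = sum(unnorm)
    # Self-normalized weights proportional to w_j^(1 - alpha).
    return -sum(u / total * zj for u, zj in zip(unnorm, z))


def empirical_snr(samples):
    """|sample mean| divided by the sample standard deviation."""
    return abs(statistics.fmean(samples)) / statistics.stdev(samples)


rng = random.Random(0)
for n in (1, 10, 100):
    draws = [reparam_grad_mu(1.0, 0.2, n, rng) for _ in range(2000)]
    print(n, round(empirical_snr(draws), 3))
```

Each printed row pairs a number of particles $N$ with the empirical SNR of the toy gradient, which mirrors how the solid curves in the figures are described as functions of $N$.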

Theorems & Definitions (43)

  • Example 1
  • Example 2
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 1
  • Example 3: \ref{ex:GaussPrev} revisited
  • Example 4: \ref{ex:LinGaussPrev} revisited
  • Remark 1
  • Theorem 4
  • ...and 33 more