On the Asymptotics of Importance Weighted Variational Inference

Badr-Eddine Cherief-Abdellatif, Randal Douc, Arnaud Doucet, Hugo Marival

TL;DR

This work provides the first rigorous asymptotic theory for Importance Weighted Variational Inference (IWVI). Under weak moment conditions, it establishes consistency of both the model parameter estimate $\tilde{\theta}_n^k$ and the variational parameter estimate $\tilde{\phi}_n^k$, with the latter converging to a variance-minimizing target. It further proves asymptotic normality and efficiency of $\tilde{\theta}_n^k$ when the Monte Carlo (MC) sample size $k$ grows fast enough relative to the data size $n$. The required growth rate exhibits a phase transition: in the reparameterization framework, it ranges between $\sqrt{n}$ and $n$ depending on the smoothness of the importance weights. Simulations complement the theory, illustrating how IWVI can closely approximate the maximum likelihood estimator (MLE) and outperform maximum simulated likelihood estimator (MSLE) variants in certain sampling regimes. Overall, the paper provides foundational guarantees for IWVI when both $n$ and $k$ are large, grounding its empirical success in asymptotic theory.
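Why the variational parameter should drift toward a variance-minimizing value can be seen from a standard heuristic: a second-order delta-method expansion of the importance-weighted objective for a single observation $x$. This is a sketch that assumes suitable moment conditions on the weights $w_i$; it is not presented as the paper's own argument, and the symbols $\hat{p}_k$ and $\sigma_\phi^2$ are introduced here only for illustration.

```latex
% Single observation x; z_1, ..., z_k iid ~ q_\phi(\cdot \mid x); w_i = p_\theta(x, z_i) / q_\phi(z_i \mid x)
\hat{p}_k = \frac{1}{k}\sum_{i=1}^{k} w_i, \qquad
\mathbb{E}_{q_\phi}[\hat{p}_k] = p_\theta(x), \qquad
\operatorname{Var}_{q_\phi}(\hat{p}_k) = \frac{\sigma_\phi^2}{k},
\quad \sigma_\phi^2 := \operatorname{Var}_{q_\phi}(w_1).

% Second-order Taylor expansion of \log around p_\theta(x):
\log \hat{p}_k \approx \log p_\theta(x)
  + \frac{\hat{p}_k - p_\theta(x)}{p_\theta(x)}
  - \frac{\bigl(\hat{p}_k - p_\theta(x)\bigr)^2}{2\, p_\theta(x)^2}.

% Taking expectations (the linear term has mean zero):
\mathbb{E}_{q_\phi}[\log \hat{p}_k]
  = \log p_\theta(x) - \frac{1}{2k}\,\frac{\sigma_\phi^2}{p_\theta(x)^2} + o(1/k).
```

For fixed $\theta$ and large $k$, the only $\phi$-dependent term is the relative variance $\sigma_\phi^2 / p_\theta(x)^2$, so maximizing the objective over $\phi$ behaves like minimizing that variance; with $n$ observations the objective is a sum of such terms, one per data point.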

Abstract

For complex latent variable models, the likelihood function is not available in closed form. In this context, a popular method to perform parameter estimation is Importance Weighted Variational Inference. It essentially maximizes the expectation of the logarithm of an importance sampling estimate of the likelihood with respect to both the latent variable model parameters and the importance distribution parameters, the expectation being itself with respect to the importance samples. Despite its great empirical success in machine learning, a theoretical analysis of the limit properties of the resulting estimates is still lacking. We fill this gap by establishing consistency when both the Monte Carlo and the observed data sample sizes go to infinity simultaneously. We also establish asymptotic normality and efficiency under additional conditions relating the growth rates of the Monte Carlo and observed data sample sizes. We distinguish several regimes related to the smoothness of the importance ratio.
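The objective described in the abstract can be sketched on a hypothetical toy model where the likelihood is available in closed form: $z \sim \mathcal{N}(\theta, 1)$, $x \mid z \sim \mathcal{N}(z, 1)$, so that $p_\theta(x) = \mathcal{N}(x; \theta, 2)$, with a Gaussian importance distribution $q_\phi = \mathcal{N}(\mu, s^2)$, $\phi = (\mu, s)$. The model and all function names below are illustrative assumptions, not taken from the paper. The code only evaluates the objective at fixed parameters, approximating the expectation over importance samples by averaging independent replications; it does not perform the maximization over $(\theta, \phi)$.

```python
import math
import random


def log_normal_pdf(y, mean, sd):
    """Log density of N(mean, sd**2) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (y - mean) ** 2 / (2.0 * sd * sd)


def log_mean_exp(values):
    """Numerically stable log((1/k) * sum(exp(v))) using the max-shift trick."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def iw_log_lik(x, theta, mu, sd, k, rng):
    """One realization of log((1/k) * sum_i w_i), w_i = p_theta(x, z_i) / q_phi(z_i).

    Hypothetical toy model: z ~ N(theta, 1), x | z ~ N(z, 1);
    importance distribution q_phi = N(mu, sd**2) with phi = (mu, sd).
    """
    log_w = []
    for _ in range(k):
        z = rng.gauss(mu, sd)  # importance sample z_i ~ q_phi
        log_joint = log_normal_pdf(z, theta, 1.0) + log_normal_pdf(x, z, 1.0)
        log_w.append(log_joint - log_normal_pdf(z, mu, sd))  # log importance weight
    return log_mean_exp(log_w)


def iwvi_objective(x, theta, mu, sd, k, reps, rng):
    """Approximate the expectation over importance samples by averaging `reps` replications."""
    return sum(iw_log_lik(x, theta, mu, sd, k, rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    rng = random.Random(0)
    x, theta = 1.0, 0.0
    exact = log_normal_pdf(x, theta, math.sqrt(2.0))  # closed-form log p_theta(x)
    for k in (1, 10, 100):
        approx = iwvi_objective(x, theta, 0.0, 1.5, k, 1000, rng)
        print(f"k={k:4d}  objective={approx:.4f}  log-likelihood={exact:.4f}")
```

With the exact posterior $\mathcal{N}((x+\theta)/2, 1/2)$ as proposal, every weight equals $p_\theta(x)$, so the estimate is exactly $\log p_\theta(x)$ for any $k$. With a mismatched proposal, the averaged value should lie below $\log p_\theta(x)$ (by Jensen's inequality) and approach it as $k$ grows.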

Paper Structure (20 sections, 12 theorems, 97 equations, 2 figures, 1 table)

Key Result

Theorem 1

Assume hyp:theta:star, hyp:q and hyp:unif. Then, $\mathbb{P}_\star$-a.s.,

Figures (2)

  • Figure 1: Boxplots of MSLE and IWVI estimates over 500 replications of the draw of the latent set of size $k$ based on Cameron & Trivedi's dataset with $n = 100$ observations. The dashed line represents the maximum likelihood value. Increasing the number of draws $k$ improves the estimation accuracy. Note that the scales differ across the three panels.
  • Figure 2: Boxplots of MSLE and IWVI estimates over 500 replications of the draw of the latent set of size $k$ based on Cameron & Trivedi's dataset with $n = 100$ observations. The dashed line represents the maximum likelihood value, and the red points mark the mean of each boxplot. The three estimators are centered at the MLE for $k=1$.

Theorems & Definitions (23)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 1
  • Proof
  • Proof
  • Lemma 2
  • Proof
  • Proof
  • Proposition 1
  • ...and 13 more