Forward $\chi^2$ Divergence Based Variational Importance Sampling
Chengrui Li, Yule Wang, Weihan Li, Anqi Wu
TL;DR
The paper introduces variational importance sampling (VIS), a method for latent-variable models that directly estimates and maximizes the marginal log-likelihood $\ln p(\bm x;\theta)$ using the optimal proposal distribution, obtained by minimizing the forward $\chi^2$ divergence. VIS yields a tighter log-likelihood estimator than the ELBO as the Monte Carlo budget grows, and it updates the proposal with a numerically stable gradient estimator computed in log space. The authors evaluate VIS on toy mixture models, variational auto-encoders, and partially observable GLMs, using both synthetic and real neural datasets, and report consistent improvements in LL, CLL, and HLL as well as better parameter recovery. The work offers a practical, statistically principled alternative to VI and related IS-based methods for learning complex latent-variable models whose posteriors are hard to approximate.
Abstract
Maximizing the log-likelihood is central to learning latent variable models, and variational inference (VI) is the most commonly adopted method for doing so. However, VI can struggle to achieve a high log-likelihood when the posterior distribution is complicated. To address this limitation, we introduce a novel variational importance sampling (VIS) approach that directly estimates and maximizes the log-likelihood. VIS leverages the optimal proposal distribution, obtained by minimizing the forward $\chi^2$ divergence, to improve log-likelihood estimation. We apply VIS to several popular latent variable models, including mixture models, variational auto-encoders, and partially observable generalized linear models. Results demonstrate that our approach consistently outperforms state-of-the-art baselines in terms of both log-likelihood and model parameter estimation.
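To make the quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy conjugate Gaussian model ($z \sim \mathcal N(0,1)$, $x \mid z \sim \mathcal N(z,1)$), chosen only because its marginal and posterior are known in closed form. All function names are hypothetical. The sketch compares the ELBO with the importance-sampling estimate of $\ln p(x)$. It also evaluates, in log space, the second moment of the importance weights $\mathbb{E}_q[w^2]$, which equals $p(x)^2\,(\chi^2(p(z \mid x)\,\|\,q) + 1)$ and is therefore the quantity a forward $\chi^2$ objective would drive down. The paper's gradient estimator is not reproduced here.

```python
import numpy as np


def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def log_mean_exp(a):
    """Numerically stable log(mean(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.mean(np.exp(a - m)))


def log_weights(x, z, q_mean, q_var):
    """Log importance weights ln p(x, z) - ln q(z) for the assumed toy model."""
    log_joint = log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)
    return log_joint - log_normal_pdf(z, q_mean, q_var)


def estimates(x, q_mean, q_var, num_samples, rng):
    """Monte Carlo estimates under the proposal q = N(q_mean, q_var)."""
    z = q_mean + np.sqrt(q_var) * rng.standard_normal(num_samples)
    lw = log_weights(x, z, q_mean, q_var)
    return {
        "elbo": np.mean(lw),                          # E_q[ln w]
        "is_ll": log_mean_exp(lw),                    # ln of the mean weight
        "log_second_moment": log_mean_exp(2.0 * lw),  # ln E_q[w^2]
    }


rng = np.random.default_rng(0)
x = 1.5
exact_ll = log_normal_pdf(x, 0.0, 2.0)  # marginal of the toy model is N(0, 2)

# Proposal equal to the prior: a deliberately mismatched choice.
mismatched = estimates(x, 0.0, 1.0, 10_000, rng)
# Proposal equal to the exact posterior N(x / 2, 1 / 2): the chi^2-optimal case.
matched = estimates(x, x / 2.0, 0.5, 10_000, rng)

print("exact ln p(x):", exact_ll)
print("mismatched proposal:", mismatched)
print("matched proposal:", matched)
```

When the proposal equals the exact posterior, every importance weight equals $p(x)$, so the ELBO and the importance-sampling estimate both coincide with the exact log-likelihood and the forward $\chi^2$ divergence is zero. With a mismatched proposal, the ELBO computed on a given set of samples can never exceed the importance-sampling estimate computed on the same samples, by Jensen's inequality.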
