Bounding Evidence and Estimating Log-Likelihood in VAE

Łukasz Struski; Marcin Mazur; Paweł Batorski; Przemysław Spurek; Jacek Tabor

TL;DR

The paper addresses the challenge of the variational gap between log-evidence and ELBO in VAE-like models by deriving general upper bounds for concave transforms $f(\mathbb{E} X)$ and refining them via importance sampling. The core contributions include a foundational bound $f(\mathbb{E} X) \le \mathbb{E}[f(X)+(Y-X)f'(X)]$, an additive IS-based tightening strategy with an optimal $C$, and improved bounds using a $g,h$-inequality framework, all specialized to $f=\log$ to estimate log-evidence. The authors provide theoretical guarantees, convergence properties, and practical procedures, then validate the approach through synthetic case studies and extensive VAE/IWAE experiments on MNIST, SVHN, and CelebA, showing favorable comparisons to CUBO, EUBO, and TVO bounds. While the method yields tighter bounds and useful model-evaluation tools, limitations include reliance on estimators and focus on VAEs, restricting immediate training-time applicability. Overall, the work advances principled quantification of the variational gap and offers a practical toolkit for comparing generative models trained with lower bounds.

Abstract

Many crucial problems in deep learning and statistical inference are caused by a variational gap, i.e., a difference between model evidence (log-likelihood) and evidence lower bound (ELBO). In particular, in a classical VAE setting that involves training via an ELBO cost function, it is difficult to provide a robust comparison of the effects of training between models, since we do not know a log-likelihood of data (but only its lower bound). In this paper, to deal with this problem, we introduce a general and effective upper bound, which allows us to efficiently approximate the evidence of data. We provide extensive theoretical and experimental studies of our approach, including its comparison to the other state-of-the-art upper bounds, as well as its application as a tool for the evaluation of models that were trained on various lower bounds.

Paper Structure

This paper contains 24 sections, 13 theorems, 67 equations, 6 figures, 2 tables.

INTRODUCTION
RELATED WORK
THEORETICAL STUDY
Variational Gap
Reducing Variational Gap
Improved Bounds for Variational Gap
Quality of Estimations
EXPERIMENTS
Case Study on Synthetic Data
Experiments for VAE and IWAE Models
Dependence of Variational Gap Bound on $C$
CONCLUSION
Limitation
Societal Impact
MISSING PROOFS
...and 9 more sections

Key Result

Theorem 1

Let $f$ be a smooth concave function. Then $f(\mathbb{E} X) \le \mathbb{E}[f(X)+(Y-X)f'(X)]$, where $X$ and $Y$ are two independent random variables with the same distribution.
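
As a rough illustration (not the authors' code), the bound $f(\mathbb{E} X) \le \mathbb{E}[f(X)+(Y-X)f'(X)]$ from the TL;DR can be checked numerically for $f=\log$, where it reads $\log \mathbb{E} X \le \mathbb{E}[\log X + (Y-X)/X]$. The sketch below assumes a toy log-normal $X=\exp(Z)$ with $Z\sim\mathcal{N}(0,\sigma^2)$, chosen only because $\log \mathbb{E} X=\sigma^2/2$ is known in closed form. The names `log_evidence_bounds`, `sample_lognormal`, and `SIGMA` are hypothetical. In a VAE setting, $X$ would presumably play the role of an importance weight whose expectation is the evidence; that mapping is an assumption here and is not implemented.

```python
import math
import random

SIGMA = 0.5  # assumed toy parameter: X = exp(Z), Z ~ N(0, SIGMA^2)


def sample_lognormal(rng):
    """Draw one sample of the toy positive random variable X."""
    return math.exp(rng.gauss(0.0, SIGMA))


def log_evidence_bounds(sample, n, seed=0):
    """Monte Carlo estimates of two bounds on log E[X].

    lower: Jensen's bound E[log X]
    upper: E[log X + (Y - X) / X], i.e. the bound with f = log, f'(x) = 1/x,
           where Y is an independent copy of X.
    """
    rng = random.Random(seed)
    lower_sum = 0.0
    upper_sum = 0.0
    for _ in range(n):
        x = sample(rng)
        y = sample(rng)  # independent draw from the same distribution
        log_x = math.log(x)
        lower_sum += log_x
        upper_sum += log_x + (y - x) / x
    return lower_sum / n, upper_sum / n


lower, upper = log_evidence_bounds(sample_lognormal, 200_000)
true_value = SIGMA**2 / 2  # closed-form log E[X] for the log-normal toy case
print(f"lower={lower:.4f}  true={true_value:.4f}  upper={upper:.4f}")
```

In this toy case the upper bound has the closed form $\mathbb{E}[Y]\,\mathbb{E}[1/X]-1=e^{\sigma^2}-1$, so the estimate can be compared against it. The gap between the upper and lower estimates is a Monte Carlo estimate of $\mathbb{E}[(Y-X)/X]$, which bounds the variational gap $\log \mathbb{E} X-\mathbb{E}\log X$.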

Figures (6)

Figure 1: Estimated size of various variational gap bounds for the evidence of data (lower is better), calculated for VAE, IWAE-5, and IWAE-10 models, previously trained on MNIST, SVHN, and CelebA datasets, vs. the number of latent samples. All computations were averaged over 3 collections of samples, and over the test dataset.
Figure 2: Behavior of the lower and upper bounds for the evidence vs. the number of samples from the latent.
Figure 3: Behavior of lower and upper bounds for the evidence of data, calculated by our method for VAE, IWAE-5, and IWAE-10 models, previously trained on MNIST, SVHN, and CelebA datasets, vs. the number of latent samples. All computations were averaged over 3 collections of samples, and over the test dataset.
Figure 4: Behavior of lower and upper bounds for the evidence of data, calculated without 1% outliers by our method for VAE, IWAE-5, and IWAE-10 models, previously trained on SVHN and CelebA datasets, vs. the number of latent samples. All computations were averaged over 3 collections of samples, and over the test dataset.
Figure 5: Estimated size of the proposed variational gap bound, calculated using \ref{eq:isvgb} for VAE, IWAE-5, and IWAE-10 models, previously trained on MNIST, SVHN, and CelebA datasets, vs. constant $C$. In each case, the optimal value $C^{\mathrm{opt}}$ is marked with a short vertical line. All computations were averaged over the test dataset.
...and 1 more figure

Theorems & Definitions (23)

Theorem 1
Corollary 1
Remark 1
Theorem 2
proof
Corollary 2
proof
Corollary 3
Remark 2
Proposition 1
...and 13 more
