Tight Bounds for Jensen's Gap with Applications to Variational Inference

Marcin Mazur, Tadeusz Dziarmaga, Piotr Kościelniak, Łukasz Struski

TL;DR

This work develops general, higher-order bounds on Jensen's gap $JG(f,X)$, with a focus on the exponential and logarithmic cases, backed by analytical results and empirical validation. It generalizes existing Taylor- and moment-based bounds, providing explicit formulas and coefficients that tighten the bounds via $2k$-th order expansions; in the logarithmic case, gamma and lognormal distributions serve as key examples. The authors integrate these bounds into the PAC-Bayes framework to obtain data-dependent generalization insights and demonstrate a practical log-likelihood estimation procedure for variational models, validated on real-world data. Overall, the approach yields tighter, more informative bounds for variational inference and probabilistic modeling; the authors also discuss limitations and societal considerations.
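
The two log-case examples are convenient test distributions because their exact gap has a closed form. With $f=-\log$ we have $JG(-\log,X)=\log\mathbb{E}X-\mathbb{E}\log X$, and standard moment formulas give the following (our own worked derivation, assuming $\log X\sim\text{Normal}(\mu,\sigma^2)$ for the lognormal case, shape $a$ and scale $\theta$ for the gamma case, and $\psi$ the digamma function; the gamma result is the same under a rate parameterization):

$$X\sim\text{Lognormal}(\mu,\sigma):\quad \log\mathbb{E}X=\mu+\tfrac{\sigma^2}{2},\ \ \mathbb{E}\log X=\mu\ \ \Longrightarrow\ \ JG(-\log,X)=\tfrac{\sigma^2}{2},$$

$$X\sim\text{Gamma}(a,\theta):\quad \log\mathbb{E}X=\log(a\theta),\ \ \mathbb{E}\log X=\psi(a)+\log\theta\ \ \Longrightarrow\ \ JG(-\log,X)=\log a-\psi(a).$$

The gap therefore depends only on $\sigma$ in the lognormal case and only on the shape $a$ in the gamma case, so these families provide exact reference values against which the tightness of a bound can be measured.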

Abstract

Since its original formulation, Jensen's inequality has played a fundamental role across mathematics, statistics, and machine learning, with its probabilistic version highlighting the nonnegativity of the so-called Jensen's gap, i.e., the difference between the expectation of a convex function and the function at the expectation. Of particular importance is the case when the function is logarithmic, as this setting underpins many applications in variational inference, where the term variational gap is often used interchangeably. Recent research has focused on estimating the size of Jensen's gap and establishing tight lower and upper bounds under various assumptions on the underlying function and distribution, driven by practical challenges such as the intractability of log-likelihood in graphical models like variational autoencoders (VAEs). In this paper, we propose new, general bounds for Jensen's gap that accommodate a broad range of assumptions on both the function and the random variable, with special attention to exponential and logarithmic cases. We provide both analytical and empirical evidence for the performance of our method. Furthermore, we relate our bounds to the PAC-Bayes framework, providing new insights into generalization performance in probabilistic models.
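
The definition, and the reason the logarithmic case coincides with the variational gap, fit in one line. Here $JG(f,X)=\mathbb{E}f(X)-f(\mathbb{E}X)$; taking the convex function $f=-\log$ and $X=p(x,z)/q(z)$ with $z\sim q$ (standard latent-variable notation assumed for illustration: $p(x,z)$ is the model's joint density and $q(z)$ the variational distribution, which in a VAE depends on $x$) gives

$$JG(-\log,X)=\log\mathbb{E}_{q}\!\left[\frac{p(x,z)}{q(z)}\right]-\mathbb{E}_{q}\!\left[\log\frac{p(x,z)}{q(z)}\right]=\log p(x)-\mathrm{ELBO}(q)=\mathrm{KL}\big(q(z)\,\|\,p(z\mid x)\big)\ge 0.$$

Bounding Jensen's gap for $-\log$ is therefore the same as bounding how far the ELBO falls below the intractable log-likelihood $\log p(x)$.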

Paper Structure

This paper contains 22 sections, 6 theorems, 36 equations, 5 figures, 1 table.

Key Result

Theorem 1

If $f$ is a twice differentiable convex function, then we have the following inequalities: provided that appropriate finite expected values exist.
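
For intuition, here is a classical second-order bound of the same type, a standard consequence of Taylor's theorem and not necessarily the exact statement of Theorem 1. With $\mu=\mathbb{E}X$, Taylor's theorem with Lagrange remainder gives $f(X)=f(\mu)+f'(\mu)(X-\mu)+\tfrac12 f''(\xi)(X-\mu)^2$ for some $\xi=\xi(X)$ between $\mu$ and $X$. The linear term has zero expectation, so if $a\le f''\le b$ on an interval containing the support of $X$, then

$$\frac{a}{2}\operatorname{Var}(X)\;\le\;JG(f,X)=\frac12\,\mathbb{E}\big[f''(\xi)\,(X-\mu)^2\big]\;\le\;\frac{b}{2}\operatorname{Var}(X).$$

For example, $f=\exp$ and $X$ supported on $[0,1]$ give $a=1$ and $b=e$. Expanding to an even order $2k$ instead keeps the remainder multiplied by the nonnegative factor $(X-\mu)^{2k}$, which is why the same argument extends to higher orders.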

Figures (5)

  • Figure 1: Lower and upper bounds on Jensen's gap given by our method and that of lee2021further for $f(x)=\exp(\frac{1}{2}x)$ and $X\sim \text{Exponential}(1)$ (top), and for $f(x)=\exp(x)$ and $X\sim \text{Normal}(0,1)$ (bottom), plotted against the maximum order of moments used in the calculations (i.e., $2k-1$ for our method). The dashed lines connecting the points show second-degree polynomial interpolation. Note that the upper bounds of lee2021further are infinite in both settings.
  • Figure 2: Upper bounds on the cross-entropy loss for the model misspecification setting (top) and the perfect model setting (bottom). We follow the experimental setup of NEURIPS2020_3ac48664 (see Figure 2 and Appendix B therein). Our higher-order bounds are consistently tighter and locate the true minimum more accurately.
  • Figure 3: Lower and upper bounds on Jensen's gap for $f=-\log$ and $X\sim \text{Lognormal}(\mu,\sigma)$ (left) or $X\sim \text{Gamma}(a,\theta)$ (right), vs. different values of the respective distribution parameters, rigorously computed using our method and the method of pmlr-v206-struski23a.
  • Figure 4: Estimated upper bounds on Jensen's gap, which measure how tightly the log-likelihood is estimated for VAE, IWAE-5, and IWAE-10 models pre-trained on the MNIST (left), SVHN (middle), and CelebA (right) datasets (lower is better). All values shown are restricted to (and averaged over) the data points counted in Table 1.
  • Figure 5: Histograms and Q-Q plots (gnanadesikan1968probability, field2009shapiro) for randomly selected images from each dataset and model combination. The horizontal dashed line separates positive outcomes, where our method gives tighter upper-bound estimates, from negative outcomes, where our bounds are looser. When our method wins, the underlying distribution is close to Gaussian, as the Q-Q plots show; when it loses, the distribution deviates markedly from a Gaussian. For MNIST with the IWAE-5/10 models, our method outperforms the state of the art on every image, so no images fall on the negative side of the dashed line (marked N/A).
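
The top setting of Figure 1 ($f(x)=\exp(\tfrac12 x)$, $X\sim\text{Exponential}(1)$) has an exact gap, $JG=\mathbb{E}e^{X/2}-e^{1/2}=2-e^{1/2}\approx 0.3513$, so it is easy to check how a Taylor-type bound tightens with order. The sketch below is a generic $2k$-th order Taylor lower bound with Lagrange remainder, written for illustration only; it is not the paper's method, which according to Figure 1 uses moments only up to order $2k-1$. It sums the exact expected terms of orders $2,\dots,2k-1$ and bounds the remainder from below using $\inf_{x\ge 0} f^{(2k)}(x)=2^{-2k}$. All function names are hypothetical.

```python
import math

MU = 1.0  # E[X] for X ~ Exponential(1)


def raw_moment(j: int) -> float:
    """E[X^j] = j! for X ~ Exponential(1)."""
    return float(math.factorial(j))


def central_moment(n: int) -> float:
    """E[(X - 1)^n], computed from the binomial expansion of (X - 1)^n."""
    return sum(
        math.comb(n, j) * raw_moment(j) * (-MU) ** (n - j) for j in range(n + 1)
    )


def deriv_at_mean(j: int) -> float:
    """j-th derivative of f(x) = exp(x / 2), evaluated at x = E[X] = 1."""
    return 0.5 ** j * math.exp(0.5 * MU)


def taylor_lower_bound(k: int) -> float:
    """Hypothetical helper: lower bound on JG(f, X) from a 2k-th order
    Taylor expansion of f around E[X].

    Terms of order 2..2k-1 are kept exactly (in expectation); the order-2k
    Lagrange remainder f^(2k)(xi) / (2k)! * (X - 1)^(2k) is bounded below by
    replacing f^(2k)(xi) with inf_{x >= 0} f^(2k)(x) = 2^(-2k), which is valid
    because (X - 1)^(2k) >= 0.
    """
    exact_terms = sum(
        deriv_at_mean(j) / math.factorial(j) * central_moment(j)
        for j in range(2, 2 * k)
    )
    remainder = 0.5 ** (2 * k) / math.factorial(2 * k) * central_moment(2 * k)
    return exact_terms + remainder


# E[exp(X / 2)] = 1 / (1 - 1/2) = 2 (the MGF of Exponential(1) at t = 1/2).
EXACT_GAP = 2.0 - math.exp(0.5)

if __name__ == "__main__":
    print(f"exact gap: {EXACT_GAP:.6f}")
    for k in range(1, 6):
        print(f"k={k}: lower bound {taylor_lower_bound(k):.6f}")
```

Running it prints the exact gap followed by lower bounds that increase with $k$ and stay below it. With this generic argument the matching upper bound is infinite, since $f^{(2k)}(x)=2^{-2k}e^{x/2}$ is unbounded on $[0,\infty)$.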

Theorems & Definitions (7)

  • Theorem 1: Derived from pmlr-v206-struski23a
  • Theorem 2
  • Corollary 1
  • Remark 1
  • Theorem 3: Derived from pmlr-v206-struski23a
  • Theorem 4
  • Theorem 5