On the Convergence of the ELBO to Entropy Sums

Jörg Lücke, Jan Warnken

TL;DR

This work proves that, for a broad class of exponential-family (EF) generative models satisfying a parameterization criterion, the ELBO evaluated at any stationary point equals a sum of entropies: the average entropy of the variational (approximate posterior) distributions, minus the entropy of the prior, minus the expected entropy of the observable distribution. The result is first established for EF models with constant base measures and then generalized to arbitrary EF models using pseudo entropies and new measures, which extends entropy-sum convergence beyond Gaussian settings. These contributions unify and generalize earlier, Gaussian-focused insights and enable entropy-based analyses, and potentially entropy-based learning objectives, for diverse models including variational autoencoders (VAEs), sigmoid belief networks (SBNs), factor analysis (FA), Gaussian mixture models (GMMs), and Poisson mixtures. The findings offer an information-theoretic perspective on ELBO optimization and open avenues for future work on learning objectives, optimization landscapes, and extensions to deeper or undirected generative models.
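Written out as one equation, the entropy sum can be sketched as below. The notation is chosen here for illustration and is not taken from the paper: $\mathcal{F}$ is the ELBO averaged over $N$ data points $\vec{x}^{(n)}$, $q^{(n)}$ are the variational distributions, $\Psi$ parameterizes the prior (as in Lemma 1), and $\Theta$ is an assumed symbol for the parameters of the observable distribution. The second line holds at stationary points for the constant-base-measure case; the general EF case uses pseudo entropies instead, which this sketch does not attempt to write out.

```latex
\begin{aligned}
\mathcal{F} &= \frac{1}{N}\sum_{n=1}^{N}\mathbb{E}_{q^{(n)}}\Big[\log p_{\Psi}(\vec{z}) + \log p_{\Theta}(\vec{x}^{(n)}\,|\,\vec{z}) - \log q^{(n)}(\vec{z})\Big] \\
&= \underbrace{\frac{1}{N}\sum_{n=1}^{N}\mathcal{H}\big[q^{(n)}(\vec{z})\big]}_{\text{(A)}}
 \;-\; \underbrace{\mathcal{H}\big[p_{\Psi}(\vec{z})\big]}_{\text{(B)}}
 \;-\; \underbrace{\frac{1}{N}\sum_{n=1}^{N}\mathbb{E}_{q^{(n)}}\Big[\mathcal{H}\big[p_{\Theta}(\vec{x}\,|\,\vec{z})\big]\Big]}_{\text{(C)}}
 \qquad \text{(at stationary points)}
\end{aligned}
```

The labels (A), (B) and (C) match the three entropies listed in the abstract.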

Abstract

The variational lower bound (a.k.a. ELBO or free energy) is the central objective of many established as well as novel algorithms for unsupervised learning. Such algorithms usually increase the bound until the parameters have converged to values close to a stationary point of the learning dynamics. Here we show that, for a very large class of generative models, the variational lower bound equals a sum of entropies at all stationary points of learning. Concretely, for standard generative models with one set of latent variables and one set of observed variables, the sum consists of three entropies: (A) the (average) entropy of the variational distributions, (B) the negative entropy of the model's prior distribution, and (C) the (expected) negative entropy of the observable distribution. The result applies under realistic conditions: for finite numbers of data points, at any stationary point (including saddle points), and for any family of (well-behaved) variational distributions. The class of generative models for which we show equality to entropy sums contains many standard as well as novel generative models, including standard (Gaussian) variational autoencoders. The prerequisites for showing this equality are relatively mild: the distributions defining a given generative model have to be of the exponential family, and the model has to satisfy a parameterization criterion (which is usually fulfilled). Proving equality of the ELBO to entropy sums at stationary points (under the stated conditions) is the main contribution of this work.
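A small numerical check can make the three-term sum concrete. The sketch below is our own illustration, not code from the paper: it fits a Gaussian mixture model with learnable mixing weights $\pi_c$, means $\mu_c$ and one shared isotropic variance $\sigma^2$ by EM, and then compares the ELBO with (A) the mean entropy of the posteriors $q_n(c)$, minus (B) $\mathcal{H}[\pi]$, minus (C) the expected entropy of $\mathcal{N}(\mu_c, \sigma^2 I)$. All function names are hypothetical, and the synthetic data set is arbitrary.

```python
import numpy as np


def log_gauss_iso(X, mu, sigma2):
    """log N(x_n; mu_c, sigma2 * I) for all n, c -> array of shape (N, C)."""
    D = X.shape[1]
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return -0.5 * D * np.log(2 * np.pi * sigma2) - sq / (2 * sigma2)


def e_step(X, pi, mu, sigma2):
    """Exact posteriors q_n(c) = p(c | x_n), normalised in log space."""
    log_joint = np.log(pi)[None, :] + log_gauss_iso(X, mu, sigma2)
    log_joint -= log_joint.max(axis=1, keepdims=True)
    q = np.exp(log_joint)
    return q / q.sum(axis=1, keepdims=True)


def m_step(X, q):
    """Closed-form ELBO maximisers for pi, mu and the shared variance sigma2."""
    N, D = X.shape
    Nc = q.sum(axis=0)
    pi = Nc / N
    mu = (q.T @ X) / Nc[:, None]
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    sigma2 = (q * sq).sum() / (N * D)
    return pi, mu, sigma2


def xlogx(p):
    """Elementwise p * log(p) with the convention 0 * log(0) = 0."""
    return np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)


def elbo(X, q, pi, mu, sigma2):
    """F = (1/N) sum_n E_{q_n}[log p(c, x_n) - log q_n(c)]."""
    N = X.shape[0]
    log_joint = np.log(pi)[None, :] + log_gauss_iso(X, mu, sigma2)
    return ((q * log_joint).sum() - xlogx(q).sum()) / N


def entropy_sum(X, q, pi, sigma2):
    """(A) mean entropy of q_n  -  (B) H[pi]  -  (C) expected H[p(x | c)]."""
    N, D = X.shape
    mean_H_q = -xlogx(q).sum() / N
    H_prior = -xlogx(pi).sum()
    # All components share sigma2, so every H[N(mu_c, sigma2 I)] is identical
    # and the q-weighted average in (C) reduces to this single value.
    H_obs = 0.5 * D * np.log(2 * np.pi * np.e * sigma2)
    return mean_H_q - H_prior - H_obs


rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 1.0, size=(150, 2)),
                    rng.normal(3.0, 0.5, size=(100, 2))])

C = 2
pi = np.full(C, 1.0 / C)
mu = X[[0, -1]].copy()  # one point from each synthetic cluster as initial means
sigma2 = X.var()
for _ in range(200):
    q = e_step(X, pi, mu, sigma2)
    pi, mu, sigma2 = m_step(X, q)

# Evaluate with the last posteriors and the parameters fitted to them.
F = elbo(X, q, pi, mu, sigma2)
S = entropy_sum(X, q, pi, sigma2)
print(f"ELBO = {F:.6f}, entropy sum = {S:.6f}")
```

For this particular model the two numbers agree to floating-point precision whenever $\pi$ and $\sigma^2$ sit at their M-step values for the $q_n$ used, which is the case at a converged EM fixed point; evaluating the same expressions with parameters that were not re-fitted to $q_n$ generally breaks the equality.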

Paper Structure

This paper contains 11 sections, 11 theorems, and 114 equations.

Key Result

Lemma 1

Consider an EF generative model as given by Definition B (EF Generative Models), and let the dimensionalities of the natural parameter vectors $\vec{\zeta}$ and $\vec{\eta}$ be $K$ and $L$, respectively. Let further $\frac{\partial\vec{\zeta}^{\mathrm{T}}(\vec{\Psi})}{\partial\vec{\Psi}}$ and … In the case when $\vec{z}$ is a discrete latent variable, the integrals in (EqnLemmaParamCritB1) be …

Theorems & Definitions (32)

  • Definition A: Generative Model
  • Definition B: EF Generative Models
  • Definition C: Parameterization Criterion
  • Example 1: Simple SBN
  • Example 2: Simple Factor Analysis
  • Example 3: Counter-Example: Rigid SBN
  • Lemma 1
  • Proof
  • Theorem 1: Equality to Entropy Sums
  • Proof
  • ...and 22 more
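Definition C and the counter-example in Example 3 indicate that the equality depends on how a model is parameterized. The following hypothetical toy, which is our own illustration and not the paper's rigid-SBN example, hints at why. It strips the latent variable from a Gaussian model (a single state, so terms (A) and (B) of the abstract vanish), leaving the ELBO equal to the average log-likelihood. If the variance is learnable and set to its maximum-likelihood value, the average log-likelihood equals the negative entropy of the fitted Gaussian; if the variance is frozen at some value $\sigma^2$, a gap of $\tfrac12(1-\hat{s}^2/\sigma^2)$ remains, where $\hat{s}^2$ is the empirical variance. The function names and data are assumptions for illustration.

```python
import numpy as np


def mean_loglik(x, mu, sigma2):
    """Average log-density (1/N) sum_n log N(x_n; mu, sigma2), in nats."""
    return np.mean(-0.5 * np.log(2 * np.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2))


def gauss_entropy(sigma2):
    """Differential entropy of a 1-D Gaussian with variance sigma2, in nats."""
    return 0.5 * np.log(2 * np.pi * np.e * sigma2)


rng = np.random.default_rng(1)
x = rng.normal(loc=0.5, scale=2.0, size=500)  # arbitrary synthetic data

mu_hat = x.mean()                     # ML estimate of the mean
s2_hat = np.mean((x - mu_hat) ** 2)   # ML estimate of the variance

# Learnable variance: gap = ELBO - (-entropy) = mean log-lik + entropy, ~0 here
gap_learned = mean_loglik(x, mu_hat, s2_hat) + gauss_entropy(s2_hat)

# Hypothetical "rigid" variant: variance frozen at 1.0, only the mean is fitted
s2_fixed = 1.0
gap_fixed = mean_loglik(x, mu_hat, s2_fixed) + gauss_entropy(s2_fixed)

print(f"learned variance: gap = {gap_learned:.2e}")
print(f"frozen variance:  gap = {gap_fixed:.4f} "
      f"(closed form: {0.5 * (1 - s2_hat / s2_fixed):.4f})")
```

The frozen-variance gap vanishes only in the special case where the data happen to have empirical variance equal to the frozen value, which mirrors the intuition that a model too rigid to adapt its scale parameters cannot be expected to reach the entropy sum.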