Improving Mutual Information Estimation with Annealed and Energy-Based Bounds

Rob Brekelmans; Sicong Huang; Marzyeh Ghassemi; Greg Ver Steeg; Roger Grosse; Alireza Makhzani

TL;DR

This work tackles the core challenge of estimating mutual information $I(\mathbf X;\mathbf Z)$ when densities are intractable. It introduces a unifying importance-sampling framework that yields both lower and upper bounds via extended-state-space constructions, and develops novel bounds including Multi-Sample AIS, GIWAE, and MINE-AIS. GIWAE generalizes variational and contrastive bounds (InfoNCE, IWAE, Barber-Agakov), while MINE-AIS expands MI estimation to settings with unknown $p(\mathbf X|\mathbf Z)$ by training an energy-based posterior and evaluating with Multi-Sample AIS. The methods are demonstrated on VAEs and GANs trained on MNIST and CIFAR, showing significant gains in tightness and practicality for large MI regimes. Collectively, the paper provides practical tools and theoretical insight for MI estimation in deep generative modeling and related applications.

Abstract

Mutual information (MI) is a fundamental quantity in information theory and machine learning. However, direct estimation of MI is intractable, even if the true joint probability density for the variables of interest is known, as it involves estimating a potentially high-dimensional log partition function. In this work, we present a unifying view of existing MI bounds from the perspective of importance sampling, and propose three novel bounds based on this approach. Since accurate estimation of MI without density information requires a sample size exponential in the true MI, we assume either a single marginal or the full joint density information is known. In settings where the full joint density is available, we propose Multi-Sample Annealed Importance Sampling (AIS) bounds on MI, which we demonstrate can tightly estimate large values of MI in our experiments. In settings where only a single marginal distribution is known, we propose Generalized IWAE (GIWAE) and MINE-AIS bounds. Our GIWAE bound unifies variational and contrastive bounds in a single framework that generalizes InfoNCE, IWAE, and Barber-Agakov bounds. Our MINE-AIS method improves upon existing energy-based methods such as MINE-DV and MINE-F by directly optimizing a tighter lower bound on MI. MINE-AIS uses MCMC sampling to estimate gradients for training and Multi-Sample AIS for evaluating the bound. Our methods are particularly suitable for evaluating MI in deep generative models, since explicit forms of the marginal or joint densities are often available. We evaluate our bounds on estimating the MI of VAEs and GANs trained on the MNIST and CIFAR datasets, and showcase significant gains over existing bounds in these challenging settings with high ground truth MI.
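The sandwich idea in the abstract can be illustrated on a toy problem where the joint density is known. The following is a minimal sketch, not the paper's code: it assumes a hypothetical 1D linear Gaussian model ($z \sim \mathcal{N}(0,1)$, $x|z \sim \mathcal{N}(z, \sigma^2)$) and uses the prior $p(z)$ as the IWAE proposal, so the true MI is available in closed form for comparison. The function names, the model, and all default parameters are illustrative assumptions.

```python
import numpy as np

def log_normal(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)

def log_mean_exp(a, axis):
    """Numerically stable log(mean(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def iwae_mi_sandwich(sigma=0.5, K=64, n=20000, seed=0):
    """Monte Carlo IWAE-style lower and upper bounds on I(x; z).

    Toy model (an assumption for illustration): z ~ N(0, 1), x | z ~ N(z, sigma^2).
    The proposal is the prior p(z), so each importance weight is p(x | z_k).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    x = z + sigma * rng.standard_normal(n)
    log_lik = log_normal(x, z, sigma)  # log p(x | z) at the joint sample

    # K independent prior samples per x: a lower bound on log p(x),
    # which translates to an UPPER bound on MI.
    z_prior = rng.standard_normal((n, K))
    log_w_prior = log_normal(x[:, None], z_prior, sigma)
    mi_upper = np.mean(log_lik - log_mean_exp(log_w_prior, axis=1))

    # One posterior sample (the joint sample) plus K-1 prior samples:
    # an upper bound on log p(x), which translates to a LOWER bound on MI.
    log_w_mixed = log_w_prior.copy()
    log_w_mixed[:, 0] = log_lik
    mi_lower = np.mean(log_lik - log_mean_exp(log_w_mixed, axis=1))

    mi_true = 0.5 * np.log(1.0 + 1.0 / sigma**2)  # closed form for this toy model
    return mi_lower, mi_true, mi_upper

if __name__ == "__main__":
    lower, true_mi, upper = iwae_mi_sandwich()
    print(f"lower={lower:.3f}  true={true_mi:.3f}  upper={upper:.3f}")
```

With the prior as proposal, the lower bound in this sketch can never exceed $\log K$, which gives a concrete feel for why simple importance sampling struggles when the true MI is large and why annealing over intermediate distributions is attractive in that regime.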

Paper Structure (124 sections, 32 theorems, 214 equations, 10 figures, 5 tables)

Introduction
Problem Setting
Unifying Mutual Information Bounds via Importance Sampling
A General Approach for Extended State Space Importance Sampling Bounds
Barber-Agakov Lower and Upper Bounds
Importance Weighted Autoencoder Lower and Upper Bounds
Complexity in K
Relationship with Structured InfoNCE
Generalized IWAE
Relationship with BA
Relationship with InfoNCE
Relationship with IWAE
Relationship with Structured InfoNCE
Multi-Sample AIS Bounds for Estimating Mutual Information
Annealed Importance Sampling Background
...and 109 more sections

Key Result

Proposition 2.0

Let $p_{\textsc{tgt}}^{\textsc{iwae}}(s|\mathbf{x}, \mathbf{z}^{(1:K)}) = \frac{p(\mathbf{x},\mathbf{z}^{(s)})}{q_{\theta}(\mathbf{z}^{(s)}|\mathbf{x})} \Big/ \sum_{k=1}^K \frac{p(\mathbf{x},\mathbf{z}^{(k)})}{q_{\theta}(\mathbf{z}^{(k)}|\mathbf{x})}$ denote the normalized importance weights and

Figures (10)

Figure 1: Schematic of various MI bounds discussed in this paper. Green shading indicates our contributions, while columns and gold labels indicate single- or multi-sample bounds. Blue arrows indicate special cases using the indicated proposal distribution. Several bounds with unknown $p(\mathbf{x}|\mathbf{z})$ use learned energy or critic functions, where the optimal critic function reflects the true $p(\mathbf{x}|\mathbf{z})$. Relationships based on critic functions are indicated by red arrows. Bounds with unknown $p(\mathbf{x}|\mathbf{z})$ provide only lower bounds on MI, while we obtain both upper and lower bounds with known $p(\mathbf{x}|\mathbf{z})$. All bounds require a single known marginal $p(\mathbf{z})$ for evaluation, apart from (Structured) InfoNCE.
Figure 1: MI estimation on MNIST and CIFAR-10 with IWAE (with varying number of samples $K$) and Multi-Sample AIS (with varying number of intermediate distributions $T$). Bounds with a gap of less than 2 nats from the ground truth are in bold.
Figure 2: Extended state-space probabilistic interpretations of Multi-Sample AIS bounds. Forward chains are colored in blue, and backward chains are colored in red. Note that ELBOs and EUBOs are obtained by taking the expectation of the log unnormalized importance weights $\log p_{\textsc{tgt}}(\cdot)/q_{\textsc{prop}}(\cdot)$ under either the proposal or target distribution, and can then be translated to MI bounds.
Figure 3: Comparing Multi-Sample AIS sandwich bounds for a varying number of AIS distributions.
Figure 4: Estimating IBAL using Multi-Sample AIS for various methods of critic function training.
...and 5 more figures

Theorems & Definitions (52)

Proposition 2.0: Improvement of IWAE with Increasing $K$
Corollary 2.0
Proposition 2.0: Improvement of IWAE over GIWAE
Corollary 2.0: Optimal GIWAE Critic Function yields IWAE
Corollary 2.0: Logarithmic Improvement of GIWAE
Proposition 3.0: Complexity in $T$
Proposition 4.0
Proposition 4.0: IBAL as Limiting Behavior of GIWAE
Proposition 4.0
Proposition B.0
...and 42 more
