On Variational Bounds of Mutual Information
Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker
TL;DR
This work unifies variational bounds on mutual information (MI) in a single framework and analyzes why existing lower bounds degrade as MI grows, exhibiting either high bias or high variance. It introduces a continuum of lower bounds that trades off bias against variance, extends it to multi-sample and nonlinear interpolations, and uses known conditional structure and density-ratio estimators to construct tractable estimators with provable bounds. The authors empirically characterize bias and variance on synthetic high-dimensional problems and demonstrate decoder-free, MI-based representation learning on dSprites, highlighting practical gains in disentanglement under information constraints. The result is a toolkit of tunable MI bounds that balance tractability and tightness for both MI estimation and representation learning in high-dimensional settings.
Abstract
Estimating and optimizing mutual information (MI) is core to many problems in machine learning; however, bounding MI in high dimensions is challenging. To establish tractable and scalable objectives, recent work has turned to variational bounds parameterized by neural networks, but the relationships and tradeoffs between these bounds remain unclear. In this work, we unify these recent developments in a single framework. We find that the existing variational lower bounds degrade when the MI is large, exhibiting either high bias or high variance. To address this problem, we introduce a continuum of lower bounds that encompasses previous bounds and flexibly trades off bias and variance. On high-dimensional, controlled problems, we empirically characterize the bias and variance of the bounds and their gradients and demonstrate the effectiveness of our new bounds for estimation and representation learning.
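To make the "high bias when the MI is large" failure mode concrete, the sketch below evaluates one multi-sample lower bound, InfoNCE, on correlated Gaussians whose true MI is known in closed form. The choice of InfoNCE, the Gaussian setup, the closed-form critic, and every function name here are illustrative assumptions rather than details taken from the abstract. The point it shows is a general property of this estimator: with a batch of K samples, its value can never exceed log K, so it is necessarily biased once the true MI is larger than that.

```python
import numpy as np


def sample_correlated_gaussians(rng, batch, dim, rho):
    """Draw (x, y) pairs where each coordinate pair has correlation rho."""
    x = rng.standard_normal((batch, dim))
    eps = rng.standard_normal((batch, dim))
    y = rho * x + np.sqrt(1.0 - rho**2) * eps
    return x, y


def true_mi(dim, rho):
    """Closed-form MI (in nats) for the Gaussian pairs above."""
    return -0.5 * dim * np.log(1.0 - rho**2)


def conditional_critic(x, y, rho):
    """scores[i, j] = log p(y_j | x_i) up to an additive constant.

    Assumption: we use the known conditional density as the critic,
    standing in for a trained neural network.
    """
    diff = y[None, :, :] - rho * x[:, None, :]
    return -0.5 * (diff**2).sum(axis=-1) / (1.0 - rho**2)


def infonce(scores):
    """InfoNCE estimate from a K x K score matrix (positives on the diagonal)."""
    k = scores.shape[0]
    row_max = scores.max(axis=1, keepdims=True)
    # Numerically stable log-sum-exp over each row.
    lse = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))
    return float(np.mean(np.diag(scores) - lse) + np.log(k))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, dim = 64, 20
    for rho in (0.3, 0.9):
        x, y = sample_correlated_gaussians(rng, batch, dim, rho)
        estimate = infonce(conditional_critic(x, y, rho))
        print(f"rho={rho}: true MI={true_mi(dim, rho):.2f} nats, "
              f"InfoNCE={estimate:.2f} nats, log K={np.log(batch):.2f}")
```

Increasing the batch size raises the log K ceiling but only logarithmically, which is one way to see why a family of bounds that trades bias for variance is useful in the high-MI regime.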
