A Neural Difference-of-Entropies Estimator for Mutual Information

Haoran Ni; Martin Lotz

TL;DR

This work tackles the problem of estimating mutual information (MI) in high dimensions without strong modelling assumptions. It introduces a difference-of-entropies (DoE) estimator implemented with block autoregressive normalizing flows that jointly model $H(X)$ and $H(X|Y)$, enabling unbiased, consistent MI estimation via $I(X;Y)=H(X)-H(X|Y)$. Theoretical results establish the existence and properties of block-triangular normalizing flows that represent joint and conditional densities, while empirical evaluations show robust performance across Gaussian, nonlinear, and heavy-tailed distributions, often surpassing state-of-the-art discriminative and generative baselines. The approach offers a scalable, principled MI estimator with potential impact on ML tasks that require dependence measures and information-theoretic objectives. It requires careful architectural choices for stability, and it may be extended to discrete settings and downstream applications.

Abstract

Estimating Mutual Information (MI), a key measure of dependence of random quantities without specific modelling assumptions, is a challenging problem in high dimensions. We propose a novel mutual information estimator based on parametrizing conditional densities using normalizing flows, a deep generative model that has gained popularity in recent years. This estimator leverages a block autoregressive structure to achieve improved bias-variance trade-offs on standard benchmark tasks.
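
To make the difference-of-entropies idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a one-dimensional pair $(X, Y)$ and swaps the normalizing flows for simple Gaussian density models: a Gaussian fit for the marginal of $X$ and a linear-Gaussian regression for the conditional of $X$ given $Y$. The function names (`doe_mi_estimate`, `sample_correlated_gaussians`) and all parameter choices are hypothetical. Each entropy is approximated by the average negative log-density of the fitted model, and the MI estimate is their difference, following $I(X;Y)=H(X)-H(X|Y)$.

```python
import math
import random


def gaussian_log_pdf(x, mean, var):
    """Log-density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def doe_mi_estimate(xs, ys):
    """Difference-of-entropies MI estimate with Gaussian density models.

    Assumption: Gaussian models stand in for the normalizing flows
    described in the text; this is for illustration only.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

    # Marginal model q(x): entropy H(X) approximated by the average
    # negative log-likelihood of the samples under the fitted Gaussian.
    h_x = -sum(gaussian_log_pdf(x, mean_x, var_x) for x in xs) / n

    # Conditional model q(x | y) = N(intercept + slope * y, resid_var),
    # fitted by least squares.
    slope = cov_xy / var_y
    intercept = mean_x - slope * mean_y
    resid_var = sum(
        (x - (intercept + slope * y)) ** 2 for x, y in zip(xs, ys)
    ) / n
    h_x_given_y = -sum(
        gaussian_log_pdf(x, intercept + slope * y, resid_var)
        for x, y in zip(xs, ys)
    ) / n

    return h_x - h_x_given_y


def sample_correlated_gaussians(n, rho, seed=0):
    """Draw n pairs (x, y) of standard Gaussians with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        x = rho * y + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return xs, ys


rho = 0.8
xs, ys = sample_correlated_gaussians(20000, rho)
estimate = doe_mi_estimate(xs, ys)
exact = -0.5 * math.log(1.0 - rho ** 2)  # closed form for bivariate Gaussians
print(f"DoE estimate: {estimate:.4f} nats, closed form: {exact:.4f} nats")
```

For correlated Gaussians the closed-form value $-\tfrac{1}{2}\log(1-\rho^2)$ gives a reference point for the estimate. In the setting the abstract describes, the two Gaussian models would be replaced by flow-based density models, which is what allows the approach to handle nonlinear and heavy-tailed data.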
