Mutual Information Estimation via Normalizing Flows

Ivan Butakov; Alexander Tolmachev; Sofia Malanchuk; Anna Neopryatnaya; Alexey Frolov

TL;DR

This work tackles high-dimensional mutual information (MI) estimation by using normalizing flows to map $(X, Y)$ into latent representations in which MI is more tractable. Starting from a general MIENF framework and then restricting it to Gaussian base distributions, it derives closed-form MI expressions and non-asymptotic error bounds, including a scalable, parameter-efficient tri-diagonal Gaussian variant. The approach yields consistent MI estimates under suitable conditions and performs competitively against established MI estimators on synthetic high-dimensional data. The method offers a practical, low-variance MI estimator with theoretical guarantees and can be extended to a broader class of base distributions and to injective generative models for complex data.
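
Why a structured Gaussian base is parameter-efficient can be sketched numerically. Assume, as a hypothetical simplification of the tri-diagonal variant, that each latent pair $(\xi_i, \eta_i)$ is standard bivariate normal with correlation $\rho_i$ and all other cross-pairs are independent. Then MI reduces to $-\frac{1}{2}\sum_i \ln(1 - \rho_i^2)$, which needs only one parameter per pair. The sketch below (helper names are illustrative, not from the paper) checks this against the general Gaussian formula $\frac{1}{2}(\ln\det C_{\xi\xi} + \ln\det C_{\eta\eta} - \ln\det C)$ on the interleaved covariance of $(\xi_1, \eta_1, \xi_2, \eta_2, \dots)$, which is tri-diagonal.

```python
import numpy as np


def mi_pairwise_gaussian(rho) -> float:
    """MI (nats) between xi and eta when each pair (xi_i, eta_i) is standard
    bivariate normal with correlation rho[i] and all other pairs are independent."""
    rho = np.asarray(rho, dtype=float)
    return float(-0.5 * np.sum(np.log1p(-rho**2)))


def interleaved_covariance(rho) -> np.ndarray:
    """Covariance of (xi_1, eta_1, xi_2, eta_2, ...): unit diagonal and rho[i]
    at (2i, 2i+1) and (2i+1, 2i), so the matrix is tri-diagonal."""
    rho = np.asarray(rho, dtype=float)
    d = rho.size
    cov = np.eye(2 * d)
    i = np.arange(d)
    cov[2 * i, 2 * i + 1] = rho
    cov[2 * i + 1, 2 * i] = rho
    return cov


def mi_gaussian_logdet(cov: np.ndarray, x_idx, y_idx) -> float:
    """General Gaussian MI: 0.5 * (ln det C_xx + ln det C_yy - ln det C)."""
    _, ld_x = np.linalg.slogdet(cov[np.ix_(x_idx, x_idx)])
    _, ld_y = np.linalg.slogdet(cov[np.ix_(y_idx, y_idx)])
    _, ld = np.linalg.slogdet(cov)  # reordering variables leaves det unchanged
    return float(0.5 * (ld_x + ld_y - ld))


rho = np.array([0.5, 0.9, 0.0])
cov = interleaved_covariance(rho)
x_idx, y_idx = [0, 2, 4], [1, 3, 5]  # positions of xi_i and eta_i
print(mi_pairwise_gaussian(rho), mi_gaussian_logdet(cov, x_idx, y_idx))
```

The two values coincide; the pairwise form avoids any determinant and scales linearly in the latent dimension.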

Abstract

We propose a novel approach to the problem of mutual information (MI) estimation by introducing a family of estimators based on normalizing flows. The estimator maps the original data to a target distribution for which MI is easier to estimate. We additionally explore target distributions with known closed-form expressions for MI. Theoretical guarantees are provided to demonstrate that MI estimates obtained for the target distribution are valid estimates of MI for the original data. Experiments with high-dimensional data are conducted to highlight the practical advantages of the proposed method.
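
The idea of estimating MI after mapping the data can be illustrated with a toy example, under loudly stated assumptions. MI is invariant under invertible maps applied separately to $X$ and $Y$, so if such maps send the data to a Gaussian, the closed-form Gaussian MI applies. In the sketch below, the exact inverse of a known transformation stands in for trained normalizing flows (no flow is learned), and all helper names are hypothetical. The data are correlated Gaussians with a prescribed MI passed through the Gaussian CDF, giving uniform marginals in the spirit of the "Uniform" test described in the Figure 1 caption.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal; provides cdf and inv_cdf


def rho_for_mi(total_mi: float, dim: int) -> float:
    """Correlation per (x_i, y_i) pair so that `dim` independent pairs carry
    `total_mi` nats in total, using I_i = -0.5 * ln(1 - rho^2)."""
    return float(np.sqrt(1.0 - np.exp(-2.0 * total_mi / dim)))


def sample_correlated_gaussians(n: int, dim: int, rho: float, rng: np.random.Generator):
    """Standard normal x, y with corr(x_i, y_i) = rho and no other dependence."""
    x = rng.standard_normal((n, dim))
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal((n, dim))
    return x, y


def gaussian_cdf(a: np.ndarray) -> np.ndarray:
    return np.vectorize(_STD_NORMAL.cdf)(a)


def gaussian_inv_cdf(u: np.ndarray) -> np.ndarray:
    # clip to keep inv_cdf away from 0 and 1, where it is undefined
    return np.vectorize(_STD_NORMAL.inv_cdf)(np.clip(u, 1e-12, 1.0 - 1e-12))


def gaussian_mi_from_samples(x: np.ndarray, y: np.ndarray) -> float:
    """Closed-form MI of a Gaussian fitted to (x, y):
    0.5 * (ln det S_xx + ln det S_yy - ln det S)."""
    s = np.cov(np.hstack([x, y]), rowvar=False)
    dx = x.shape[1]
    _, ld_xx = np.linalg.slogdet(s[:dx, :dx])
    _, ld_yy = np.linalg.slogdet(s[dx:, dx:])
    _, ld = np.linalg.slogdet(s)
    return float(0.5 * (ld_xx + ld_yy - ld))


rng = np.random.default_rng(0)
true_mi, dim = 2.0, 2
x, y = sample_correlated_gaussians(10_000, dim, rho_for_mi(true_mi, dim), rng)
u, v = gaussian_cdf(x), gaussian_cdf(y)  # non-Gaussian data with uniform marginals
# Stand-in for trained flows f_X, f_Y: the exact inverse of the known transform.
est = gaussian_mi_from_samples(gaussian_inv_cdf(u), gaussian_inv_cdf(v))
print(f"true MI = {true_mi:.3f} nats, estimate = {est:.3f} nats")
```

In a real estimator the two maps would be learned from samples; the point here is only that a good enough map turns MI estimation into a closed-form computation on the latent samples.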

Paper Structure (17 sections, 1 theorem, 61 equations, 2 figures, 2 tables, 1 algorithm)

Introduction
Preliminaries
General method
Using Gaussian base distribution
General binormalization approach
Refined approach
Tractable error bounds
Implementation details
Extension to non-Gaussian base distributions and non-bijective flows
Experiments
Discussion
Limitations
Complete proofs
Non-Gaussian-based tests
Multivariate Student distribution
...and 2 more sections

Key Result

Lemma B.1

Consider independent $X \sim \textnormal{U}[0;1]$, $Z \sim \textnormal{U}[-\varepsilon; \varepsilon]$ and $Y = X + Z$. Then
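
For this setup, the MI can be computed directly; this is a standalone calculation and not necessarily the exact form in which the lemma is stated. Since $Z$ is independent of $X$, $h(Y \mid X) = h(Z) = \ln(2\varepsilon)$, so $I(X;Y) = h(Y) - \ln(2\varepsilon)$. The density of $Y$ is trapezoidal, and integrating $-p \ln p$ over it gives $I(X;Y) = \varepsilon - \ln(2\varepsilon)$ for $\varepsilon \le 1/2$ and $I(X;Y) = 1/(4\varepsilon)$ for $\varepsilon \ge 1/2$ (both equal $1/2$ at $\varepsilon = 1/2$). The sketch below checks this closed form against numerical integration; function names are illustrative.

```python
import numpy as np


def mi_closed_form(eps: float) -> float:
    """I(X;Y) in nats for X ~ U[0,1], Z ~ U[-eps, eps], Y = X + Z (direct computation)."""
    if eps <= 0.5:
        return float(eps - np.log(2.0 * eps))
    return 1.0 / (4.0 * eps)


def mi_numeric(eps: float, n: int = 200_001) -> float:
    """I(X;Y) = h(Y) - h(Z) with h(Z) = ln(2 eps) and h(Y) from the trapezoid rule."""
    y = np.linspace(-eps, 1.0 + eps, n)
    # density of Y: length of [0, 1] ∩ [y - eps, y + eps], divided by 2 eps
    overlap = np.minimum(1.0, y + eps) - np.maximum(0.0, y - eps)
    p = np.clip(overlap, 0.0, None) / (2.0 * eps)
    # integrand -p ln p with the convention 0 ln 0 = 0
    safe_p = np.where(p > 0.0, p, 1.0)
    integrand = -p * np.log(safe_p)
    h_y = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(y)))
    return h_y - float(np.log(2.0 * eps))


for eps in (0.1, 0.5, 1.0):
    print(eps, mi_closed_form(eps), mi_numeric(eps))
```

Small $\varepsilon$ gives large MI (the noise barely blurs $X$), which makes this pair a convenient test case with known ground truth.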

Figures (2)

Figure 1: Tests with incompressible multidimensional data. "Uniform" denotes uniformly distributed samples obtained from correlated Gaussians via the Gaussian CDF. "Smoothed uniform" and "Student" denote the non-Gaussian-based distributions described in the appendix on non-Gaussian-based tests. "arcsinh(Student)" denotes the $\operatorname{arcsinh}$ function applied to the "Student" example (this is done to avoid numerical instabilities in the case of long-tailed distributions). We run each test $5$ times and plot $99.9\%$ asymptotic Gaussian CIs. $10 \cdot 10^3$ samples were used. Note that $\mathcal{N}$-MIENF and tridiag-$\mathcal{N}$-MIENF yield almost the same results with similar bias.
Figure 2: Comparison of the selected estimators. The $x$ axes show the true $I(X;Y)$ and the $y$ axes show the estimate $\hat{I}(X;Y)$. We plot $99.9\%$ asymptotic confidence intervals obtained either from the standard deviation of the Monte Carlo (MC) integration (WKL, KSG) or from epochwise averaging over the last $200$ epochs (other methods). $10 \cdot 10^3$ samples were used.
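
The "$99.9\%$ asymptotic Gaussian CIs" in both captions can be sketched as a normal-approximation interval around the mean of repeated estimates (test runs or last-epoch values). Whether the half-width uses the standard error $s/\sqrt{n}$, as assumed here, or the raw standard deviation is not stated in the captions. The function name and the input values below are hypothetical and serve only to show usage.

```python
import math
from statistics import NormalDist, fmean, stdev


def asymptotic_gaussian_ci(estimates, level: float = 0.999):
    """Normal-approximation CI for the mean of `estimates`:
    mean ± z * s / sqrt(n), with z = Phi^{-1}((1 + level) / 2) and s the sample std."""
    n = len(estimates)
    if n < 2:
        raise ValueError("need at least two estimates")
    z = NormalDist().inv_cdf(0.5 * (1.0 + level))  # about 3.29 for level = 0.999
    mean = fmean(estimates)
    half_width = z * stdev(estimates) / math.sqrt(n)
    return mean - half_width, mean + half_width


# Made-up per-epoch MI estimates (nats), used only to demonstrate the call.
last_epochs = [1.98, 2.03, 2.01, 1.97, 2.02, 2.00]
print(asymptotic_gaussian_ci(last_epochs))
```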

Theorems & Definitions (7)

Remark 4.1
Remark 4.2
Definition 4.3
Definition 4.4
Definition 4.5
Lemma B.1
proof