Mutual Information Estimation via Normalizing Flows

Ivan Butakov, Alexander Tolmachev, Sofia Malanchuk, Anna Neopryatnaya, Alexey Frolov

TL;DR

This work tackles high-dimensional mutual information (MI) estimation by using normalizing flows to map (X, Y) into latent representations where MI is more tractable. It presents a general MIENF framework alongside variants restricted to Gaussian base distributions, for which it derives closed-form MI expressions and non-asymptotic bounds, including a scalable, parameter-efficient tri-diagonal Gaussian variant. Under suitable conditions the approach yields consistent MI estimates, and it performs competitively against established MI estimators on synthetic high-dimensional data. The result is a practical, low-variance MI estimator with theoretical guarantees that can be extended to a broader class of base distributions and to injective generative models for complex data.
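
As a rough illustration of why a Gaussian latent space makes MI tractable, the sketch below evaluates the standard closed form for jointly Gaussian vectors, $I(X;Y) = \tfrac{1}{2}\left(\log\det\Sigma_X + \log\det\Sigma_Y - \log\det\Sigma\right)$, and compares it with the simplified sum $-\tfrac{1}{2}\sum_j \log(1-\rho_j^2)$ that holds when both marginal covariances are the identity and the cross-covariance is $\mathrm{diag}(\rho)$. That this paired structure is what the tri-diagonal variant uses is an assumption here; the summary does not spell it out. The function names (`logdet_spd`, `gaussian_mi`, `paired_mi`) and the correlation values are hypothetical, chosen only for the example.

```python
import math

def logdet_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))

def gaussian_mi(cov, dx):
    """MI in nats between the first dx and the remaining coordinates
    of a jointly Gaussian vector with covariance matrix cov."""
    cov_x = [row[:dx] for row in cov[:dx]]
    cov_y = [row[dx:] for row in cov[dx:]]
    return 0.5 * (logdet_spd(cov_x) + logdet_spd(cov_y) - logdet_spd(cov))

def paired_mi(rhos):
    """MI in nats when Cov(X) = Cov(Y) = I and Cov(X, Y) = diag(rhos)."""
    return -0.5 * sum(math.log1p(-r * r) for r in rhos)

# Illustrative (made-up) componentwise correlations.
rhos = [0.9, 0.5, 0.0]
d = len(rhos)

# Joint covariance [[I, diag(rhos)], [diag(rhos), I]].
cov = [[0.0] * (2 * d) for _ in range(2 * d)]
for i in range(2 * d):
    cov[i][i] = 1.0
for j, r in enumerate(rhos):
    cov[j][d + j] = r
    cov[d + j][j] = r

mi_general = gaussian_mi(cov, d)
mi_paired = paired_mi(rhos)
print(mi_general, mi_paired)  # the two values should agree up to rounding
```

The paired form needs only $d$ parameters instead of a full $2d \times 2d$ covariance, which is one plausible reading of "parameter-efficient" in the summary above.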

Abstract

We propose a novel approach to mutual information (MI) estimation by introducing a family of estimators based on normalizing flows. The estimator maps the original data to a target distribution for which MI is easier to estimate. We additionally explore target distributions with known closed-form expressions for MI. Theoretical guarantees are provided to demonstrate that our approach yields MI estimates for the original data. Experiments with high-dimensional data are conducted to highlight the practical advantages of the proposed method.

Paper Structure

This paper contains 17 sections, 1 theorem, 61 equations, 2 figures, 2 tables, 1 algorithm.

Key Result

Lemma B.1

Consider independent $X \sim \textnormal{U}[0;1]$, $Z \sim \textnormal{U}[-\varepsilon; \varepsilon]$ and $Y = X + Z$. Then
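
The setup of the lemma can be checked numerically. The sketch below is not quoted from the paper: it uses the identity $I(X;Y) = h(Y) - h(Y \mid X) = h(Y) - h(Z)$, integrates the trapezoidal density of $Y$ with a midpoint rule, and compares the result against a reference expression worked out by hand for this example, $\varepsilon - \ln(2\varepsilon)$ for $\varepsilon < 1/2$ and $1/(4\varepsilon)$ otherwise (in nats). Treat that expression, and the helper names `density_y`, `mi_numeric`, and `mi_by_hand`, as assumptions of this sketch rather than as the lemma's wording.

```python
import math

def density_y(y, eps):
    """Density of Y = X + Z for X ~ U[0, 1], Z ~ U[-eps, eps] (a trapezoid)."""
    overlap = min(1.0, y + eps) - max(0.0, y - eps)
    return max(overlap, 0.0) / (2.0 * eps)

def mi_numeric(eps, n_grid=200_000):
    """I(X;Y) = h(Y) - h(Z) in nats, with h(Y) from a midpoint-rule integral."""
    lo, hi = -eps, 1.0 + eps
    step = (hi - lo) / n_grid
    h_y = 0.0
    for i in range(n_grid):
        p = density_y(lo + (i + 0.5) * step, eps)
        if p > 0.0:
            h_y -= p * math.log(p) * step
    h_z = math.log(2.0 * eps)  # differential entropy of U[-eps, eps]
    return h_y - h_z

def mi_by_hand(eps):
    """Hand-derived reference value (an assumption of this sketch)."""
    return eps - math.log(2.0 * eps) if eps < 0.5 else 1.0 / (4.0 * eps)

for eps in (0.1, 0.25, 0.5, 1.0, 2.0):
    print(eps, mi_numeric(eps), mi_by_hand(eps))
```

The two branches of the reference expression coincide at $\varepsilon = 1/2$, which is a quick sanity check on the hand derivation.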

Figures (2)

  • Figure 1: Tests with incompressible multidimensional data. "Uniform" denotes uniformly distributed samples obtained from correlated Gaussians via the Gaussian CDF. "Smoothed uniform" and "Student" denote the non-Gaussian-based distributions described in the appendix on non-Gaussian-based tests. "arcsinh(Student)" denotes the $\operatorname{arcsinh}$ function applied to the "Student" example (this avoids numerical instabilities in the case of long-tailed distributions). Each test is run $5$ times, and $99.9\%$ asymptotic Gaussian CIs are plotted. $10 \cdot 10^3$ samples were used. Note that $\mathcal{N}$-MIENF and tridiag-$\mathcal{N}$-MIENF yield almost the same results with similar bias.
  • Figure 2: Comparison of the selected estimators. The $x$ axes show $I(X;Y)$ and the $y$ axes show $\hat{I}(X;Y)$. The $99.9\%$ asymptotic confidence intervals are obtained either from the MC integration standard deviation (WKL, KSG) or from epochwise averaging (other methods, last $200$ epochs). $10 \cdot 10^3$ samples were used.
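
The "Uniform" construction mentioned in the Figure 1 caption can be sketched for a single pair of coordinates as follows. Because MI is invariant under invertible componentwise maps, pushing correlated Gaussians through the Gaussian CDF leaves the ground-truth MI at $-\tfrac{1}{2}\ln(1-\rho^2)$ while making both marginals uniform on $[0, 1]$. The correlation `rho = 0.8`, the seed, and the helper names are hypothetical choices for illustration; the actual benchmark settings are not given in this summary.

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sample_uniform_pair(rho, rng):
    """Draw correlated standard Gaussians (x, y), then map each through the CDF."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x = z1
    y = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
    return normal_cdf(x), normal_cdf(y)

def true_mi(rho):
    """Ground-truth MI in nats; unchanged by the componentwise CDF map."""
    return -0.5 * math.log1p(-rho * rho)

rng = random.Random(0)   # fixed seed so the sketch is reproducible
rho = 0.8                # illustrative correlation, not from the paper
samples = [sample_uniform_pair(rho, rng) for _ in range(10_000)]

mean_u = sum(u for u, _ in samples) / len(samples)
print(mean_u, true_mi(rho))
```

An estimator is then run on `samples` and its output compared with `true_mi(rho)`; higher-dimensional tests would repeat this per coordinate pair and sum the ground-truth values.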

Theorems & Definitions (7)

  • Remark 4.1
  • Remark 4.2
  • Definition 4.3
  • Definition 4.4
  • Definition 4.5
  • Lemma B.1
  • proof