On the Properties and Estimation of Pointwise Mutual Information Profiles

Paweł Czyż; Frederic Grabowski; Julia E. Vogt; Niko Beerenwinkel; Alexander Marx

TL;DR

This work introduces the pointwise mutual information (PMI) profile, the distribution of $\mathrm{PMI}(X,Y)$, whose mean equals the mutual information $\mathbf{I}(X;Y)$, and proves that the profile is invariant to reparametrizations. It derives an analytic PMI profile for multivariate normal distributions and, to overcome limitations of existing benchmarks, defines Bend and Mix Models (BMMs), which combine bending (diffeomorphic transformations) with mixing (mixture distributions) so that both the PMI profile and $\mathbf{I}(X;Y)$ can be estimated without bias by Monte Carlo sampling. The authors use BMMs to construct expressive benchmarks, to analyze how robust estimators are to inliers and outliers, and to evaluate neural critics in variational MI estimators. BMMs also enable model-based Bayesian MI estimation with uncertainty quantification, which suits problems where domain knowledge is available, and the results give practical guidance on benchmarking, estimator selection, and the reliability of MI estimates.

Abstract

The pointwise mutual information profile, or simply profile, is the distribution of pointwise mutual information for a given pair of random variables. One of its important properties is that its expected value is precisely the mutual information between these random variables. In this paper, we analytically describe the profiles of multivariate normal distributions and introduce a novel family of distributions, Bend and Mix Models, for which the profile can be accurately estimated using Monte Carlo methods. We then show how Bend and Mix Models can be used to study the limitations of existing mutual information estimators, investigate the behavior of neural critics used in variational estimators, and understand the effect of experimental outliers on mutual information estimation. Finally, we show how Bend and Mix Models can be used to obtain model-based Bayesian estimates of mutual information, suitable for problems with available domain expertise in which uncertainty quantification is necessary.
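
The relationship between the profile and mutual information can be illustrated numerically. The following is a minimal sketch, not code from the paper: it assumes a standard bivariate normal with correlation `rho`, for which the PMI has a closed form, and the function names (`pmi_bivariate_normal`, `sample_pmi_profile`) are hypothetical. Samples of the PMI approximate the profile, and their average is a Monte Carlo estimate of the mutual information, which for this distribution equals $-\tfrac{1}{2}\log(1-\rho^2)$ nats.

```python
import numpy as np


def pmi_bivariate_normal(x, y, rho):
    """PMI log p(x, y) - log p(x) - log p(y), in nats, for a standard
    bivariate normal with correlation rho (illustrative assumption)."""
    quad = rho**2 * (x**2 + y**2) - 2.0 * rho * x * y
    return -0.5 * np.log1p(-rho**2) - quad / (2.0 * (1.0 - rho**2))


def sample_pmi_profile(rho, n_samples, seed=0):
    """Draw (X, Y) pairs and return the PMI evaluated at each pair.
    The empirical distribution of the result approximates the profile."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    xy = rng.multivariate_normal(np.zeros(2), cov, size=n_samples)
    return pmi_bivariate_normal(xy[:, 0], xy[:, 1], rho)


def true_mi(rho):
    """Closed-form mutual information of the bivariate normal, in nats."""
    return -0.5 * np.log1p(-rho**2)


if __name__ == "__main__":
    rho = 0.8
    pmi = sample_pmi_profile(rho, n_samples=100_000)
    mi_hat = pmi.mean()                       # Monte Carlo MI estimate
    std_err = pmi.std(ddof=1) / np.sqrt(pmi.size)
    print(f"MC estimate: {mi_hat:.4f} +/- {std_err:.4f}")
    print(f"Closed form: {true_mi(rho):.4f}")
```

A histogram of the returned PMI samples gives an empirical picture of the profile; the mean alone discards the shape information that the profile retains.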

Paper Structure (47 sections, 23 theorems, 95 equations, 13 figures, 2 tables)

Introduction
Theoretical framework
Pointwise mutual information profiles
Bend and Mix Models
Case studies
Novel distributions for estimator evaluation
Modeling inliers and outliers
Variational estimators and the PMI profile
Model-based mutual information estimation
Conclusion
Limitations and further research
Reproducibility
Acknowledgments
Technical results
Proof of the invariance of the pointwise mutual information profile
...and 32 more sections

Key Result

theorem 3

Let $P_{XY}\in \mathcal{P}\!\left({\mathcal{X},\mathcal{Y}}\right)$ and $f\colon \mathcal{X}\to\mathcal{X}$ and $g\colon \mathcal{Y}\to \mathcal{Y}$ be diffeomorphisms. Then for $X'=f(X)$ and $Y'=g(Y)$ it holds that $P_{X'Y'}\in \mathcal{P}\!\left({\mathcal{X},\mathcal{Y}}\right)$ and $\mathrm{Prof}
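
The invariance to reparametrizations can be motivated by a short change-of-variables computation. The sketch below is not the paper's proof: it makes the additional assumption that $P_{XY}$ has a density $p_{XY}$ with marginal densities $p_X$ and $p_Y$, and the symbols $J_{f^{-1}}$ and $J_{g^{-1}}$ (Jacobians of the inverse maps) are introduced here for illustration only.

```latex
% Write x' = f(x) and y' = g(y). The map (x, y) -> (f(x), g(y)) has a
% block-diagonal Jacobian, so its determinant factorizes.
\begin{align*}
p_{X'Y'}(x', y') &= p_{XY}(x, y)\,
    \lvert\det J_{f^{-1}}(x')\rvert\,\lvert\det J_{g^{-1}}(y')\rvert, \\
p_{X'}(x') &= p_X(x)\,\lvert\det J_{f^{-1}}(x')\rvert, \qquad
p_{Y'}(y') = p_Y(y)\,\lvert\det J_{g^{-1}}(y')\rvert, \\
\mathrm{PMI}_{X'Y'}(x', y')
  &= \log\frac{p_{X'Y'}(x', y')}{p_{X'}(x')\,p_{Y'}(y')}
   = \log\frac{p_{XY}(x, y)}{p_X(x)\,p_Y(y)}
   = \mathrm{PMI}_{XY}(x, y).
\end{align*}
% The Jacobian factors cancel, so PMI_{X'Y'}(X', Y') and PMI_{XY}(X, Y)
% are the same random variable and therefore have the same distribution.
% Taking expectations gives I(X'; Y') = I(X; Y).
```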

Figures (13)

Figure 1: First two panels: samples from a bivariate normal distribution and the same distribution with marginals transformed. Both distributions have the same PMI profile (blue histogram in the fourth panel). Third panel: mixture distribution, which cannot be obtained as a transformation of the normal distribution due to a distinct PMI profile (green histogram in the fourth panel). All three distributions have the same mutual information, marked with the black line in panel four.
Figure 2: Samples from the example distributions. Distributions X and AI represent one-dimensional variables $X$ and $Y$. Distributions Waves and Galaxy plot two-dimensional $X$ variable using spatial coordinates, while one-dimensional $Y$ variable is represented by color. The rightmost plot presents estimates according to different mutual information algorithms using independently generated data sets with $N=5\,000$ points each, compared to the ground-truth MI of the distribution (dotted line).
Figure 3: Left: increasing the contamination level $\alpha$ with inlier noise distribution. Middle: increasing the contamination level $\alpha$ with outlier noise distribution. Right: increasing the variance of the noisy normal distribution for constant contamination of $20\%$. Outliers have less impact than inliers.
Figure 4: Left: PDF of the considered distribution. Middle: neural critic and PMI values. Right: normalized neural critic and PMI profiles.
Figure 5: Estimation of mutual information using a function approximating PMI, as a function of sample size, for Monte Carlo (MC), InfoNCE, Donsker-Varadhan and NWJ losses. From left to right: the true PMI function from Fig. 4 is used; a constant bias is added; a functional bias is added. Rightmost plot: the true PMI function for a different, high-dimensional problem is used.
...and 8 more figures

Theorems & Definitions (45)

definition 1
definition 2
theorem 3
theorem 4
proposition 4
proposition 4
definition 5
proposition 5
proposition 5
proposition 5
...and 35 more
