Mutual Information Multinomial Estimation
Yanzhi Chen, Zijing Ou, Adrian Weller, Yingzhen Li
TL;DR
This work tackles the challenge of estimating mutual information in high-dimensional, high-MI settings. It introduces Mutual Information Multinomial Estimation (MIME), which uses a multinomial classifier across four distributions, including a marginal-preserving vector Gaussian copula reference, to stabilize MI estimation and reduce overfitting. The authors prove consistency and controlled error bounds, and demonstrate MIME's superior robustness and scalability across synthetic benchmarks, Bayesian experimental design, and self-supervised learning scenarios, outperforming several baselines. The approach offers a practical pathway to reliable MI estimation in complex data regimes and highlights nuanced interactions between MI values and representation quality in SSL.
Abstract
Estimating mutual information (MI) is a fundamental yet challenging task in data science and machine learning. This work proposes a new estimator for mutual information. Our main discovery is that a preliminary estimate of the data distribution can dramatically improve MI estimation. This preliminary estimate serves as a bridge between the joint distribution and the marginal distributions, and by comparing against this bridge distribution we can more easily recover the true difference between the joint distribution and the marginal distributions. Experiments on diverse tasks, including non-Gaussian synthetic problems with known ground truth and real-world applications, demonstrate the advantages of our method.
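To make the role of the classifier and the bridge distribution concrete, the following is a minimal sketch of the standard density-ratio identity that a multinomial classifier provides. It is not taken from the paper: the symbols p_k, h_k, and K are introduced here for illustration, the class ordering (p_1 the joint, p_2 the product of marginals, p_3 a bridge/reference distribution) is an assumption, and the paper's actual objective and choice of four distributions may differ in detail.

```latex
% Assumption: K distributions p_1, ..., p_K over (x, y), sampled with equal class priors.
%   p_1(x,y) = p(x,y)       (joint)
%   p_2(x,y) = p(x) p(y)    (product of marginals)
%   p_3(x,y)                (a bridge / reference distribution, hypothetical labeling)
% A Bayes-optimal softmax classifier with logits h_k satisfies
\[
  P(c = k \mid x, y) = \frac{e^{h_k(x,y)}}{\sum_{j=1}^{K} e^{h_j(x,y)}}
                     = \frac{p_k(x,y)}{\sum_{j=1}^{K} p_j(x,y)},
  \qquad\text{so}\qquad
  h_i(x,y) - h_j(x,y) = \log \frac{p_i(x,y)}{p_j(x,y)} .
\]
% Mutual information is the expected log-ratio between the joint and the product of marginals:
\[
  I(X;Y) = \mathbb{E}_{p(x,y)}\!\left[ \log \frac{p(x,y)}{p(x)\,p(y)} \right]
         = \mathbb{E}_{p(x,y)}\!\left[ h_1(x,y) - h_2(x,y) \right].
\]
% The same log-ratio can be routed through the bridge distribution p_3:
\[
  \log \frac{p_1}{p_2} = \log \frac{p_1}{p_3} + \log \frac{p_3}{p_2}
                       = (h_1 - h_3) + (h_3 - h_2).
\]
```

Under these assumptions, a reference distribution that sits close to the joint makes each of the two log-ratios on the right easier to learn than the direct joint-versus-marginals ratio, which is one way to read the "bridge" intuition in the abstract.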
