InfoBridge: Mutual Information estimation via Bridge Matching

Sergei Kholkin, Ivan Butakov, Evgeny Burnaev, Nikita Gushchin, Alexander Korotin

TL;DR

This work frames MI estimation as a domain transfer problem, constructs an unbiased MI estimator based on diffusion bridge models, and evaluates it on three standard MI estimation benchmarks (low-dimensional, image-based, and high MI) as well as on protein language model embeddings.

Abstract

Diffusion bridge models have recently become a powerful tool in the field of generative modeling. In this work, we leverage their power to address another important problem in machine learning and information theory, the estimation of the mutual information (MI) between two random variables. Neatly framing MI estimation as a domain transfer problem, we construct an unbiased estimator for data posing difficulties for conventional MI estimators. We showcase the performance of our estimator on three standard MI estimation benchmarks, i.e., low-dimensional, image-based and high MI, and on real-world data, i.e., protein language model embeddings.
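
To make the estimated quantity concrete, here is a minimal sketch of a case where the MI is known in closed form. It is not the InfoBridge estimator: it uses the textbook definition of MI as the expected log-ratio between the joint density and the product of marginals, for a bivariate Gaussian with correlation rho. The function names `gaussian_mi` and `monte_carlo_mi` are hypothetical and chosen only for this illustration.

```python
import math
import random


def gaussian_mi(rho):
    """Closed-form MI (in nats) of a standard bivariate Gaussian with correlation rho."""
    return -0.5 * math.log(1.0 - rho * rho)


def monte_carlo_mi(rho, n=100_000, seed=0):
    """Monte Carlo average of log pi(x0, x1) - log pi(x0) pi(x1) over joint samples.

    This is unbiased only because the true densities are plugged in; a real
    estimator has to learn the log-ratio (or an equivalent quantity) from data.
    """
    rng = random.Random(seed)
    s2 = 1.0 - rho * rho          # conditional variance of x1 given x0
    s = math.sqrt(s2)
    total = 0.0
    for _ in range(n):
        x0 = rng.gauss(0.0, 1.0)
        x1 = rho * x0 + s * rng.gauss(0.0, 1.0)   # correlated partner of x0
        log_joint = (-math.log(2.0 * math.pi) - 0.5 * math.log(s2)
                     - (x0 * x0 - 2.0 * rho * x0 * x1 + x1 * x1) / (2.0 * s2))
        log_product = -math.log(2.0 * math.pi) - 0.5 * (x0 * x0 + x1 * x1)
        total += log_joint - log_product
    return total / n


if __name__ == "__main__":
    rho = 0.8
    print("closed form:", gaussian_mi(rho))
    print("monte carlo:", monte_carlo_mi(rho))
```

The two printed values should agree up to Monte Carlo noise. Settings where the density ratio is not available, such as high-dimensional or image-like data, are the ones the abstract describes as difficult for conventional estimators.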

Paper Structure

This paper contains 69 sections, 8 theorems, 59 equations, 7 figures, 20 tables, 2 algorithms.

Key Result

Theorem 4.1

Consider random variables $X_0, X_1$ and their joint distribution $\pi(x_0, x_1)$, such that $I(X_0;X_1)<\infty$ and $\int_{\mathbb{R}^D}\|x_1\|\,d\pi(x_1)<\infty$. Consider reciprocal processes $Q_{\pi}$, $Q_\pi^{\text{ind}}$ induced by distributions $\pi(x_0, x_1)$ and $\pi(x_0)\pi(x_1)$, respectively. [...] where $v_{\text{joint}}$ and $v_{\text{ind}}$ are the drifts of the SDE representations eq:recipro[...]
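
For orientation, the setup of Theorem 4.1 can be related to the standard definition of MI. The identities below are textbook facts and not the statement of the theorem, which is truncated in this excerpt. The bridge symbol $Q_{|x_0,x_1}$ is notation introduced here, and the second line assumes that both reciprocal processes are mixtures of the same bridge, differing only in their endpoint coupling.

```latex
% MI as the KL divergence between the joint and the product of marginals
I(X_0;X_1)
  = \mathrm{KL}\big(\pi(x_0,x_1)\,\|\,\pi(x_0)\pi(x_1)\big)
  = \int \pi(x_0,x_1)\,\log\frac{\pi(x_0,x_1)}{\pi(x_0)\,\pi(x_1)}\,dx_0\,dx_1 .

% Chain rule for KL, assuming Q_\pi and Q_\pi^{ind} share the bridge Q_{|x_0,x_1}
\mathrm{KL}\big(Q_\pi\,\|\,Q_\pi^{\text{ind}}\big)
  = \mathrm{KL}\big(\pi(x_0,x_1)\,\|\,\pi(x_0)\pi(x_1)\big)
  + \underbrace{\mathbb{E}_{\pi(x_0,x_1)}\,
      \mathrm{KL}\big(Q_{|x_0,x_1}\,\|\,Q_{|x_0,x_1}\big)}_{=\,0}
  = I(X_0;X_1).
```

Under that assumption, comparing the two processes on path space carries the same information as comparing the joint distribution with the product of its marginals, which is why the drifts $v_{\text{joint}}$ and $v_{\text{ind}}$ appear in the theorem.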

Figures (7)

  • Figure 1: Comparison of the MI estimators. The $x$ axis shows $I(X_0;X_1)$ and the $y$ axis shows the MI estimate $\hat{I}(X_0;X_1)$. We plot 99% confidence intervals obtained from runs with different seeds.
  • Figure 2: Comparison of the selected estimators on the ProtTrans5 data. The $x$ axis shows the ground truth $I(X_0;X_1)$ and the $y$ axis shows the MI estimate $\hat{I}(X_0;X_1)$.
  • Figure 3: Comparison of MI estimates across dimensions and MI for high mutual information.
  • Figure 4: Comparison of our InfoBridge combined with different methods, \ref{corollary:Gaussian_maximizes_entropy} (Gaussian) and \ref{corollary:uniform_maximizes_entropy} (Uniform), and with different reference distributions $p(x_0)$ on the entropy estimation task. The $x$ axis shows the ground truth entropy $H(X)$ and the $y$ axis shows the entropy estimate $\hat{H}(X)$. Each configuration is run once.
  • Figure 5: Examples of synthetic images from the butakov2024normflows benchmark can be seen in \ref{fig:image_samples_a}, \ref{fig:image_samples_b}. Note that the images are high-dimensional but admit a latent structure, which is similar to real datasets. Samples generated by our InfoBridge from the learned distributions $\pi_{\theta}(x_1|x_0)\approx \pi(x_1|x_0)$ and $\pi^\text{ind}_{\theta}(x_1|x_0)\approx\pi(x_1)$, defined as solutions to SDEs \ref{eq:reciprocal_non_markovian_sde} with approximated drifts $v_\theta(\cdot, 0)$ and $v_\theta(\cdot, 1)$, respectively, can be seen in \ref{fig:image_samples_c}, \ref{fig:image_samples_d}, \ref{fig:image_samples_e}, \ref{fig:image_samples_f}. All images have 32$\times$32 resolution.
  • ...and 2 more figures

Theorems & Definitions (13)

  • Theorem 4.1: Mutual Information decomposition
  • Lemma A.2
  • proof
  • Proposition A.3
  • proof
  • Lemma A.4
  • proof
  • proof
  • Theorem B.1: KL divergence decomposition
  • proof
  • ...and 3 more