Discrete Bridges for Mutual Information Estimation

Iryna Zabarianska, Sergei Kholkin, Grigoriy Ksenofontov, Ivan Butakov, Alexander Korotin

TL;DR

The paper tackles discrete mutual information estimation by reframing it as a domain-transfer problem solvable with discrete bridge matching. It introduces DBMI, which computes MI as a KL divergence between reciprocal processes represented as conditional Markov transitions learned via a bridge-matching objective, then estimates MI from these learned transitions. The approach is validated on a low-dimensional synthetic benchmark and a novel high-dimensional image-based discrete benchmark, where DBMI outperforms neural estimators designed for discrete data (e.g., MINE, InfoNCE, NWJ, f-DIME). The simulation-free training and scalable transition factorization enable accurate MI estimation in complex discrete spaces, with potential impact on information-theoretic analyses and applications involving discrete data.

Abstract

Diffusion bridge models in both continuous and discrete state spaces have recently become powerful tools in the field of generative modeling. In this work, we leverage the discrete state space formulation of bridge matching models to address another important problem in machine learning and information theory: the estimation of the mutual information (MI) between discrete random variables. By neatly framing MI estimation as a domain transfer problem, we construct a Discrete Bridge Mutual Information (DBMI) estimator suitable for discrete data, which poses difficulties for conventional MI estimators. We showcase the performance of our estimator on two MI estimation settings: low-dimensional and image-based.
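
For orientation, the quantity being estimated can be computed exactly when the joint distribution of two discrete variables is small enough to tabulate. The sketch below is not the DBMI estimator. It is only the textbook definition, $\mathsf{I}(X_0; X_1) = \mathrm{KL}\big(p(x_0, x_1)\,\|\,p(x_0)\,p(x_1)\big)$, written in plain Python. The function name `mutual_information` and the two toy joint tables are illustrative assumptions, not taken from the paper.

```python
import math


def mutual_information(joint):
    """Exact MI in nats for a joint distribution given as a 2D list.

    joint[i][j] is assumed to hold p(X0 = i, X1 = j), with entries summing to 1.
    """
    # Marginals: sum over rows for p(x0), over columns for p(x1).
    p0 = [sum(row) for row in joint]
    p1 = [sum(col) for col in zip(*joint)]

    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (p0[i] * p1[j]))
    return mi


# Hypothetical toy examples: independent bits vs. a perfectly copied bit.
independent = [[0.25, 0.25], [0.25, 0.25]]
copied = [[0.5, 0.0], [0.0, 0.5]]

print(mutual_information(independent))
print(mutual_information(copied))
```

The table has one entry per pair of states, so it grows with the product of the two state-space sizes. For image-sized discrete variables such a table cannot be stored or estimated by counting, which is the regime where a learned estimator is needed.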

Paper Structure

This paper contains 48 sections, 2 theorems, 27 equations, 4 figures, 5 tables, 2 algorithms.

Key Result

Proposition 1

Consider the reciprocal process conditioned on point $x_0$, $r_{\pi|x_0}(x_{\rm in}, x_1)$. Then $r_{\pi|x_0}(x_{\rm in}, x_1)$ is Markov:

Figures (4)

  • Figure 1: Qualitative samples, presented in $5\times2$ image grids, generated using the image benchmark and DBMI, i.e., $r_{\theta}(\cdot)$, at resolution $32\times32$.
  • Figure 2: Comparison of estimated mutual information $\hat{\mathsf{I}}(X_0; X_1)$ across methods against the ground-truth $\mathsf{I}(X_0; X_1)$ (red) on a high-dimensional image benchmark with size $32\times32$.
  • Figure 3: Qualitative samples, presented in $5\times2$ image grids, generated using the image benchmark and DBMI, i.e., $r_{\theta}(\cdot)$, at resolution $16\times16$.
  • Figure 4: Comparison of estimated mutual information $\hat{\mathsf{I}}(X_0; X_1)$ across methods against the ground-truth $\mathsf{I}(X_0; X_1)$ (red) on a high-dimensional image benchmark with size $16\times16$.

Theorems & Definitions (4)

  • Proposition 1
  • Proposition 2: Mutual Information Decomposition
  • proof
  • proof