
MINDE: Mutual Information Neural Diffusion Estimation

Giulio Franzese, Mustapha Bounoua, Pietro Michiardi

TL;DR

MINDE introduces a diffusion-based framework for mutual information (MI) estimation that, through a Girsanov-inspired view, expresses the KL divergence between two densities as a difference of their score functions. It offers two modeling directions (conditional diffusion and joint diffusion) and a family of estimators that combine score networks with entropy estimates, achieving strong accuracy on challenging distributions and passing MI self-consistency tests. The method is robust across high-dimensional, transformed, and sparse-interaction benchmarks, and it could potentially scale to multi-modal and multi-variable MI analysis. Overall, MINDE extends the information-theoretic toolkit of neural estimators by leveraging score-based diffusion, enabling accurate MI and entropy estimates in complex data settings.
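The score-difference view of KL can be checked numerically in a setting where every score is known in closed form. The sketch below is a minimal illustration, not the authors' implementation. It assumes 1-D Gaussian densities, an Ornstein-Uhlenbeck forward SDE $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$ (so $g(t)^2/2 = 1$), and the standard diffusion identity $\mathrm{KL}(p_0\|q_0) = \mathrm{KL}(p_T\|q_T) + \int_0^T \frac{g(t)^2}{2}\,\mathbb{E}_{p_t}\big[(\nabla\log p_t - \nabla\log q_t)^2\big]\,dt$, which we take to correspond to the paper's Girsanov-based expression. In MINDE the scores would come from trained networks; all function names here are hypothetical. Entropy then follows from the KL to $\mathcal{N}(0,1)$, the stationary law of this SDE, via $H(p) = -\mathrm{KL}(p\|\mathcal{N}(0,1)) + \mathbb{E}_p[x^2]/2 + \tfrac12\log 2\pi$.

```python
import math

# Assumed forward SDE: dX = -X dt + sqrt(2) dW (Ornstein-Uhlenbeck), so g(t)^2 / 2 = 1
# and X_t | X_0 ~ N(e^{-t} X_0, 1 - e^{-2t}). Densities are 1-D Gaussians (mean, var).

def marginal(mean, var, t):
    """Mean and variance of p_t when p_0 = N(mean, var) is diffused to time t."""
    decay = math.exp(-2.0 * t)
    return mean * math.exp(-t), var * decay + 1.0 - decay


def gauss_kl(m1, v1, m2, v2):
    """Closed-form KL(N(m1, v1) || N(m2, v2)) in nats."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def score_gap_sq(mp, vp, mq, vq):
    """E_{x ~ N(mp, vp)}[(s_p(x) - s_q(x))^2] for Gaussian scores s(x) = -(x - m) / v."""
    a = 1.0 / vq - 1.0 / vp  # s_p(x) - s_q(x) = a * x + b
    b = mp / vp - mq / vq
    return a * a * (vp + mp * mp) + 2.0 * a * b * mp + b * b


def kl_via_scores(p, q, T=10.0, steps=20_000):
    """KL(p_0 || q_0) = KL(p_T || q_T) + int_0^T (g^2 / 2) E_{p_t}[(s_p - s_q)^2] dt,
    with the time integral computed by the trapezoidal rule."""
    dt = T / steps
    integral = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        gap = score_gap_sq(*marginal(*p, i * dt), *marginal(*q, i * dt))
        integral += weight * gap * dt
    return integral + gauss_kl(*marginal(*p, T), *marginal(*q, T))


def entropy_via_scores(p, T=10.0, steps=20_000):
    """H(p_0) = -KL(p_0 || N(0, 1)) + E_p[x^2] / 2 + log(2 pi) / 2.
    N(0, 1) is the stationary law of the assumed SDE, so its score stays -x for all t."""
    mean, var = p
    kl = kl_via_scores(p, (0.0, 1.0), T, steps)
    return -kl + 0.5 * (var + mean * mean) + 0.5 * math.log(2.0 * math.pi)


p, q = (1.0, 0.5), (-0.5, 2.0)
print(kl_via_scores(p, q), gauss_kl(*p, *q))  # the two values should agree closely
print(entropy_via_scores(p), 0.5 * math.log(2 * math.pi * math.e * p[1]))
```

Both score-based values should match the closed forms up to the small discretization and truncation error; with learned scores, the same integral would instead be estimated by sampling $t$ and $x_t$.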

Abstract

In this work we present a new method for the estimation of Mutual Information (MI) between random variables. Our approach is based on an original interpretation of the Girsanov theorem, which allows us to use score-based diffusion models to estimate the Kullback-Leibler divergence between two densities as a difference between their score functions. As a by-product, our method also enables the estimation of the entropy of random variables. Armed with such building blocks, we present a general recipe to measure MI, which unfolds in two directions: one uses conditional diffusion processes, whereas the other uses joint diffusion processes that allow simultaneous modelling of two random variables. Our results, which derive from a thorough experimental protocol over all the variants of our approach, indicate that our method is more accurate than the main alternatives from the literature, especially for challenging distributions. Furthermore, our methods pass MI self-consistency tests, including data processing and additivity under independence, which instead are a pain point for existing methods.
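For the conditional direction, MI can be written as $I(X;Y) = \mathbb{E}_{p(y)}\big[\mathrm{KL}(p(x\mid y)\,\|\,p(x))\big]$, so it reduces to comparing a conditional and an unconditional score. The sketch below is a Monte Carlo version of that idea on a toy bivariate Gaussian with unit variances and correlation $\rho$, where analytic scores stand in for learned score networks (an assumption made so the result can be checked against the closed form $-\tfrac12\log(1-\rho^2)$). It assumes the same Ornstein-Uhlenbeck forward SDE as above ($g(t)^2/2 = 1$), samples $t$ uniformly on $[0, T]$, and drops the terminal KL term, which is negligible for $T = 5$. The function name is hypothetical and the weighting used in the paper may differ.

```python
import numpy as np


def conditional_mi_estimate(rho, n=200_000, T=5.0, seed=0):
    """Monte Carlo estimate of I(X;Y) = E_y[KL(p(x|y) || p(x))] for a unit-variance Gaussian pair.

    Assumed forward SDE: dX = -X dt + sqrt(2) dW, so X_t = e^{-t} X_0 + sqrt(1 - e^{-2t}) Z.
    Analytic scores replace the conditional / unconditional score networks.
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n)                                      # Y ~ N(0, 1)
    x0 = rho * y + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)   # X | Y=y ~ N(rho y, 1 - rho^2)
    t = rng.uniform(0.0, T, n)                                      # diffusion times t ~ U[0, T]
    xt = np.exp(-t) * x0 + np.sqrt(1.0 - np.exp(-2.0 * t)) * rng.standard_normal(n)

    # p_t(x | y) = N(rho y e^{-t}, 1 - rho^2 e^{-2t}); the unconditional p_t(x) stays N(0, 1).
    mean_cond = rho * y * np.exp(-t)
    var_cond = 1.0 - rho**2 * np.exp(-2.0 * t)
    score_cond = -(xt - mean_cond) / var_cond
    score_marg = -xt

    # int_0^T E[(s_cond - s_marg)^2] dt estimated as T times the sample mean (g^2 / 2 = 1).
    return T * np.mean((score_cond - score_marg) ** 2)


rho = 0.8
print(conditional_mi_estimate(rho), -0.5 * np.log(1.0 - rho**2))
```

Replacing the analytic scores with network outputs gives the practical form of such an estimator. Uniform time sampling is the simplest choice; a non-uniform (importance) distribution over $t$ would typically reduce the variance of the average.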

Paper Structure

This paper contains 32 sections, 32 equations, 11 figures, 5 tables, and 4 algorithms.

Figures (11)

  • Figure 1: High benchmark: original (column (a)) and transformed variants (columns (b) and (c)).
  • Figure 2: Consistency test results on the MNIST dataset. Baseline test: evaluation of $\frac{I(A,B_r)}{I(A,B_0)}$, where $A$ is an image and $B_r$ is an image containing the top $r$ rows of $A$. Data processing test: evaluation of $\frac{I(A,[B_{r+k},B_r])}{I(A,B_{r+k})}$ (ideal value is 1). Additivity test: evaluation of $\frac{I([A^1,A^2],[B^1_r,B^2_r])}{I(A^1,B^1_r)}$ (ideal value is 2).
  • Figure 3: MI estimates over 10 seeds with N = 10000 for our method and competitors, using 100k training samples. A method absent from the plot either did not converge during training or produced out-of-scale results.
  • Figure 4: MI estimates over 10 seeds with N = 10000 for our method and competitors, using 100k training samples.
  • Figure 5: Training-size ablation study: MI estimates for our method and competitors as a function of the training size (5k, 10k, 50k, 100k). For readability, we omit baselines whose estimation error exceeds twice the ground truth (GT) or whose standard deviation is high. All results are averaged over 5 seeds. Due to the size of the benchmark, the results are split into 4 figures, each containing 10 benchmarks. A method absent from the plot either did not converge during training or produced out-of-scale results. This first plot reports tasks 1-10.
  • ...and 6 more figures
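The ratios in the Figure 2 consistency tests can be made concrete on a toy Gaussian analogue where ground-truth MI has a closed form, $I(A;B) = \tfrac12\big[\log\det\Sigma_{AA} + \log\det\Sigma_{BB} - \log\det\Sigma\big]$. This is an illustrative assumption, not the paper's MNIST protocol: here $A \sim \mathcal{N}(0, I_d)$, $B = A + 0.5\,Z$, "$B_r$" is the first $r$ coordinates of $B$, and the baseline is normalized by the MI with the full $B$ (a choice made for this toy). An estimator passing the tests should reproduce the ratios computed below. The data processing ratio is not computed, because appending a duplicate of $B_r$ makes the joint covariance singular; by construction it equals 1. All names are hypothetical.

```python
import numpy as np


def gaussian_mi(cov, ia, ib):
    """Closed-form I(A; B) in nats for a jointly Gaussian vector with covariance `cov`.
    `ia` and `ib` are lists of coordinate indices for A and B."""
    def logdet(idx):
        _, value = np.linalg.slogdet(cov[np.ix_(idx, idx)])
        return value
    return 0.5 * (logdet(ia) + logdet(ib) - logdet(ia + ib))


def toy_cov(d, noise=0.5):
    """Hypothetical toy: A ~ N(0, I_d) and B = A + noise * Z, stacked as [A, B]."""
    eye = np.eye(d)
    return np.block([[eye, eye], [eye, (1.0 + noise**2) * eye]])


def b_idx(d, r, offset=0):
    """Indices of the first r coordinates of B (the analogue of B_r), shifted by `offset`."""
    return [offset + d + j for j in range(r)]


d, r = 6, 3
cov = toy_cov(d)
a_idx = list(range(d))

# Baseline test: I(A, B_r) / I(A, B_d) should grow with r and reach 1 at r = d.
full = gaussian_mi(cov, a_idx, b_idx(d, d))
baseline = [gaussian_mi(cov, a_idx, b_idx(d, k)) / full for k in range(1, d + 1)]

# Additivity test: two independent copies (block-diagonal covariance) should give ratio 2.
cov2 = np.kron(np.eye(2), cov)
a12 = a_idx + [2 * d + i for i in range(d)]
b12 = b_idx(d, r) + b_idx(d, r, offset=2 * d)
additivity = gaussian_mi(cov2, a12, b12) / gaussian_mi(cov, a_idx, b_idx(d, r))
print(baseline, additivity)
```

In this toy every revealed coordinate of $B$ carries the same information, so the baseline ratio grows linearly as $r/d$; on images the growth need not be linear.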