A Deep Bayesian Nonparametric Framework for Robust Mutual Information Estimation

Forough Fazeliasl, Michael Minyi Zhang, Bei Jiang, Linglong Kong

TL;DR

This paper introduces DPMINE, a Bayesian nonparametric framework based on the Dirichlet process (DP) for robust mutual information (MI) estimation in high-dimensional settings. It constructs lower bounds for MI from the DP posterior in two variants, a KL-based Donsker-Varadhan (DV) bound and a Jensen-Shannon (JS) bound, which the authors report to be tighter and more stable than their empirical counterparts and for which they prove consistency. By embedding DPMINE into BNPWMMD-GAN, the authors regularize deep generative models, improving convergence and reducing mode collapse in 3D image synthesis tasks, including COVID-19 chest CT and BraTS brain MRI data. The work highlights the broader applicability of BNP MI estimators beyond generative modeling and outlines future directions for large language models and federated learning, while noting limitations such as the IID assumption and the need for careful bias control in real-world deployments.
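
For reference, the two bounds named above have well-known forms under a generic pair of measures. The block below is a sketch of those standard forms, not the paper's DP-based expressions, which this summary does not reproduce. The critic $T_\theta$ and the softplus $\mathrm{sp}(u)=\log(1+e^{u})$ are notation assumed here for illustration.

```latex
% Donsker-Varadhan (KL-based) lower bound on mutual information
I(X;Y) \;\ge\; \sup_{\theta}\; \mathbb{E}_{P_{XY}}\!\left[T_\theta(x,y)\right]
        - \log \mathbb{E}_{P_X \otimes P_Y}\!\left[e^{T_\theta(x,y)}\right]

% Jensen-Shannon-based objective (softplus form)
\hat{I}^{\mathrm{JS}}_\theta \;=\; \mathbb{E}_{P_{XY}}\!\left[-\mathrm{sp}\!\left(-T_\theta(x,y)\right)\right]
        - \mathbb{E}_{P_X \otimes P_Y}\!\left[\mathrm{sp}\!\left(T_\theta(x,y)\right)\right]
```

In an EDF-based estimator both expectations are plain mini-batch averages; according to the abstract, DPMINE instead evaluates them under a finite representation of the Dirichlet process posterior.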

Abstract

Mutual Information (MI) is a crucial measure for capturing dependencies between variables, but exact computation is challenging in high dimensions with intractable likelihoods, impacting accuracy and robustness. One idea is to use an auxiliary neural network to train an MI estimator; however, methods based on the empirical distribution function (EDF) can introduce sharp fluctuations in the MI loss due to poor out-of-sample performance, destabilizing convergence. We present a Bayesian nonparametric (BNP) solution for training an MI estimator by constructing the MI loss with a finite representation of the Dirichlet process posterior to incorporate regularization in the training process. With this regularization, the MI loss integrates both prior knowledge and empirical data to reduce the loss sensitivity to fluctuations and outliers in the sample data, especially in small sample settings like mini-batches. This approach addresses the challenge of balancing accuracy and low variance by effectively reducing variance, leading to stabilized and robust MI loss gradients during training and enhancing the convergence of the MI approximation while offering stronger theoretical guarantees for convergence. We explore the application of our estimator in maximizing MI between the data space and the latent space of a variational autoencoder. Experimental results demonstrate significant improvements in convergence over EDF-based methods, with applications across synthetic and real datasets, notably in 3D CT image generation, yielding enhanced structure discovery and reduced overfitting in data synthesis. While this paper focuses on generative models in application, the proposed estimator is not restricted to this setting and can be applied more broadly in various BNP learning procedures.

Paper Structure

This paper contains 41 sections, 4 theorems, 29 equations, 15 figures, and 10 tables.

Key Result

Theorem 4

Consider the DP posterior representations defined in DPDV-lower and DPJS. Given the DP posterior approximation in approx of DP, we have, where "a.s." stands for "almost surely", denoting that the statements hold with probability 1.

Figures (15)

  • Figure 1: A general diagram of the BNPWMMD model refined by DPMINE in generating 3D images.
  • Figure 2: MINE estimation of the MI between two random variables $X$ and $Y$ using both BNP and FNP frameworks, given a sample size of 16 over 500 epochs. The red dashed line represents the true value of MI. The blue line represents the BNP estimation of MI (our method), while the yellow line represents the FNP estimation. The left-hand figure in each experiment represents the JS-based estimator, and the right-hand figure represents the KL-based estimator.
  • Figure 3: MI estimations between two random variables $\mathbf{X},\mathbf{Y}\overset{\text{IID}}{\sim}U(-\mathbf{1},\mathbf{1})$, $\mathbf{X},\mathbf{Y}\in\mathbb{R}^d$ for various dimensions $d$.
  • Figure 4: MI estimations between two random variables $\mathbf{X}=\text{sign}(\mathbf{Z}),\,(\mathbf{Z}\in\mathbb{R}^d)\sim N(\mathbf{0},I_d)$ and $\mathbf{Y}=\mathbf{X}+\boldsymbol{\epsilon},\, (\boldsymbol{\epsilon}\in\mathbb{R}^d)\sim N(\mathbf{0},0.2I_d)$ for various dimensions $d$.
  • Figure 5: 1000 randomly generated and reconstructed samples for the coil example.
  • ...and 10 more figures
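
The synthetic settings described in the captions of Figures 3 and 4 can be reproduced with a few lines of NumPy. The sketch below is a reading of those captions, not the authors' code: the function names are hypothetical, and $0.2I_d$ is assumed to denote a covariance matrix, so the noise standard deviation is $\sqrt{0.2}$.

```python
import numpy as np


def sample_uniform_pair(n, d, rng):
    """Figure 3 setting: X and Y drawn independently from U(-1, 1)^d."""
    x = rng.uniform(-1.0, 1.0, size=(n, d))
    y = rng.uniform(-1.0, 1.0, size=(n, d))
    return x, y


def sample_sign_pair(n, d, rng, noise_var=0.2):
    """Figure 4 setting: X = sign(Z), Z ~ N(0, I_d); Y = X + eps."""
    z = rng.standard_normal((n, d))
    x = np.sign(z)
    # noise_var is treated as a variance (assumption), hence the square root
    eps = rng.normal(0.0, np.sqrt(noise_var), size=(n, d))
    return x, x + eps


rng = np.random.default_rng(0)
x_u, y_u = sample_uniform_pair(n=16, d=5, rng=rng)
x_s, y_s = sample_sign_pair(n=16, d=5, rng=rng)
```

Because the pair in the first setting is independent, its true MI is zero, which makes it a convenient check that an estimator does not report spurious dependence as $d$ grows.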

Theorems & Definitions (7)

  • Remark 1
  • Remark 2
  • Remark 3
  • Theorem 4: Limiting expectation
  • Theorem 5: Consistency
  • Theorem 6: Limiting expectation
  • Theorem 7: Consistency