A Benchmark Suite for Evaluating Neural Mutual Information Estimators on Unstructured Datasets

Kyungeun Lee; Wonjong Rhee

TL;DR

This study introduces a comprehensive benchmark suite for evaluating neural MI estimators on unstructured datasets, specifically focusing on images and texts, and shows that it can accurately manipulate true MI values of real-world datasets.

Abstract

Mutual information (MI) is a fundamental measure of the dependency between two random variables. When only samples are available, and not the underlying distribution functions, MI can be estimated with sample-based estimators. The assessment of such MI estimators, however, has almost always relied on analytical datasets such as multivariate Gaussians. Such datasets allow the true MI values to be calculated analytically, but they do not reflect the complexities of real-world datasets. This study introduces a comprehensive benchmark suite for evaluating neural MI estimators on unstructured datasets, specifically images and texts. By leveraging same-class sampling for positive pairing and introducing a binary symmetric channel trick, we show that the true MI values of real-world datasets can be manipulated accurately. Using the benchmark suite, we investigate seven challenging scenarios, shedding light on the reliability of neural MI estimators for unstructured datasets.
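
As an illustration of what "true MI values" means in the two settings the abstract contrasts, the sketch below computes two textbook closed forms: the MI of correlated Gaussian pairs, and the MI across a binary symmetric channel with a uniform binary input. This is not the benchmark's implementation. The abstract does not specify how the channel is combined with same-class sampling, so the function names, the uniform-input assumption, and the use of nats are all assumptions made here for illustration.

```python
import math


def gaussian_mi(rho: float, dim: int = 1) -> float:
    """MI in nats between X and Y when each of the `dim` coordinate pairs
    (X_i, Y_i) is an independent bivariate Gaussian with correlation `rho`.

    Textbook closed form: I(X; Y) = -(dim / 2) * log(1 - rho^2).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * dim * math.log(1.0 - rho ** 2)


def binary_entropy(p: float) -> float:
    """Entropy in nats of a Bernoulli(p) variable, with 0 * log(0) taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def bsc_mi(crossover: float) -> float:
    """MI in nats between a uniform binary input and the output of a binary
    symmetric channel that flips the input with probability `crossover`.

    Textbook closed form: I(X; Y) = log(2) - H_b(crossover).
    Assumption: the input is uniform on {0, 1}.
    """
    if not 0.0 <= crossover <= 1.0:
        raise ValueError("crossover must lie in [0, 1]")
    return math.log(2.0) - binary_entropy(crossover)


if __name__ == "__main__":
    # Sweeping the crossover probability moves the MI between log(2) and 0.
    for p in (0.0, 0.1, 0.25, 0.5):
        print(f"crossover={p:.2f}  MI={bsc_mi(p):.4f} nats")
```

Under these assumptions, varying the crossover probability gives a single knob that sets the MI anywhere between log 2 and 0, which is one plausible reading of how a channel of this kind could be used to control ground-truth MI.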
