Logarithmic Asymptotic Relations Between $p$-Values and Mutual Information

Tsutomu Mori; Takashi Kawamura

TL;DR

The paper links $p$-values from independence tests to information-theoretic dependence as quantified by mutual information ($MI$). It develops a maximum-entropy calibration in which $P_{MI}=e^{-MI}$, and it proves an asymptotic relation for Fisher's exact test in contingency tables with fixed margins: $MI=-(1/N)\log P_F + O(\log(N+1)/N)$. The main contribution is Theorem 3, which establishes this logarithmic link for general $m\times n$ tables, with finite-$N$ error bounds in the $2\times2$ case and extensions to larger tables. The work also discusses numerical validation, meta-analysis strategies for combining MI evidence across studies, and practical implications for comparing dependence across datasets with different sample sizes. It thereby offers an information-theoretic interpretation of statistical significance.

Abstract

We establish a precise connection between statistical significance in dependence testing and information-theoretic dependence as quantified by Shannon mutual information (MI). In the absence of prior distributional information, we consider a maximum-entropy model and show that the probability associated with the realization of a given magnitude of MI takes an exponential form, yielding a corresponding tail-probability interpretation of a $p$-value. In contingency tables with fixed marginal frequencies, we analyze Fisher's exact test and prove that its $p$-value $P_F$ satisfies a logarithmic asymptotic relation of the form $MI=-(1/N)\log P_F + O(\log(N+1)/N)$ as the sample size $N\to\infty$. These results clarify the role of MI as the exponential rate governing the asymptotic behavior of $p$-values in the settings studied here, and they enable principled comparisons of dependence across datasets with different sample sizes. We further discuss implications for combining evidence across studies via meta-analysis, allowing mutual information and its statistical significance to be integrated in a unified framework.
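The stated rate $O(\log(N+1)/N)$ can be made plausible with a standard counting argument. The sketch below is one possible route and is not claimed to be the authors' proof. It assumes natural logarithms, takes $MI$ to be the empirical mutual information of the observed table $x$, and takes $P_F$ to be the sum of the probabilities of all tables with the same margins that are no more probable than $x$. The symbols $\hat H$ (empirical entropy), $M(\cdot)$ (multinomial coefficient) and $P(x)$ (probability of the observed table under fixed margins) are notation introduced here for illustration.

```latex
% Notation assumed for this sketch: a_i, b_j are the margins, x_{ij} the cells,
% M(k) = N! / \prod_\ell k_\ell!, and \hat H(k/N) is the entropy (in nats) of k/N.
\begin{align*}
P(x) &= \frac{\prod_i a_i!\,\prod_j b_j!}{N!\,\prod_{i,j} x_{ij}!}
      = \frac{M(x)}{M(a)\,M(b)}, \\
(N+1)^{-r}\,e^{N\hat H(k/N)} &\le M(k) \le e^{N\hat H(k/N)}
      \qquad (r = \text{number of categories of } k), \\
MI &= \hat H(a/N) + \hat H(b/N) - \hat H(x/N), \\
-N\,MI - mn\log(N+1) &\le \log P(x) \le -N\,MI + (m+n)\log(N+1), \\
\log P(x) &\le \log P_F \le \log P(x) + (m-1)(n-1)\log(N+1), \\
\Bigl|\,MI + \tfrac{1}{N}\log P_F\,\Bigr| &\le (mn+1)\,\frac{\log(N+1)}{N}.
\end{align*}
```

The fifth line uses the fact that at most $(N+1)^{(m-1)(n-1)}$ tables share the given margins, each contributing no more than $P(x)$ under the tail convention assumed above. The constant $mn+1$ is a by-product of this sketch and should not be read as the constant obtained in the paper.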

Paper Structure (28 sections, 3 theorems, 57 equations, 3 figures)

Introduction
Posing a problem
Maximum-entropy calibration in probability theory
Principle of maximum entropy
Principle of equal probability in statistics
An exponential-form relation between information and probability
The $p$-value based on information theory
Logarithmic asymptotics in statistics: Fisher's exact test
Occurring probability of information exchange per one observation
Occurring probability of information exchange during many observations
The $p$-value based on information theory
Analysis using conventional statistics
Proofs of Theorems 1 and 2
Proof of Theorem 1
Proof of Theorem 2
...and 13 more sections

Key Result

Theorem 1

Let $X$ and $Y$ be random variables that follow a uniform distribution. Then the probability $P_{MI}$ that the mutual information shared by them takes the value $MI$ is given by $P_{MI}=e^{-MI}$.

Figures (3)

Figure 1: $m\times n$ contingency table for random variables $X$ and $Y$ taking the values $X_1,\dots,X_m$ and $Y_1,\dots,Y_n$, respectively. $x_{ij}$ is the joint frequency, $a_1,\dots,a_m$ and $b_1,\dots,b_n$ are the marginal frequencies, and $N$ is the sample size.
Figure 2: $P_F$, $P_{\chi^2}$ and $MI$ for $2\times 2$ contingency tables. (a) $P_F$ and $MI$. (b) $P_{\chi^2}$ and $MI$. The displayed equations and $R^2$ values are the regression lines and coefficients of determination for $-(\log{P_F})/N$ against $MI$ and for $-(\log{P_{\chi^2}})/N$ against $MI$, respectively.
Figure 3: $P_F$, $P_{\chi^2}$ and $MI$ for $3\times 3$ contingency tables. (a) $P_F$ and $MI$. (b) $P_{\chi^2}$ and $MI$. The displayed equations and $R^2$ values are the regression lines and coefficients of determination for $-(\log{P_F})/N$ against $MI$ and for $-(\log{P_{\chi^2}})/N$ against $MI$, respectively.
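
The comparison behind Figures 2 and 3, $-(\log P_F)/N$ against $MI$, can be tried on a single $2\times 2$ table with a few lines of standard-library Python. This is a minimal sketch and not the authors' code. It assumes natural logarithms (so $MI$ is in nats) and a two-sided Fisher $p$-value defined as the total probability of all tables with the same margins that are no more probable than the observed one. The function names `mutual_information`, `fisher_p_2x2` and `rate_gap` are hypothetical.

```python
import math


def mutual_information(table):
    """Empirical mutual information (in nats) of a contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            if x > 0:  # empty cells contribute nothing
                mi += (x / n) * math.log(x * n / (row_sums[i] * col_sums[j]))
    return mi


def fisher_p_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table with fixed margins.

    Assumed convention: sum the hypergeometric probabilities of every table
    that is no more probable than the observed one.
    """
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    denom = math.comb(r1 + r2, c1)

    def prob(k):
        # Probability that the top-left cell equals k given the margins.
        return math.comb(r1, k) * math.comb(r2, c1 - k) / denom

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def rate_gap(table):
    """Absolute difference between -(log P_F)/N and MI for one table."""
    n = sum(sum(row) for row in table)
    return abs(-math.log(fisher_p_2x2(table)) / n - mutual_information(table))


if __name__ == "__main__":
    # Same cell proportions at growing sample sizes N = 40, 160, 640.
    for k in (5, 20, 80):
        t = [[3 * k, k], [k, 3 * k]]
        n = 8 * k
        print(n, mutual_information(t), -math.log(fisher_p_2x2(t)) / n, rate_gap(t))
```

Keeping the cell proportions fixed while $N$ grows holds $MI$ constant, so any shrinkage in `rate_gap` reflects only the $O(\log(N+1)/N)$ correction. The direct floating-point sum underflows for very small $p$-values, so a log-space implementation would be needed for large $N$ or strong dependence.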

Theorems & Definitions (3)

Theorem 1
Theorem 2
Theorem 3
