Logarithmic Asymptotic Relations Between $p$-Values and Mutual Information
Tsutomu Mori, Takashi Kawamura
TL;DR
The paper links $p$-values of independence tests to information-theoretic dependence as quantified by mutual information ($MI$). It develops a maximum-entropy calibration in which $P_{MI}=e^{-MI}$, and proves a precise asymptotic relation for Fisher's exact test on contingency tables with fixed margins: $MI=-(1/N)\log P_F + O(\log(N+1)/N)$. The main contribution, Theorem 3, establishes this logarithmic link for general $m\times n$ tables, with detailed finite-$N$ error bounds in the $2\times2$ case and extensions to larger tables. The work also discusses numerical validation, meta-analysis strategies for combining MI evidence across studies, and practical implications for comparing dependence across datasets of different sample sizes, giving statistical significance a principled information-theoretic interpretation.
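The relation $MI=-(1/N)\log P_F + O(\log(N+1)/N)$ can be checked numerically in the $2\times2$ case. The following is a minimal sketch, not the authors' validation code. It assumes natural logarithms, so MI is in nats. It also assumes the common two-sided convention for $P_F$, which sums the hypergeometric probabilities of every table no more probable than the observed one; the paper's exact convention may differ. The helper names `log_comb`, `log_fisher_pvalue`, and `mutual_information` are hypothetical. The sketch scales one fixed table by $k$, so the empirical MI stays constant while $N$ grows, and then compares MI with $-(1/N)\log P_F$.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_fisher_pvalue(table: list[list[int]]) -> float:
    """Log of a two-sided Fisher exact p-value for a 2x2 table (assumed convention).

    With all margins fixed, the (1,1) cell is hypergeometric. The p-value sums
    the probabilities of every table no more probable than the observed one,
    accumulated with log-sum-exp so that large N does not underflow.
    """
    (a, b), (c, d) = table
    r1, c1, c2 = a + b, a + c, b + d
    n = a + b + c + d

    def log_prob(x: int) -> float:
        # Hypergeometric probability that cell (1,1) equals x, given the margins.
        return log_comb(c1, x) + log_comb(c2, r1 - x) - log_comb(n, r1)

    observed = log_prob(a)
    support = range(max(0, r1 - c2), min(r1, c1) + 1)
    # A small tolerance keeps tables tied with the observed one despite rounding.
    tail = [lp for lp in map(log_prob, support) if lp <= observed + 1e-7]
    top = max(tail)
    return top + math.log(sum(math.exp(lp - top) for lp in tail))


def mutual_information(table: list[list[int]]) -> float:
    """Empirical mutual information, in nats, of an m x n contingency table."""
    n = sum(map(sum, table))
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, nij in enumerate(row):
            if nij > 0:  # empty cells contribute 0 * log 0 = 0
                mi += nij / n * math.log(nij * n / (rows[i] * cols[j]))
    return mi


# Scale one table by k: the empirical MI stays fixed while N = 8k grows.
base = [[3, 1], [1, 3]]
for k in (10, 100, 1000):
    table = [[k * x for x in row] for row in base]
    n = 8 * k
    mi = mutual_information(table)
    rate = -log_fisher_pvalue(table) / n
    print(f"N={n:5d}  MI={mi:.6f}  -log(P_F)/N={rate:.6f}  gap={rate - mi:+.2e}")
```

For this family of tables the gap between the two quantities shrinks as $N$ grows and stays within $\log(N+1)/N$. Working in log space is a deliberate choice. At $N=8000$, $P_F$ is already far below the smallest positive double, so a direct sum of probabilities would underflow to zero.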
Abstract
We establish a precise connection between statistical significance in dependence testing and information-theoretic dependence as quantified by Shannon mutual information (MI). In the absence of prior distributional information, we consider a maximum-entropy model and show that the probability of realizing a given magnitude of MI takes an exponential form, which yields a tail-probability interpretation of a $p$-value. For contingency tables with fixed marginal frequencies, we analyze Fisher's exact test and prove that its $p$-value $P_F$ satisfies the logarithmic asymptotic relation $MI=-(1/N)\log P_F + O(\log(N+1)/N)$ as the sample size $N\to\infty$. These results clarify the role of MI as the exponential rate governing the asymptotic behavior of $p$-values in the settings studied here, and they enable principled comparisons of dependence across datasets with different sample sizes. We further discuss implications for combining evidence across studies via meta-analysis, allowing mutual information and its statistical significance to be integrated in a unified framework.
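A standard method-of-types argument shows why MI appears as the exponential rate. What follows is a heuristic sketch under stated assumptions, not the paper's proof. The assumptions are as follows. Logarithms are natural, so MI is in nats. $P_F$ is the two-sided $p$-value that sums the hypergeometric probabilities of all $m\times n$ tables with the observed margins that are no more probable than the observed table $(n_{ij})$, whose row and column sums are $r_i$ and $c_j$. Under fixed margins, the observed table has probability

$$P_{\mathrm{obs}}=\frac{\prod_i r_i!\,\prod_j c_j!}{N!\,\prod_{i,j} n_{ij}!}=\frac{N!/\prod_{i,j} n_{ij}!}{\bigl(N!/\prod_i r_i!\bigr)\bigl(N!/\prod_j c_j!\bigr)}.$$

For counts $k_1,\dots,k_K$ summing to $N$ with empirical entropy $H$, the multinomial bound is $(N+1)^{-K}e^{NH}\le N!/\prod_{\ell} k_\ell!\le e^{NH}$. Applying it to each factor gives

$$\log P_{\mathrm{obs}}=-N\,\bigl[H(X)+H(Y)-H(X,Y)\bigr]+O(\log(N+1))=-N\,MI+O(\log(N+1)).$$

Every table in the tail is at most as probable as the observed one. At most $(N+1)^{(m-1)(n-1)}$ tables share the given margins, because the $(m-1)(n-1)$ free cells each lie in $\{0,\dots,N\}$. Hence

$$P_{\mathrm{obs}}\le P_F\le (N+1)^{(m-1)(n-1)}\,P_{\mathrm{obs}},$$

and dividing $-\log P_F$ by $N$ yields $MI=-(1/N)\log P_F+O(\log(N+1)/N)$.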
