Estimation and Confidence Intervals for Mutual Information: Issues in Convergence for Non-Normal Distributions

Theo Grigorenko, Leo Grigorenko

Abstract

By employing various empirical estimators for the Mutual Information (MI) measure, we calculate and compare the estimates and their confidence intervals for both normal and non-normal bivariate data samples. We find that certain nonlinear invertible transformations of the random variables can significantly affect both the estimated MI value and the precision and asymptotic behavior of its confidence intervals. Generally, for non-normal samples, the confidence intervals are larger than those for normal samples, and the convergence of the confidence intervals is slower even as the data sample size increases. In some cases, due to strong biases, the estimated confidence interval may not contain the true value at all. We discuss various strategies to improve the precision of the estimated Mutual Information.

Paper Structure

This paper contains 6 sections, 4 equations, 8 figures.

Figures (8)

  • Figure 1: $5\%$ and $95\%$ quantile confidence intervals ("$\bigtriangleup$"), the estimated mean MI value ("o"), and the analytical solution ("*") as functions of the data length $N$. This is the case of normal bivariate data, $\sigma_1=\sigma_2=1$, $\rho=0.5$. The numerical method used is the Kraskov-Stoegbauer-Grassberger (KSG) estimator (see text).
  • Figure 2: $5\%$ and $95\%$ quantile confidence intervals ("$\bigtriangleup$"), the estimated mean MI value ("o"), and the analytical solution ("*") as functions of the data length $N$. This is the case of lognormal bivariate data, $\sigma_1=\sigma_2=1$, $\rho=0.5$. The numerical method used is the KSG estimator.
  • Figure 3: $5\%$ and $95\%$ quantile confidence intervals ("$\bigtriangleup$"), the estimated mean MI value ("o"), and the analytical solution ("*") as functions of the data length $N$. This is the case of Student-t distributed bivariate data, with degrees of freedom $\nu=3$, $\sigma_1=\sigma_2=1$, $\rho=0.5$. The numerical method used is the KSG estimator.
  • Figure 4: $5\%$ and $95\%$ quantile confidence intervals ("$\bigtriangleup$"), the estimated mean MI value ("o"), and the analytical solution ("*") as functions of the data length $N$. This is the case of cubic-transformed $(X', Y') = (X^3, Y^3)$ normal bivariate data, $\sigma_1=\sigma_2=1$, $\rho=0.5$. The numerical method used is the KSG estimator.
  • Figure 5: $5\%$ and $95\%$ quantile confidence intervals ("$\bigtriangleup$"), the estimated mean MI value ("o"), and the analytical solution ("*") as functions of the data length $N$. This is the case of cubic-transformed $(X', Y') = (X^3, Y^3)$ lognormal bivariate data, $\sigma_1=\sigma_2=1$, $\rho=0.5$. The numerical method used is the KSG estimator.
  • ...and 3 more figures
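
The kind of experiment summarized in Figure 1 (normal bivariate data, $\sigma_1=\sigma_2=1$, $\rho=0.5$) can be illustrated with a minimal sketch. It is not the authors' code. It assumes a brute-force version of the first KSG algorithm with $k=3$ neighbors, and it assumes that the $5\%$ and $95\%$ quantiles are taken over independent Monte Carlo replicas of the sample; neither the neighbor count nor the resampling scheme is stated in the text shown here. All function names (`analytic_mi_normal`, `ksg_mi`, `mc_interval`, and so on) are hypothetical. The analytical reference value for a bivariate normal is the standard result $I(X;Y) = -\tfrac{1}{2}\ln(1-\rho^2)$, in nats.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def analytic_mi_normal(rho):
    """Exact MI (in nats) of a bivariate normal with correlation rho."""
    return -0.5 * math.log(1.0 - rho * rho)


def sample_bivariate_normal(n, rho, rng):
    """Draw n pairs with unit variances and correlation rho."""
    xs, ys = [], []
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + scale * z2)
    return xs, ys


def ksg_mi(xs, ys, k=3):
    """Brute-force O(N^2) KSG (algorithm 1) MI estimate in nats.

    Assumption: k=3 is an illustrative default, not taken from the paper.
    Requires len(xs) > k.
    """
    n = len(xs)
    # psi[m] = digamma(m) for integer m >= 1, using psi(m+1) = psi(m) + 1/m.
    psi = [0.0, -EULER_GAMMA]
    for m in range(1, n + 1):
        psi.append(psi[m] + 1.0 / m)
    total = 0.0
    for i in range(n):
        dx = [abs(xs[i] - xs[j]) for j in range(n)]
        dy = [abs(ys[i] - ys[j]) for j in range(n)]
        # Max-norm distances from point i to every other point.
        joint = sorted(max(dx[j], dy[j]) for j in range(n) if j != i)
        eps = joint[k - 1]  # distance to the k-th nearest neighbor
        # Marginal neighbor counts strictly inside that distance.
        n_x = sum(1 for j in range(n) if j != i and dx[j] < eps)
        n_y = sum(1 for j in range(n) if j != i and dy[j] < eps)
        total += psi[n_x + 1] + psi[n_y + 1]
    return psi[k] + psi[n] - total / n


def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def mc_interval(n, rho, replicas=20, k=3, seed=0):
    """Return (5% quantile, mean, 95% quantile) of KSG estimates.

    Assumption: the spread is measured over independent replicas.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicas):
        xs, ys = sample_bivariate_normal(n, rho, rng)
        estimates.append(ksg_mi(xs, ys, k=k))
    estimates.sort()
    mean = sum(estimates) / len(estimates)
    return quantile(estimates, 0.05), mean, quantile(estimates, 0.95)


if __name__ == "__main__":
    exact = analytic_mi_normal(0.5)
    for n in (50, 100):
        lo, mean, hi = mc_interval(n, 0.5)
        print(f"N={n}: 5%={lo:.3f} mean={mean:.3f} 95%={hi:.3f} exact={exact:.3f}")
```

The sketch uses only the standard library so that each step of the estimator stays visible. The quadratic-time neighbor search is only practical for small $N$, and a tree-based search would be needed for data lengths like those a convergence study calls for. Passing transformed samples (for example, cubed values) to `ksg_mi` is one way to probe the sensitivity to nonlinear invertible transformations that the abstract describes.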