A New Estimator of Kullback--Leibler Divergence via Shannon Entropy

Mehmet Siddik Cadirci; Martin Singull

TL;DR

Estimation of the Kullback-Leibler (KL) divergence and its use in goodness-of-fit testing for multivariate continuous distributions are examined; Monte Carlo results indicate that the proposed procedure achieves accurate Type I error control and generally superior power compared to conventional multivariate tests of normality, particularly at medium and high dimensions.

Abstract

We examine the estimation of the Kullback-Leibler (KL) divergence and its use in goodness-of-fit testing for multivariate continuous distributions. Our starting point is the maximum entropy principle for Shannon entropy: among all distributions with a fixed mean vector and covariance matrix, the multivariate Gaussian distribution uniquely maximizes entropy. As a result, the KL divergence between an unknown density and the moment-matched Gaussian distribution can be written as an entropy difference, which is a natural information-theoretic measure of departure from Gaussianity. For estimation, we use $k$-nearest neighbor (kNN) estimators of Shannon entropy and KL divergence derived from the Kozachenko-Leonenko approach and its subsequent refinements, together with the consistency and $L^{2}$-convergence results established for these estimators. Motivated by earlier entropy-based goodness-of-fit ideas developed for Rényi-type functionals under generalized Gaussian and Student-type models, we define a KL-based test statistic as the difference between (i) the entropy of the Gaussian model fitted with the sample mean and covariance and (ii) the kNN estimate of the unknown entropy. The statistic converges to zero under multivariate normality and to a strictly positive limit under non-Gaussian alternatives. Monte Carlo simulations across a range of dimensions and sample sizes indicate that the proposed procedure achieves accurate Type I error control and generally superior power compared to conventional multivariate tests of normality, particularly at medium and high dimensions.
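The entropy-difference statistic described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the $(N-1)$-normalized form of the Kozachenko-Leonenko estimator for general $k$, $\hat H_{N,k} = \frac{1}{N}\sum_{i=1}^{N}\log\big[(N-1)\,e^{-\psi(k)}\,V_d\,\rho_{i,k}^{d}\big]$, where $\rho_{i,k}$ is the Euclidean distance from $X_i$ to its $k$-th nearest neighbor and $V_d$ is the volume of the unit ball in $\mathbb{R}^d$; the fitted Gaussian uses the unbiased sample covariance, with $H(\phi_{\mu,\Sigma}) = \tfrac12\log\big((2\pi e)^d\det\Sigma\big)$. The function names `knn_entropy`, `gaussian_entropy` and `kl_statistic` are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln


def knn_entropy(x, k=1):
    """kNN (Kozachenko-Leonenko type) estimate of Shannon entropy in nats.

    x: array of shape (N, d) holding N samples in d dimensions.
    Assumes continuous data, so no two samples coincide (rho > 0).
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    # Ask for k+1 neighbours because each point's nearest "neighbour" is itself.
    dist, _ = cKDTree(x).query(x, k=k + 1)
    rho = dist[:, k]  # distance from X_i to its k-th nearest other sample
    # log V_d, the log-volume of the d-dimensional Euclidean unit ball
    log_vd = 0.5 * d * np.log(np.pi) - gammaln(0.5 * d + 1.0)
    # Average of log[(N-1) * exp(-psi(k)) * V_d * rho^d]
    return float(np.mean(np.log(n - 1) - digamma(k) + log_vd + d * np.log(rho)))


def gaussian_entropy(x):
    """Entropy of the Gaussian fitted with the sample mean and covariance."""
    x = np.asarray(x, dtype=float)
    d = x.shape[1]
    cov = np.atleast_2d(np.cov(x, rowvar=False))  # (d, d), also when d = 1
    _, logdet = np.linalg.slogdet(cov)
    # H = 0.5 * log((2*pi*e)^d * det(cov))
    return 0.5 * (d * (1.0 + np.log(2.0 * np.pi)) + logdet)


def kl_statistic(x, k=1):
    """Hypothetical helper: Gaussian-fit entropy minus the kNN entropy estimate.

    Close to zero for Gaussian samples; tends to a positive limit otherwise.
    """
    return gaussian_entropy(x) - knn_entropy(x, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gauss = rng.standard_normal((2000, 3))
    unif = rng.uniform(size=(2000, 3))  # a clearly non-Gaussian alternative
    print("Gaussian sample:", kl_statistic(gauss, k=1))
    print("Uniform sample: ", kl_statistic(unif, k=1))
```

The Gaussian sample should give a value near zero and the uniform sample a clearly positive one. Turning the statistic into a test additionally requires critical values, which the paper treats in its own section and which are not reproduced here.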


Paper Structure (16 sections, 3 theorems, 31 equations, 6 figures, 2 tables, 1 algorithm)

Introduction
Maximum Entropy Principles
Nearest-Neighbor Estimators of the Shannon Entropy and KL Divergence
Nearest-neighbor estimation of Shannon entropy
Nearest-neighbor estimation of KL divergence
Asymptotic properties
Goodness-of-fit tests based on KL divergence
Numerical Experiments
Monte Carlo study of $T_{N,k}^{\mathrm{KL}}$: convergence and stability
Power of the goodness-of-fit test
Power against generalised Gaussian alternatives
Power against Student-type alternatives
Empirical distribution of the standardised statistic
Finite-sample rate of convergence: a log-log regression diagnostic
Critical values for practical implementation
...and 1 more section

Key Result

Proposition 2.1

Assume that $\mathcal{K}$ is defined as in Definition 2.1 and that $\phi_{\mu,\Sigma}$ denotes the Gaussian density with mean $\mu$ and covariance $\Sigma$. Then $D_{\mathrm{KL}}(f \,\|\, \phi_{\mu,\Sigma}) = H(\phi_{\mu,\Sigma}) - H(f) \ge 0$ for every $f \in \mathcal{K}$, with equality if and only if $f = \phi_{\mu,\Sigma}$ almost everywhere (a.e.).
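For readers checking the identity $D_{\mathrm{KL}}(f \,\|\, \phi_{\mu,\Sigma}) = H(\phi_{\mu,\Sigma}) - H(f)$, the following short derivation assumes, as the abstract indicates, that $f$ has mean $\mu$, covariance $\Sigma$ and finite entropy $H(f) = -\int f \log f$. It is the standard argument, not necessarily the authors' proof.

```latex
\begin{aligned}
D_{\mathrm{KL}}(f \,\|\, \phi_{\mu,\Sigma})
  &= \int f \log f \,dx - \int f \log \phi_{\mu,\Sigma} \,dx
   = -H(f) - \int f \log \phi_{\mu,\Sigma} \,dx, \\
-\log \phi_{\mu,\Sigma}(x)
  &= \tfrac{1}{2}\log\!\big((2\pi)^d \det\Sigma\big)
   + \tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu), \\
\int f(x)\,(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\,dx
  &= \operatorname{tr}\!\big(\Sigma^{-1}\,\mathbb{E}_f[(X-\mu)(X-\mu)^{\top}]\big)
   = \operatorname{tr}(\Sigma^{-1}\Sigma) = d, \\
\Rightarrow\; -\int f \log \phi_{\mu,\Sigma}\,dx
  &= \tfrac{1}{2}\log\!\big((2\pi e)^d \det\Sigma\big) = H(\phi_{\mu,\Sigma}), \\
\Rightarrow\; D_{\mathrm{KL}}(f \,\|\, \phi_{\mu,\Sigma})
  &= H(\phi_{\mu,\Sigma}) - H(f) \;\ge\; 0 .
\end{aligned}
```

Nonnegativity is Gibbs' inequality, with equality exactly when $f = \phi_{\mu,\Sigma}$ a.e. Replacing $\mu$ and $\Sigma$ by sample estimates and $H(f)$ by a kNN estimate gives the plug-in statistic described in the abstract.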

Figures (6)

Figure 1: Convergence behavior of $T_{N,k}^{\mathrm{KL}}$ for $k=1$ nearest neighbor in the Gaussian benchmark case ($s=2$). Error bars represent one empirical standard deviation over $M=100$ replications. The statistic concentrates near zero as $N$ increases.
Figure 2: Finite-sample stability of $T_{N,k}^{\mathrm{KL}}$ for $k\in\{1,2,3\}$ nearest neighbors. Error bars indicate one empirical standard deviation. Variability decreases with $N$ and decreases further for larger $k$.
Figure 3: Curves compare $k\in\{1,2,3\}$ across dimensions; rows correspond to $N=500$ and $N=1000$, with $M=1000$ replications.
Figure 4: Empirical power of $T_{N,k}^{\mathrm{KL}}$ against Student-type alternatives. Solid and dashed lines correspond to $N=1000$ and $N=500$, respectively; curves compare $k\in\{1,2,3\}$.
Figure 5: For $N=1000$ and $k\in\{1,2,3\}$, kernel density estimates of the standardized statistic $Z_{N,k}$ were evaluated and compared with the standard Gaussian density $\mathcal{N}(0,1)$.
...and 1 more figure

Theorems & Definitions (9)

Definition 2.1: Class $\mathcal{K}$
Proposition 2.1: KL divergence Gauss benchmark
Proof
Remark 2.2
Remark 3.1
Theorem 3.1: Consistency of kNN estimators
Remark 3.2
Theorem 4.1
Remark 4.1: KL divergence as expected log-loss
