
A New Estimator of Kullback–Leibler Divergence via Shannon Entropy

Mehmet Siddik Cadirci, Martin Singull

TL;DR

We study estimation of the Kullback-Leibler (KL) divergence and its use in goodness-of-fit testing for multivariate continuous distributions. Simulations indicate that the proposed procedure controls the Type I error accurately and generally has higher power than conventional multivariate normality tests, particularly in medium and high dimensions.

Abstract

We study the estimation of the Kullback-Leibler (KL) divergence and its use in goodness-of-fit testing for multivariate continuous distributions. Our starting point is the maximum entropy principle for Shannon entropy: among all distributions with a given mean vector and covariance matrix, the multivariate Gaussian distribution uniquely maximizes entropy. Consequently, the KL divergence between an unknown density and the moment-matched Gaussian distribution can be written as an entropy difference, which serves as an information-theoretic measure of departure from Gaussianity. For estimation, we use $k$-nearest neighbor (kNN) estimators of Shannon entropy and KL divergence derived from the Kozachenko-Leonenko approach and its subsequent refinements, together with the consistency and $L^{2}$-convergence results established for these estimators. Motivated by earlier entropy-based goodness-of-fit tests developed for Rényi-type functionals under generalized Gaussian and Student-type models, we define a KL-based test statistic as the difference between (i) the entropy of the Gaussian model fitted with the sample mean and covariance and (ii) a kNN estimate of the entropy of the unknown density. The statistic converges to zero under multivariate normality and to a strictly positive limit under non-Gaussian alternatives. Monte Carlo simulations across a range of dimensions and sample sizes indicate that the proposed procedure controls the Type I error accurately and generally has higher power than conventional multivariate normality tests, particularly in medium and high dimensions.
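
The entropy-difference statistic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a standard form of the Kozachenko-Leonenko estimator, $\widehat H_{N,k} = \frac{1}{N}\sum_{i=1}^{N} \log\big[(N-1)\,e^{-\psi(k)}\,V_d\,\rho_{k,i}^{d}\big]$, where $\rho_{k,i}$ is the Euclidean distance from $X_i$ to its $k$-th nearest neighbor, $\psi$ is the digamma function and $V_d = \pi^{d/2}/\Gamma(d/2+1)$ is the volume of the unit ball in $\mathbb{R}^d$. It also assumes the unbiased sample covariance ($N-1$ divisor), and the function names `knn_entropy`, `gaussian_entropy` and `kl_statistic` are hypothetical. Entropies are in nats.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln


def knn_entropy(x: np.ndarray, k: int = 1) -> float:
    """Kozachenko-Leonenko kNN estimate of Shannon entropy (nats).

    x has shape (N, d); rows are i.i.d. draws from a continuous density.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    # Ask for k + 1 neighbors: column 0 is each point's zero distance to itself.
    dist, _ = cKDTree(x).query(x, k=k + 1)
    rho = dist[:, -1]  # distance to the k-th genuine nearest neighbor
    # log V_d, the log-volume of the unit Euclidean ball in R^d
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    # mean of log[(N - 1) * exp(-psi(k)) * V_d * rho_i^d]
    return float(np.mean(np.log(n - 1) - digamma(k) + log_vd + d * np.log(rho)))


def gaussian_entropy(cov: np.ndarray) -> float:
    """Closed-form entropy of N(mu, cov): 0.5 * log((2*pi*e)^d * det(cov))."""
    d = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return 0.5 * (d * (1.0 + np.log(2.0 * np.pi)) + logdet)


def kl_statistic(x: np.ndarray, k: int = 1) -> float:
    """Entropy of the fitted Gaussian minus the kNN entropy estimate."""
    x = np.asarray(x, dtype=float)
    cov_hat = np.atleast_2d(np.cov(x, rowvar=False))  # sample covariance, ddof=1
    return gaussian_entropy(cov_hat) - knn_entropy(x, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Gaussian sample: the statistic targets 0.
    print(kl_statistic(rng.standard_normal((2000, 3)), k=3))
    # Uniform sample on the unit cube: the statistic targets a positive KL divergence.
    print(kl_statistic(rng.uniform(size=(2000, 3)), k=3))
```

Large values are evidence against normality. The kNN estimator breaks down when sample points coincide ($\rho_{k,i}=0$), so the sketch assumes continuous data without ties.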

Paper Structure (16 sections, 3 theorems, 31 equations, 6 figures, 2 tables, 1 algorithm)

Key Result

Proposition 2.1

Assume that $\mathcal{K}$ is the class of densities in Definition 2.1, and let $\phi_{\mu,\Sigma}$ denote the Gaussian density with mean $\mu$ and covariance $\Sigma$. Then, for every $f \in \mathcal{K}$, $D_{\mathrm{KL}}(f \,\|\, \phi_{\mu,\Sigma}) = H(\phi_{\mu,\Sigma}) - H(f) \ge 0$, with equality if and only if $f = \phi_{\mu,\Sigma}$ almost everywhere (a.e.).
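
The entropy-difference form of the KL divergence follows from a short calculation, sketched here under the assumption (not spelled out in this summary) that every $f \in \mathcal{K}$ has finite entropy, mean $\mu$ and covariance $\Sigma$; $d$ denotes the dimension. Write $\log\phi_{\mu,\Sigma}(x) = -\tfrac{d}{2}\log(2\pi) - \tfrac{1}{2}\log\det\Sigma - \tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)$ and use $\mathbb{E}_f\big[(X-\mu)^{\top}\Sigma^{-1}(X-\mu)\big] = \operatorname{tr}(\Sigma^{-1}\Sigma) = d$:

$$
\begin{aligned}
D_{\mathrm{KL}}(f \,\|\, \phi_{\mu,\Sigma})
&= \int f \log f \, dx - \int f \log \phi_{\mu,\Sigma} \, dx \\
&= -H(f) + \tfrac{d}{2}\log(2\pi) + \tfrac{1}{2}\log\det\Sigma + \tfrac{1}{2}\,\mathbb{E}_f\big[(X-\mu)^{\top}\Sigma^{-1}(X-\mu)\big] \\
&= -H(f) + \tfrac{1}{2}\log\big((2\pi e)^{d}\det\Sigma\big) \\
&= H(\phi_{\mu,\Sigma}) - H(f).
\end{aligned}
$$

Nonnegativity and the equality case then follow from Gibbs' inequality, $D_{\mathrm{KL}}(f\,\|\,g) \ge 0$ with equality if and only if $f = g$ a.e. This is why replacing $\mu, \Sigma$ by their sample counterparts and $H(f)$ by a kNN estimate gives a statistic whose target is exactly zero under normality.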

Figures (6)

  • Figure 1: Convergence behavior of $T_{N,k}^{\mathrm{KL}}$ for neighborhood size $k=1$ under the Gaussian benchmark ($s=2$). Error bars represent one empirical standard deviation over $M=100$ replications; the statistic concentrates near zero as $N$ increases.
  • Figure 2: Finite-sample stability of $T_{N,k}^{\mathrm{KL}}$ for neighborhood sizes $k\in\{1,2,3\}$. Error bars indicate one empirical standard deviation. Variability decreases with $N$ and decreases further for larger $k$.
  • Figure 3: Curves compare $k\in\{1,2,3\}$ across dimensions; rows correspond to $N=500$ and $N=1000$, with $M=1000$ replications.
  • Figure 4: Empirical power of $T_{N,k}^{\mathrm{KL}}$ under Student-type alternatives. Solid and dashed lines correspond to $N=1000$ and $N=500$, respectively, for $k\in\{1,2,3\}$.
  • Figure 5: Kernel density estimates of the standardized statistic $Z_{N,k}$ for $N=1000$ and $k\in\{1,2,3\}$, compared with the standard Gaussian density $\mathcal{N}(0,1)$.
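
The size and power experiments summarized in these figures can be sketched as a Monte Carlo loop. This is a hypothetical set-up, not the authors' protocol: the critical value is simulated under $\mathcal{N}(0, I_d)$, the alternative is an i.i.d. Student-$t_3$ vector chosen for illustration, and `null_critical_value` and `rejection_rate` are illustrative names. The statistic is redefined compactly so the block runs on its own. Because the kNN term uses Euclidean distances, the statistic is not exactly affine invariant, so simulating the null from $\mathcal{N}(0, I_d)$ is a simplification.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln


def kl_statistic(x: np.ndarray, k: int = 1) -> float:
    """Gaussian-fit entropy minus the Kozachenko-Leonenko kNN entropy estimate."""
    n, d = x.shape
    dist, _ = cKDTree(x).query(x, k=k + 1)  # column 0 is the point itself
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    h_knn = np.mean(np.log(n - 1) - digamma(k) + log_vd + d * np.log(dist[:, -1]))
    _, logdet = np.linalg.slogdet(np.atleast_2d(np.cov(x, rowvar=False)))
    h_gauss = 0.5 * (d * (1.0 + np.log(2.0 * np.pi)) + logdet)
    return float(h_gauss - h_knn)


def null_critical_value(n, d, k, alpha, reps, rng):
    """Upper alpha-quantile of the statistic simulated under N(0, I_d)."""
    sims = [kl_statistic(rng.standard_normal((n, d)), k) for _ in range(reps)]
    return float(np.quantile(sims, 1.0 - alpha))


def rejection_rate(sampler, n, d, k, crit, reps, rng):
    """Share of replications in which the statistic exceeds crit (one-sided test)."""
    hits = sum(kl_statistic(sampler(rng, n, d), k) > crit for _ in range(reps))
    return hits / reps


rng = np.random.default_rng(2024)
n, d, k, alpha, reps = 500, 5, 3, 0.05, 200
crit = null_critical_value(n, d, k, alpha, reps, rng)
# Empirical size: fresh Gaussian samples tested against the simulated critical value.
size = rejection_rate(lambda r, n, d: r.standard_normal((n, d)), n, d, k, crit, reps, rng)
# Empirical power against i.i.d. Student-t coordinates with 3 degrees of freedom.
power = rejection_rate(lambda r, n, d: r.standard_t(3, size=(n, d)), n, d, k, crit, reps, rng)
print(f"critical value {crit:.4f}, empirical size {size:.3f}, power vs t_3 {power:.3f}")
```

Repeating the loop over a grid of $N$, $d$ and $k$ gives size and power curves of the kind plotted above; larger `reps` tightens the Monte Carlo error at the cost of run time.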

Theorems & Definitions (9)

  • Definition 2.1: Class $\mathcal{K}$
  • Proposition 2.1: KL divergence Gauss benchmark
  • Proof
  • Remark 2.2
  • Remark 3.1
  • Theorem 3.1: Consistency of kNN estimators
  • Remark 3.2
  • Theorem 4.1
  • Remark 4.1: KL divergence as expected log-loss
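
Remark 4.1 is listed here only by title. One standard reading of "KL divergence as expected log-loss" is the identity $D_{\mathrm{KL}}(f\,\|\,g) = \mathbb{E}_f[-\log g(X)] - \mathbb{E}_f[-\log f(X)]$, the extra expected log-loss incurred by scoring data from $f$ with $g$ instead of $f$. The sketch below checks this numerically, assuming that reading and the illustrative choices $f=\mathcal{N}(0,1)$ and $g=\mathcal{N}(0,4)$ (not taken from the paper), for which the closed form is $\log 2 + 1/8 - 1/2 \approx 0.318$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
x = rng.normal(loc=0.0, scale=1.0, size=200_000)  # draws from f = N(0, 1)

# Average log-loss (nats) when the f-samples are scored by g and by f itself.
log_loss_g = -np.mean(norm.logpdf(x, loc=0.0, scale=2.0))  # cross-entropy H(f, g)
log_loss_f = -np.mean(norm.logpdf(x, loc=0.0, scale=1.0))  # entropy H(f)
kl_mc = log_loss_g - log_loss_f  # excess log-loss = Monte Carlo KL estimate

# Closed form for univariate Gaussians f = N(m_f, s_f^2), g = N(m_g, s_g^2):
# log(s_g / s_f) + (s_f^2 + (m_f - m_g)^2) / (2 * s_g^2) - 1/2
kl_exact = np.log(2.0) + 1.0 / 8.0 - 0.5
print(f"Monte Carlo {kl_mc:.4f} vs exact {kl_exact:.4f}")
```

Using the same draws for both log-loss terms makes the difference a paired estimate, which has much lower variance than estimating the two expectations separately.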