The Harmonic Entropy Estimator: Minimax Optimality and Semiparametric Efficiency for Infinite Alphabets
Octavio César Mesner
TL;DR
This work tackles Shannon entropy estimation for discrete distributions with countably infinite support by introducing the harmonic entropy estimator, built on exact identities that connect the expectation of harmonic-transformed binomial counts to log-probabilities. It proves that the estimator attains the parametric $L_2$ minimax rate of order $1/n$ over distributions with tails $p_j \lesssim j^{-2}$, extending finite-support results to infinite alphabets, and shows semiparametric efficiency under the stronger tail condition $p_j = o(j^{-2})$, with $\sqrt{n}(\hat H - H) \Rightarrow N(0, \mathrm{Var}[\log p(X)])$. The result is a simple, one-step estimator with precise bias and variance characterizations that pins down the statistical limits of entropy estimation over broad tail classes. The results unify the treatment of finite-variance and certain monotone-tail distributions and offer a foundation for practical inference and potential extensions to continuous or mixed settings.
Abstract
This paper considers the estimation of Shannon entropy for discrete distributions with countably infinite support. While minimax rates for finite-support distributions are well established, infinite-support distributions pose a distinct challenge: the bias must be controlled as the probabilities vanish. We address this by introducing the \textit{harmonic entropy estimator}, a statistic derived from an exact algebraic identity relating the expectation of harmonic-transformed binomial counts to the logarithm of the underlying success probabilities. We establish two main results characterizing the statistical limits of this problem. First, for the class of distributions with at least quadratically decaying tails ($p_j \lesssim j^{-2}$), we prove that the estimator achieves the parametric $L_2$-minimax convergence rate of order $1/n$. Second, under the stronger condition $p_j = o(j^{-2})$, we demonstrate that the estimator is semiparametrically efficient, converging to a normal distribution with variance matching the asymptotic efficiency bound $\mathrm{Var}[\log p(X)]$. These results unify entropy estimation theory for finite-variance distributions and provide a simple, one-step estimator with sharp theoretical guarantees.
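To give a feel for the kind of identity the abstract refers to, the sketch below numerically checks one classical relation between harmonic numbers of binomial counts and log-probabilities: for $X \sim \mathrm{Binomial}(n, p)$ and harmonic numbers $H_k = \sum_{i=1}^{k} 1/i$, one has $\mathbb{E}[H_X] - H_n = -\sum_{k=1}^{n} (1-p)^k / k$, which is the first $n$ terms of the series for $\log p$. This is an illustrative assumption, not necessarily the exact identity or transform used in the paper, and the function names are hypothetical.

```python
import math


def harmonic(k: int) -> float:
    """k-th harmonic number H_k = 1 + 1/2 + ... + 1/k, with H_0 = 0."""
    return sum(1.0 / i for i in range(1, k + 1))


def expected_harmonic(n: int, p: float) -> float:
    """Exact E[H_X] for X ~ Binomial(n, p), summed over the binomial pmf."""
    return sum(
        math.comb(n, x) * p**x * (1 - p) ** (n - x) * harmonic(x)
        for x in range(n + 1)
    )


def truncated_log_series(n: int, p: float) -> float:
    """-sum_{k=1}^{n} (1-p)^k / k, the first n terms of the series for log p."""
    return -sum((1 - p) ** k / k for k in range(1, n + 1))


# Illustrative values (assumptions, not taken from the paper).
n, p = 50, 0.3
lhs = expected_harmonic(n, p) - harmonic(n)
rhs = truncated_log_series(n, p)
print(lhs, rhs, math.log(p))
```

In this relation the gap between $\mathbb{E}[H_X] - H_n$ and $\log p$ is the series tail $\sum_{k>n} (1-p)^k / k$, which is negligible for fixed $p$ but not when $p$ is small relative to $1/n$; that is the regime where bias control over an infinite alphabet becomes delicate.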
