The Minimax Risk in Testing Uniformity over Large Alphabets under Missing-Ball Alternatives
Alon Kipnis
TL;DR
The paper analyzes the minimax risk of testing uniformity for Poisson-distributed counts over a large alphabet against alternatives that depart from uniformity in $\ell_p$ norm, $p \leq 2$. It develops a Bayesian reduction to a structured subset of alternatives and proves uniform asymptotic normality for linear histogram tests, identifying a unique least favorable prior π^* that yields a minimax test ψ^*. The main results give a precise asymptotic risk formula involving u_{\boldsymbol{ε},n,N,p} and show that the minimax test outperforms the chi-squared and collision-based tests except in certain regimes; they also relate the Poisson minimax risk to the multinomial setting via de-Poissonization. Empirical results corroborate the theoretical predictions, and the framework opens avenues for extensions to non-ball alternative classes and sparse alternatives, with practical relevance to high-dimensional goodness-of-fit testing.
Abstract
We study the problem of testing the goodness of fit of categorical count data to a Poisson distribution uniform over the categories, against a class of alternatives defined by excluding an $\ell_p$ ball, $p \leq 2$, of radius $ε$ around the uniform rate sequence. We characterize the minimax risk for this problem as the expected number of samples $n$ and the number of categories $N$ go to infinity. Our result enables constant-factor comparisons among the many estimators previously proposed for this problem, rather than comparisons only at the level of convergence rates or scaling orders of sample complexity. The minimax test relies exclusively on collisions in the small sample limit, but behaves like the chi-squared test otherwise. Empirical studies across a range of parameters show that the asymptotic risk estimate is accurate in finite samples, and that the minimax test outperforms both the chi-squared test and a test based on collisions under the least favorable alternative. Our analysis involves a reduction to a structured subset of alternatives, establishing uniform asymptotic normality for a family of linear test statistics, and solving an optimization problem over $N$-dimensional sequences akin to classical results from signal detection in Gaussian white noise. Finally, we discuss the connection to the fixed-sample-size multinomial model, arguing that the Poisson minimax risk derived here also characterizes the minimax risk of the multinomial problem.
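To make the notion of a linear test statistic concrete, the following minimal sketch writes two of the tests named above, the chi-squared test and the collision test, as weighted sums over the histogram of counts, i.e. over the numbers X_m of categories observed exactly m times. The function names and the example data are illustrative assumptions, and the weights shown are the standard textbook ones; the weights of the minimax test derived in the paper are not reproduced here.

```python
from collections import Counter


def linear_histogram_statistic(counts, weight):
    """Return sum over m of weight(m) * X_m, where X_m is the number of
    categories that were observed exactly m times."""
    histogram = Counter(counts)  # maps m -> X_m
    return sum(weight(m) * x_m for m, x_m in histogram.items())


def chi_squared_statistic(counts, n):
    """Chi-squared statistic against the uniform Poisson rate n / N,
    where n is the expected sample size and N the number of categories."""
    lam = n / len(counts)
    return linear_histogram_statistic(counts, lambda m: (m - lam) ** 2 / lam)


def collision_statistic(counts):
    """Number of colliding pairs: each category with m observations
    contributes m * (m - 1) / 2 pairs."""
    return linear_histogram_statistic(counts, lambda m: m * (m - 1) // 2)


# Hypothetical toy data: N = 6 categories with expected sample size n = 6.
example_counts = [0, 1, 2, 3, 0, 1]
chi2 = chi_squared_statistic(example_counts, n=6)
collisions = collision_statistic(example_counts)
```

Both statistics differ only in the weight assigned to each histogram bin, which is why a single family of linear statistics can cover them along with tests that interpolate between the two.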
