Limit theorems for a class of random outer measures in infinite urn schemes
Berhane Abebe, Mikhail Chebunin, Artyom Kovalevskii
TL;DR
This paper develops a rigorous limit-theory framework for random outer measures arising from Karlin-type infinite urn schemes restricted to subsets of the unit interval. By Poissonizing the process, the authors obtain independent per-urn counts and prove a central limit theorem for Poissonized statistics under regular variation with parameter $\theta\in(0,1)$, along with a functional central limit theorem for statistics indexed by finite unions of intervals, yielding a Gaussian random field with explicit covariances. A strong law of large numbers is established for these restricted occupancy counts, and the results are extended to non-Poissonized statistics and weighted sums. The theoretical developments are connected to probabilistic text models, including a circular text statistic that enables goodness-of-fit tests against infinite-dictionary models for texts.
Abstract
An urn scheme is a probabilistic model in which balls are placed into urns sequentially and independently of each other, with all balls sharing the same probability distribution over the urns. In the simplest case, there are finitely many urns and each is hit with equal probability. In an infinite urn scheme, there are countably many urns, and the hitting probabilities form a probability mass function on the set of urn labels, so they depend on the urn number. The statistic of interest is the number of urns containing at least k balls after n balls have been thrown. We assume a countable family of urns with fixed hitting probabilities (the same for all balls). For an arbitrary subset A of the unit interval [0, 1], we consider not all ball indices from 1 to n but only those that belong to the set nA, and we study the number of urns with at least k balls after throwing the balls with indices in nA. This number is non-negative, and it equals zero if A is empty. Moreover, if k=1, it is countably subadditive. Hence, the number of non-empty urns for ball indices in nA, where A is an arbitrary subset of the unit interval, satisfies all the axioms of an outer measure on the unit interval. We study the properties of this statistic. Our main result is a functional central limit theorem for sets A consisting of finite unions of intervals and parameterized by their boundary points. We consider applications of this theorem to elementary probabilistic models of text.
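The restricted statistic described above can be illustrated with a small simulation. This is a minimal sketch and not the authors' code: the function names (`throw_balls`, `in_scaled_set`, `urns_with_at_least`) are hypothetical, the power-law weights proportional to i^(-1/theta) are one assumed example of a regularly varying distribution, the countable family of urns is truncated to finitely many for simulation, and A is assumed to be a finite union of half-open intervals (a, b], with ball j counted when j/n lies in A.

```python
import random
from collections import Counter


def throw_balls(n, theta=0.5, num_urns=10_000, seed=0):
    """Return a list whose (j-1)-th entry is the urn label hit by ball j."""
    rng = random.Random(seed)
    # Assumption: power-law weights i^(-1/theta), truncated to num_urns urns.
    weights = [i ** (-1.0 / theta) for i in range(1, num_urns + 1)]
    return rng.choices(range(1, num_urns + 1), weights=weights, k=n)


def in_scaled_set(j, n, intervals):
    """True if ball index j belongs to nA, where A is a union of (a, b]."""
    return any(a < j / n <= b for a, b in intervals)


def urns_with_at_least(urns, intervals, k=1):
    """Number of urns hit at least k times by balls with indices in nA."""
    n = len(urns)
    counts = Counter(
        u for j, u in enumerate(urns, start=1) if in_scaled_set(j, n, intervals)
    )
    return sum(1 for c in counts.values() if c >= k)


urns = throw_balls(2000)
A = [(0.0, 0.3)]
B = [(0.2, 0.6)]

r_empty = urns_with_at_least(urns, [])      # empty set A
r_A = urns_with_at_least(urns, A)
r_B = urns_with_at_least(urns, B)
r_union = urns_with_at_least(urns, A + B)   # list concatenation acts as the union
r_full = urns_with_at_least(urns, [(0.0, 1.0)])

print(r_empty, r_A, r_B, r_union, r_full)
```

For k=1 the printed values should satisfy `r_empty == 0` and `r_union <= r_A + r_B` on every run, since any urn hit by a ball with index in n(A ∪ B) is hit by a ball with index in nA or in nB. This checks only the finite case of the subadditivity stated in the abstract.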
