Asymptotic behavior of clusters in hierarchical species sampling models
Stefano Favaro, Shui Feng, J. E. Paguyo
TL;DR
This work analyzes the asymptotic behavior of clusters in hierarchical species sampling models by leveraging a random sample size representation that connects hierarchical models to non-hierarchical species sampling models. It establishes almost sure and $L^p$ convergence for the number of clusters with a given frequency, Gaussian fluctuation theorems for the total number of clusters under random sampling, and large deviation principles for both the total number of clusters and the frequency counts. The results are developed across four hierarchical regimes, covering variants of the hierarchical Dirichlet process (HDP) and the hierarchical Pitman–Yor process (HPYP), and are unified through a random index approach that reduces hierarchical problems to single-level analyses. These findings deepen the understanding of cluster dynamics in Bayesian nonparametric models and provide tools for inference on unseen species and model parameters in hierarchical settings.
Abstract
Consider a sample of size $N$ from a population governed by a hierarchical species sampling model. We study the large-$N$ asymptotic behavior of the number ${\bf K}_N$ of clusters and the number ${\bf M}_{r,N}$ of clusters with frequency $r$ in the sample. In particular, we show almost sure and $L^p$ convergence for ${\bf M}_{r,N}$, obtain Gaussian fluctuation theorems for ${\bf K}_N$, and establish large deviation principles for both ${\bf K}_N$ and ${\bf M}_{r,N}$. Our approach relies on a random sample size representation of the number of clusters through the corresponding non-hierarchical species sampling model.
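The random sample size idea can be illustrated with a small simulation. The sketch below assumes a single-group hierarchical Pitman–Yor process simulated through its Chinese restaurant franchise. The $N$ observations are first seated at $T_N$ tables by a lower-level Pitman–Yor urn, and the $T_N$ tables are then assigned dishes by an independent top-level urn. In this way ${\bf K}_N$ is the number of clusters of the non-hierarchical top-level model evaluated at the random sample size $T_N$, and ${\bf M}_{r,N}$ counts dishes eaten by exactly $r$ observations. The function names and the parameters `theta`, `d`, `theta0`, `d0` are hypothetical illustrations, not the authors' code. Setting both discounts to zero gives the hierarchical Dirichlet process case.

```python
import random
from collections import Counter


def py_crp(n, theta, d, rng):
    """Seat n customers by a Pitman-Yor Chinese restaurant process
    (discount 0 <= d < 1, concentration theta > -d).
    Returns (block sizes, block label of each customer in arrival order)."""
    sizes, labels = [], []
    for i in range(n):  # i customers are already seated
        k = len(sizes)
        u = rng.random() * (theta + i)
        # New block with probability (theta + d*k) / (theta + i);
        # the first customer always opens a new block.
        if k == 0 or u < theta + d * k:
            sizes.append(1)
            labels.append(k)
            continue
        u -= theta + d * k
        # Otherwise join block j with weight (sizes[j] - d).
        for j, s in enumerate(sizes):
            u -= s - d
            if u < 0:
                break
        sizes[j] += 1
        labels.append(j)
    return sizes, labels


def hierarchical_counts(n, theta, d, theta0, d0, seed=0):
    """Hypothetical helper: simulate one group of a hierarchical PY model.
    Returns (T_N, K_N, M) where M[r] is the number of clusters of frequency r."""
    rng = random.Random(seed)
    # Lower level: n observations -> T_N tables.
    table_sizes, _ = py_crp(n, theta, d, rng)
    t = len(table_sizes)  # the random sample size T_N
    # Top level: an independent urn run on T_N "customers" (the tables).
    _, dish_of_table = py_crp(t, theta0, d0, rng)
    # Frequency of a dish = total number of observations at its tables.
    dish_freq = Counter()
    for size, dish in zip(table_sizes, dish_of_table):
        dish_freq[dish] += size
    k_n = len(dish_freq)
    m = Counter(dish_freq.values())
    return t, k_n, m


t, k_n, m = hierarchical_counts(1000, theta=1.0, d=0.5, theta0=2.0, d0=0.3, seed=1)
print(t, k_n, sorted(m.items())[:5])
```

Averaging `k_n` or `m[r]` over many seeds for increasing `n` gives a crude empirical view of the quantities whose asymptotics the paper studies. The sketch makes no claim about the specific growth rates or limits established there.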
