
Asymptotic behavior of clusters in hierarchical species sampling models

Stefano Favaro, Shui Feng, J. E. Paguyo

TL;DR

This work analyzes the asymptotic behavior of clusters in hierarchical species sampling models by leveraging a random sample size representation that connects hierarchical models to non-hierarchical SSRMs. It establishes almost sure and $L^p$ convergence for the cluster frequency counts, Gaussian fluctuation theorems for the total number of clusters under random sampling, and large deviation principles for both the total number of clusters and the frequency counts. The results cover four hierarchical regimes, built from variants of the hierarchical Dirichlet process (HDP) and the hierarchical Pitman–Yor process (HPYP), and are unified through a random index approach that reduces each hierarchical problem to a single-level analysis. These findings clarify how clusters behave in Bayesian nonparametric models and provide tools for inference on unseen species and model parameters in hierarchical settings.

Abstract

Consider a sample of size $N$ from a population governed by a hierarchical species sampling model. We study the large $N$ asymptotic behavior of the number ${\bf K}_N$ of clusters and the number ${\bf M}_{r,N}$ of clusters with frequency $r$ in the sample. In particular, we show almost sure and $L^p$ convergence for ${\bf M}_{r,N}$, obtain Gaussian fluctuation theorems for ${\bf K}_N$, and establish large deviation principles for both ${\bf K}_N$ and ${\bf M}_{r,N}$. Our approach relies on a random sample size representation of the number of clusters through the corresponding non-hierarchical species sampling model.
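To make ${\bf K}_N$, ${\bf M}_{r,N}$ and the random sample size idea concrete, here is a minimal Python sketch. It assumes a hierarchical Pitman–Yor model simulated by the Chinese restaurant franchise, with group-level parameters $(\beta,\theta)$ and top-level parameters $(\alpha,\theta_0)$; this parameterization is an assumption chosen to match the notation of Proposition 2.1, and `py_seat` and `simulate_hpyp` are hypothetical helpers, not code from the paper. Each newly opened table acts as one customer of the top-level restaurant, so the clusters are exactly the blocks of a single non-hierarchical PY$(\alpha,\theta_0)$ sequential partition run on the random number $D_N$ of tables.

```python
import random
from collections import Counter


def py_seat(sizes, n, discount, strength, rng):
    """One Pitman-Yor Chinese restaurant step.

    sizes: current block sizes, n: their sum.
    Returns the index of the chosen block; len(sizes) means "open a new block".
    """
    k = len(sizes)
    if n == 0:
        return 0  # the first customer always opens a block
    u = rng.random() * (strength + n)
    acc = strength + discount * k  # weight of a new block
    if u < acc:
        return k
    for j, c in enumerate(sizes):
        acc += c - discount  # weight of existing block j
        if u < acc:
            return j
    return k - 1  # guard against floating-point round-off


def simulate_hpyp(group_sizes, alpha, theta0, beta, theta, seed=0):
    """Chinese restaurant franchise for an (assumed) hierarchical Pitman-Yor model.

    Group level: PY(beta, theta) seats customers at tables.
    Top level:   PY(alpha, theta0) assigns a dish (cluster) to each new table.
    """
    rng = random.Random(seed)
    dish_tables = []     # top-level block sizes: tables serving each dish
    dish_customers = []  # cluster frequencies: customers eating each dish
    n_tables = 0         # D_N: total tables = top-level sample size
    for n_i in group_sizes:
        table_sizes, table_dish = [], []
        for m in range(n_i):  # m customers already seated in this group
            t = py_seat(table_sizes, m, beta, theta, rng)
            if t == len(table_sizes):
                # A new table is one more "customer" of the top-level restaurant.
                d = py_seat(dish_tables, n_tables, alpha, theta0, rng)
                if d == len(dish_tables):
                    dish_tables.append(0)
                    dish_customers.append(0)
                dish_tables[d] += 1
                n_tables += 1
                table_sizes.append(0)
                table_dish.append(d)
            table_sizes[t] += 1
            dish_customers[table_dish[t]] += 1
    return {
        "K": len(dish_customers),      # number of clusters K_N
        "M": Counter(dish_customers),  # M[r] = number of clusters with frequency r
        "D": n_tables,                 # random sample size fed to the top level
        "top_level_sizes": dish_tables,
    }


if __name__ == "__main__":
    res = simulate_hpyp([500, 300, 200], alpha=0.5, theta0=1.0,
                        beta=0.6, theta=1.0, seed=1)
    print(res["K"], res["D"], sorted(res["M"].items())[:5])
```

Groups are processed one after another. Since group-level seating never looks at dish labels and the top-level partition of tables is exchangeable, this ordering does not change the joint law of the clusters.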

Paper Structure (22 sections, 13 theorems, 93 equations)

Key Result

Proposition 2.1

Suppose $\frac{N_i}{N} = \frac{N_i(N)}{N} \to w_i > 0$ as $N \to \infty$ for all $i \in [d]$. Then, as $N \to \infty$, the following almost sure convergences hold: where $\eta = \sum_{i=1}^d w_i^\beta S_{\beta,\theta,i}$ with $\{S_{\beta, \theta, i}\} \stackrel{iid}{\sim} S_{\beta,\theta}$ independent of $S_{\alpha, \theta_0}$. Moreover, the above convergences also hold in $L^p$ for all integers
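The random variable $\eta$ can be illustrated numerically. For a single PY$(\beta,\theta)$ restaurant with $0<\beta<1$, the number of tables $T_n$ satisfies $T_n/n^\beta \to S_{\beta,\theta}$ almost surely (Pitman's diversity). If group $i$ receives $N_i \approx w_i N$ customers and groups seat independently, then $T_{N_i}/N^\beta \approx w_i^\beta S_{\beta,\theta,i}$, so the total number of tables divided by $N^\beta$ should settle near one realization of $\eta$. The sketch assumes PY$(\beta,\theta)$ group-level seating, and `eta_path` is a hypothetical helper; it illustrates only the definition of $\eta$, not the full statement of the proposition.

```python
import random


def eta_path(weights, checkpoints, beta, theta, seed=0):
    """Follow one sample path of d independent PY(beta, theta) restaurants.

    At each checkpoint N, group i has round(w_i * N) customers (at least 1),
    and we record (N, total_tables / N**beta), which should settle near a
    realization of eta = sum_i w_i**beta * S_i as N grows.
    """
    rng = random.Random(seed)
    seated = [0] * len(weights)  # customers seated in each group
    tables = [0] * len(weights)  # occupied tables in each group
    path = []
    for N in sorted(checkpoints):
        for i, w in enumerate(weights):
            target = max(1, round(w * N))
            while seated[i] < target:
                m, k = seated[i], tables[i]
                # P(new table | m seated, k tables) = (theta + beta*k) / (theta + m)
                if m == 0 or rng.random() * (theta + m) < theta + beta * k:
                    tables[i] += 1
                seated[i] += 1
        path.append((N, sum(tables) / N ** beta))
    return path


if __name__ == "__main__":
    for N, ratio in eta_path([0.5, 0.3, 0.2], [10**3, 10**4, 10**5, 10**6],
                             beta=0.6, theta=1.0, seed=7):
        print(f"N={N:>8}  tables / N^beta = {ratio:.4f}")
```

Growing all groups along a single path, rather than drawing fresh samples at each $N$, matters here: the limit is an almost sure statement about one sequence, and independent runs would each converge to a different draw of $\eta$.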

Theorems & Definitions (23)

  • Proposition 2.1: Proposition 9, BCR20
  • Proposition 2.2: Theorem 2.3, BF24
  • Proposition 2.3: Theorem 1.2, FH98
  • Proposition 2.4: Theorem 1.1, FF15
  • Lemma 3.1
  • proof
  • Proposition 3.2
  • proof
  • Remark 3.3
  • Remark 3.4
  • ...and 13 more