Simple Analysis of Priority Sampling
Majid Daliri, Juliana Freire, Christopher Musco, Aécio Santos, Haoxiang Zhang
TL;DR
This work gives a concise alternative proof that Priority Sampling (also known as Sequential Poisson Sampling) with sample size $k$ has total variance at most $\frac{W^2}{k-1}$ for its Horvitz–Thompson-style estimator, where $W = \sum_i w_i$ is the total weight. This matches the variance of Threshold Sampling up to a constant factor. The core idea is to introduce per-item thresholds $\tau_i$ (the $k$-th smallest rank $u_j/w_j$ among $j \neq i$, where each $u_j$ is an independent uniform random number) and to bound $\mathbb{E}[1/\tau_i]$, avoiding the heavy integral machinery of prior proofs. The authors establish $\mathbb{E}[\hat{w}_i] = w_i$ and, for $i \neq j$, $\mathbb{E}[\hat{w}_i \hat{w}_j] = w_i w_j$. Because the per-item estimates are therefore uncorrelated, $\operatorname{Var}[\hat{W}] = \sum_i \operatorname{Var}[\hat{w}_i] \le \frac{W^2}{k-1}$, with $\mathbb{E}[\hat{W}] = W$. The paper also discusses an accompanying tighter bound and offers pedagogical insights, including the case of uniform weights and connections to related sampling techniques such as KMV, which help clarify when Priority Sampling is near-optimal.
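To make the quantities above concrete, here is a minimal Python sketch of Priority Sampling under the standard formulation: each item gets a rank $u_i/w_i$, the $k$ items with the smallest ranks are kept, and each kept item is assigned the estimate $\hat{w}_i = \max(w_i, 1/\tau)$, where $\tau$ is the $(k+1)$-st smallest rank. The function name `priority_sample`, its parameters, and the assumption that all weights are strictly positive are illustrative choices, not taken from the paper.

```python
import random


def priority_sample(weights, k, rng):
    """Return {index: estimated weight} for a priority sample of size k.

    Illustrative sketch; assumes every weight is strictly positive.
    """
    n = len(weights)
    if k >= n:
        # Nothing to subsample: every item is kept with its exact weight.
        return dict(enumerate(weights))
    # Rank R_i = u_i / w_i with u_i uniform on (0, 1]; heavier items tend to rank lower.
    ranks = [(1.0 - rng.random()) / w for w in weights]
    order = sorted(range(n), key=ranks.__getitem__)
    tau = ranks[order[k]]  # the (k+1)-st smallest rank
    # For a kept item i, tau coincides with tau_i (the k-th smallest rank among j != i).
    return {i: max(weights[i], 1.0 / tau) for i in order[:k]}


def estimate_total(weights, k, rng):
    """Estimate W = sum(weights) by summing the per-item estimates."""
    return sum(priority_sample(weights, k, rng).values())
```

Averaging `estimate_total` over many independent runs should approach the true total $W$, and the empirical variance of those runs can be compared against $W^2/(k-1)$.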
Abstract
We prove a tight upper bound on the variance of the priority sampling method (aka sequential Poisson sampling). Our proof is significantly shorter and simpler than the original proof given by Mario Szegedy at STOC 2006, which resolved a conjecture by Duffield, Lund, and Thorup.
