Simple Analysis of Priority Sampling

Majid Daliri; Juliana Freire; Christopher Musco; Aécio Santos; Haoxiang Zhang

TL;DR

This work gives a concise alternative proof that Priority Sampling (also called Sequential Poisson Sampling) achieves a total variance bound of $\frac{W^2}{k-1}$ for its Horvitz–Thompson-style estimator, where $W = \sum_i w_i$ is the total weight and $k$ is the sample size. This matches the variance of Threshold Sampling up to a constant factor. The core idea is to introduce per-item thresholds $\tau_i$ (the $k$-th smallest value of $u_j/w_j$ among $j \neq i$, with each $u_j$ drawn uniformly from $[0,1]$) and to bound $\mathbb{E}[1/\tau_i]$, which avoids the heavy integral machinery of prior proofs. The authors establish $\mathbb{E}[\hat{w}_i] = w_i$ and, for $i \neq j$, $\mathbb{E}[\hat{w}_i \hat{w}_j] = w_i w_j$. The estimates are therefore pairwise uncorrelated, so $\mathbb{E}[\hat{W}] = W$ and $\operatorname{Var}[\hat{W}] = \sum_i \operatorname{Var}[\hat{w}_i] \le W^2/(k-1)$. The paper also presents a refined, tighter bound and some pedagogical insights. These include the uniform-weights case and connections to related techniques such as KMV sketches, which clarify when Priority Sampling is near-optimal.
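The Monte Carlo sketch below is a minimal, non-authoritative check of the claims above. It assumes the standard priority sampling formulation: item $i$ gets rank $u_i/w_i$, the $k$ items with the smallest ranks are kept, $\tau$ is the $(k+1)$-th smallest rank, and each kept item is estimated as $\max(w_i, 1/\tau)$. The paper's own equation for the estimator is not reproduced on this page. The function names `priority_sample` and `monte_carlo`, as well as the example weights, are hypothetical.

```python
import random


def priority_sample(weights, k, rng):
    """Priority sampling estimates for a list of positive weights.

    Assumed formulation: item i gets rank u_i / w_i with u_i uniform on (0, 1];
    the k items with the smallest ranks are kept, tau is the (k+1)-th smallest
    rank, and each kept item is estimated as max(w_i, 1 / tau). Items that are
    not kept are estimated as 0.
    """
    n = len(weights)
    if k >= n:
        # Everything fits in the sample, so every weight is reported exactly.
        return list(weights)
    # 1 - random() lies in (0, 1], so no rank is exactly zero.
    ranks = [(1.0 - rng.random()) / w for w in weights]
    order = sorted(range(n), key=lambda i: ranks[i])
    tau = ranks[order[k]]  # (k+1)-th smallest rank (0-indexed position k)
    estimates = [0.0] * n
    for i in order[:k]:
        estimates[i] = max(weights[i], 1.0 / tau)
    return estimates


def monte_carlo(weights, k, trials, seed=0):
    """Empirical mean and variance of W_hat = sum of the estimates."""
    rng = random.Random(seed)
    totals = [sum(priority_sample(weights, k, rng)) for _ in range(trials)]
    mean = sum(totals) / trials
    var = sum((t - mean) ** 2 for t in totals) / (trials - 1)
    return mean, var


if __name__ == "__main__":
    # Hypothetical skewed weights: 50 light items and 3 heavy ones.
    weights = [1.0] * 50 + [10.0, 20.0, 40.0]
    k = 10
    W = sum(weights)
    mean, var = monte_carlo(weights, k, trials=20000)
    print(f"W = {W:.1f}, empirical E[W_hat] = {mean:.2f}")
    print(f"empirical Var[W_hat] = {var:.1f}, bound W^2/(k-1) = {W**2 / (k - 1):.1f}")
```

The empirical mean should land close to $W$ and the empirical variance below $W^2/(k-1)$. When all weights equal 1, every kept item receives the same estimate $\max(1, 1/\tau)$. In that case $\hat{W}$ depends only on a single order statistic of uniform values, the same kind of statistic a KMV (k minimum values) sketch uses. Sorting costs $O(n \log n)$ here for simplicity. A streaming version would instead keep a heap of the $k+1$ smallest ranks.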

Abstract

We prove a tight upper bound on the variance of the priority sampling method (aka sequential Poisson sampling). Our proof is significantly shorter and simpler than the original proof given by Mario Szegedy at STOC 2006, which resolved a conjecture by Duffield, Lund, and Thorup.

Paper Structure

This paper contains 7 sections, 2 theorems, and 27 equations:

Background
Threshold Sampling
Priority Sampling
Main Analysis
Discussion and Pedagogical Perspective
Proof of the Expectation and Covariance Fact
Comparison to Szegedy's Result and a Refinement

Key Result

Theorem 1

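Let $\hat{w}_1, \ldots, \hat{w}_n$ be the priority sampling estimates defined in the Priority Sampling section, let $\hat{W} = \sum_{i=1}^n \hat{w}_i$, and let $W = \sum_{i=1}^n w_i$. Then $\mathbb{E}[\hat{W}] = W$ and $\operatorname{Var}[\hat{W}] \le \frac{W^2}{k-1}$.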
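As a hedged illustration of the per-item threshold idea from the TL;DR, the derivation below works out the conditional mean and variance of a single estimate. It assumes the standard priority sampling estimator given in the comments, since the referenced equation is not reproduced on this page. It also takes $u_i$ uniform on $[0,1]$ and independent of $\tau_i$, which depends only on $u_j$ for $j \neq i$.

```latex
% Assumed estimator: \hat{w}_i = \max(w_i, 1/\tau) if u_i/w_i is among the k
% smallest ranks, else 0, where \tau is the (k+1)-th smallest rank u_j/w_j.
% Per-item threshold: \tau_i = k-th smallest of \{u_j/w_j : j \neq i\}.
\begin{align*}
  i \text{ sampled} &\iff u_i/w_i < \tau_i \text{ (and then } \tau = \tau_i\text{)},
  \quad\text{so}\quad \hat{w}_i = \mathbf{1}[u_i < w_i\tau_i]\,\max(w_i, 1/\tau_i), \\
  \mathbb{E}[\hat{w}_i \mid \tau_i] &= \min(1, w_i\tau_i)\,\max(w_i, 1/\tau_i) = w_i, \\
  \mathbb{E}[\hat{w}_i^2 \mid \tau_i] &= \min(1, w_i\tau_i)\,\max(w_i, 1/\tau_i)^2 = \max(w_i^2,\, w_i/\tau_i), \\
  \operatorname{Var}[\hat{w}_i] &= \mathbb{E}\big[\max(0,\, w_i/\tau_i - w_i^2)\big] \le w_i\,\mathbb{E}[1/\tau_i].
\end{align*}
```

Combined with the pairwise identity $\mathbb{E}[\hat{w}_i \hat{w}_j] = w_i w_j$ for $i \neq j$, this gives $\operatorname{Var}[\hat{W}] \le \sum_i w_i\,\mathbb{E}[1/\tau_i]$. A bound of the form $\mathbb{E}[1/\tau_i] \le W/(k-1)$ would then yield $W^2/(k-1)$. The paper's own argument for bounding $\mathbb{E}[1/\tau_i]$ is not reproduced here.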

Theorems & Definitions (19)

Theorem 1 (Szegedy 2006, Thm. 4)
Claim 2
proof
Claim 3
proof
Claim 4
proof
Proof of Theorem 1
Corollary 5
proof
...and 9 more
