
Simple Analysis of Priority Sampling

Majid Daliri, Juliana Freire, Christopher Musco, Aécio Santos, Haoxiang Zhang

TL;DR

This work gives a concise alternative proof that Priority Sampling (also known as Sequential Poisson Sampling) has total variance at most $\frac{W^2}{k-1}$ for its Horvitz–Thompson-style estimator, matching the variance of Threshold Sampling up to a constant factor. The core idea is to introduce a per-item threshold $\tau_i$ (the $k$-th smallest value of $u_j/w_j$ over $j \neq i$) and to bound $\mathbb{E}[1/\tau_i]$, which avoids the heavy integral machinery of prior proofs. The authors establish $\mathbb{E}[\hat{w}_i] = w_i$ and, for $i \neq j$, $\mathbb{E}[\hat{w}_i \hat{w}_j] = w_i w_j$, so the per-item estimates are unbiased and pairwise uncorrelated. It follows that $\mathbb{E}[\hat{W}] = W$ and $\operatorname{Var}[\hat{W}] = \sum_i \operatorname{Var}[\hat{w}_i] \le \frac{W^2}{k-1}$. The paper also discusses a tighter bound and pedagogical insights, including the case of uniform weights and connections to related sampling techniques such as KMV, which clarify when Priority Sampling is near-optimal.
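
The sampling procedure summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `priority_sample` and `simulate` are hypothetical, and the estimator rule (keep the $k$ items with the smallest ranks $u_i/w_i$, let $\tau$ be the $(k+1)$-st smallest rank, and report $\max(w_i, 1/\tau)$ for kept items) is the standard form of priority sampling, assumed here because eq:priority_sampling is not reproduced in this summary. Weights are assumed to be strictly positive.

```python
import random


def priority_sample(weights, k, rng):
    """Return per-item estimates w_hat; unsampled items get 0.0.

    Assumes the standard priority sampling rule (not quoted in the summary).
    """
    n = len(weights)
    if k >= n:
        # Every item is kept, so the estimates are exact.
        return [float(w) for w in weights]
    # Rank R_i = u_i / w_i, with u_i uniform on (0, 1] to avoid a zero rank.
    ranks = [(1.0 - rng.random()) / w for w in weights]
    order = sorted(range(n), key=ranks.__getitem__)
    tau = ranks[order[k]]  # (k+1)-st smallest rank
    w_hat = [0.0] * n
    for i in order[:k]:  # the k items with the smallest ranks
        w_hat[i] = max(weights[i], 1.0 / tau)
    return w_hat


def simulate(weights, k, trials, seed=0):
    """Empirical mean and sample variance of W_hat over repeated draws."""
    rng = random.Random(seed)
    totals = [sum(priority_sample(weights, k, rng)) for _ in range(trials)]
    mean = sum(totals) / trials
    var = sum((t - mean) ** 2 for t in totals) / (trials - 1)
    return mean, var


if __name__ == "__main__":
    weights = [1.0, 2.0, 3.0, 4.0, 10.0]  # illustrative data only
    k = 3
    W = sum(weights)
    mean, var = simulate(weights, k, trials=20000)
    print(f"W = {W}, empirical mean of W_hat = {mean:.3f}")
    print(f"empirical variance = {var:.3f}, bound W^2/(k-1) = {W * W / (k - 1):.3f}")
```

Running the script compares the empirical mean of $\hat{W}$ with $W$ and the empirical variance with $W^2/(k-1)$. The example weights and trial count are arbitrary, and a simulation like this only illustrates the claims; it does not verify them.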

Abstract

We prove a tight upper bound on the variance of the priority sampling method (aka sequential Poisson sampling). Our proof is significantly shorter and simpler than the original proof given by Mario Szegedy at STOC 2006, which resolved a conjecture by Duffield, Lund, and Thorup.

Paper Structure

This paper contains 7 sections, 2 theorems, and 27 equations.

Key Result

Theorem 1

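Let $\hat{w}_1, \ldots, \hat{w}_n$ be the priority sampling estimates (as defined in eq:priority_sampling), let $\hat{W} = \sum_{i=1}^n \hat{w}_i$, and let $W = \sum_{i=1}^n w_i$. Then $\mathbb{E}[\hat{W}] = W$ and $\operatorname{Var}[\hat{W}] \le \frac{W^2}{k-1}$.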
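
The variance decomposition quoted in the TL;DR follows from the two moment identities alone, and the role of $\mathbb{E}[1/\tau_i]$ can be seen from a short conditional calculation. The sketch below assumes the standard form of the estimator, $\hat{w}_i = \max(w_i, 1/\tau_i)$ when $u_i/w_i < \tau_i$ and $\hat{w}_i = 0$ otherwise, with $u_i$ uniform on $[0,1]$ and independent of $\tau_i$. That form is not reproduced in this summary, so treat it as an assumption rather than the paper's exact statement.

$$
\begin{aligned}
\operatorname{Cov}[\hat{w}_i, \hat{w}_j] &= \mathbb{E}[\hat{w}_i \hat{w}_j] - \mathbb{E}[\hat{w}_i]\,\mathbb{E}[\hat{w}_j] = w_i w_j - w_i w_j = 0 \quad (i \neq j), \\
\operatorname{Var}[\hat{W}] &= \sum_i \operatorname{Var}[\hat{w}_i] + \sum_{i \neq j} \operatorname{Cov}[\hat{w}_i, \hat{w}_j] = \sum_i \operatorname{Var}[\hat{w}_i], \\
\mathbb{E}[\hat{w}_i^2 \mid \tau_i] &= \min(1, w_i \tau_i)\,\max(w_i, 1/\tau_i)^2 = w_i \max(w_i, 1/\tau_i), \\
\operatorname{Var}[\hat{w}_i] &= w_i\,\mathbb{E}[\max(w_i, 1/\tau_i)] - w_i^2 \le w_i\,\mathbb{E}[1/\tau_i].
\end{aligned}
$$

The third line uses $\min(1, w_i \tau_i) \cdot \max(w_i, 1/\tau_i) = w_i$, and the last line uses $\max(w_i, 1/\tau_i) \le w_i + 1/\tau_i$. Summing over $i$ gives $\operatorname{Var}[\hat{W}] \le \sum_i w_i\,\mathbb{E}[1/\tau_i]$, so a bound of the form $\mathbb{E}[1/\tau_i] \le W/(k-1)$ would be enough to reach $W^2/(k-1)$. The exact form of the paper's bound on $\mathbb{E}[1/\tau_i]$ is not stated in this summary.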

Theorems & Definitions (19)

  • Theorem 1 (Szegedy 2006, Thm. 4)
  • Claim 2
  • proof
  • Claim 3
  • proof
  • Claim 4
  • proof
  • proof: Proof of Theorem 1
  • Corollary 5
  • proof
  • ...and 9 more