Universal Perfect Samplers for Incremental Streams

Seth Pettie, Dingyu Wang

TL;DR

This work addresses exact $G$-sampling on incremental streams for a broad class of $G$ corresponding to Laplace exponents of non-negative Lévy processes. It introduces Lévy-Khintchine–based level functions $\ell_G$ to realize perfect sampling, delivering a two-word memory $G$-sampler and a universal $\mathcal{G}$-sampler that uses $O(\log n)$ memory and can produce exact samples for any $G\in\mathcal{G}$ at query time. The approach extends to sampling sequences (with/without replacement) and to sampling edges in graphs via stochastic sampling circuits, with concrete level-function derivations for cases like $G(z)=z^{1/2}$, $G(z)=1-e^{-\tau z}$, and $G(z)=\log(1+z)$. By tying $G$ to Lévy processes, the paper achieves near-optimal space bounds and broad applicability, and it opens avenues for representing other stochastic objects within this framework. Overall, it provides a principled, memory-efficient toolkit for exact streaming sampling grounded in Lévy-Khintchine theory and stochastic circuits.

Abstract

If $G : \mathbb{R}_+ \to \mathbb{R}_+$, the $G$-moment of a vector $\mathbf{x}\in\mathbb{R}_+^n$ is $G(\mathbf{x}) = \sum_{v\in[n]} G(\mathbf{x}(v))$ and the $G$-sampling problem is to select an index $v_*\in [n]$ according to its contribution to the $G$-moment, i.e., such that $\Pr(v_*=v) = G(\mathbf{x}(v))/G(\mathbf{x})$. Approximate $G$-samplers may introduce multiplicative and/or additive errors to this probability, and some have a non-trivial probability of failure. In this paper we focus on the exact $G$-sampling problem, where $G$ is selected from the class $\mathcal{G}$ of Laplace exponents of non-negative, one-dimensional Lévy processes. This class includes several well-studied families, such as $p$th moments $G(z)=z^p$ with $p\in[0,1]$, logarithms $G(z)=\log(1+z)$, and Cohen and Geri's soft concave sublinear functions, which are used to approximate concave sublinear functions, including cap statistics.

We develop $G$-samplers for a vector $\mathbf{x} \in \mathbb{R}_+^n$ that is presented as an incremental stream of positive updates. In particular:

  • For any $G\in\mathcal{G}$, we give a very simple $G$-sampler that uses 2 words of memory and stores at all times a $v_*\in [n]$, such that $\Pr(v_*=v)$ is exactly $G(\mathbf{x}(v))/G(\mathbf{x})$.
  • We give a "universal" $\mathcal{G}$-sampler that uses $O(\log n)$ words of memory w.h.p., and given any $G\in \mathcal{G}$ at query time, produces an exact $G$-sample.

With an overhead of a factor of $k$, both samplers can be used to $G$-sample a sequence of $k$ indices with or without replacement. Our sampling framework is simple and versatile, and can easily be generalized to sampling from more complex objects like graphs and hypergraphs.
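To make the target distribution concrete, the following is a minimal offline sketch (not a streaming algorithm and not the paper's sampler) that materializes $\mathbf{x}$ from an incremental stream and computes $\Pr(v_*=v) = G(\mathbf{x}(v))/G(\mathbf{x})$ directly. The function name `g_sampling_distribution` and the example stream are assumptions made for illustration; indices that never appear contribute nothing because $G(0)=0$ for the functions used here.

```python
import math
from collections import defaultdict


def g_sampling_distribution(stream, G):
    """Return {v: G(x(v)) / G(x)} for the vector x accumulated from the stream.

    `stream` is an iterable of (index, delta) pairs with delta > 0.
    This is a hypothetical reference implementation that stores all of x.
    """
    x = defaultdict(float)
    for v, delta in stream:
        if delta <= 0:
            raise ValueError("incremental streams allow only positive updates")
        x[v] += delta
    g_moment = sum(G(value) for value in x.values())
    return {v: G(value) / g_moment for v, value in x.items()}


# Illustrative stream; the accumulated vector is x = (4, 4, 9).
stream = [(0, 1.0), (1, 4.0), (0, 3.0), (2, 9.0)]
sqrt_probs = g_sampling_distribution(stream, math.sqrt)   # G(z) = z^{1/2}
log_probs = g_sampling_distribution(stream, math.log1p)   # G(z) = log(1 + z)
```

A streaming $G$-sampler in the sense of the abstract must output an index with exactly these probabilities without storing $\mathbf{x}$; a baseline like this one is only useful for checking such a sampler empirically.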

Paper Structure

This paper contains 12 sections, 5 theorems, 19 equations, 2 figures, 5 algorithms.

Key Result

Theorem 1

Fix any $G\in\mathcal{G}$. The generic $G$-Sampler stores a pair $(v_*,h_*)\in[n]\times \mathbb{R}_+$ such that at all times, $\mathbb{P}(v_*=v) = G(\mathbf{x}(v))/G(\mathbf{x})$, i.e., it is a truly perfect $G$-sampler with zero probability of failure.
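As an illustration of how a two-word state $(v_*,h_*)$ can yield an exact sample, here is a sketch for the single special case $G(z)=z$, using the classical competing-exponentials argument: each update $(v,\Delta)$ draws an $\mathrm{Exp}(\Delta)$ value and the sampler keeps the minimum. This is not the paper's generic $G$-Sampler, which relies on level functions $\ell_G$ that are not reproduced on this page; the names `LinearSampler` and `empirical_frequencies` are hypothetical.

```python
import random


class LinearSampler:
    """Two-word sampler for G(z) = z only (illustrative special case)."""

    def __init__(self, rng):
        self.rng = rng
        self.v_star = None            # currently sampled index
        self.h_star = float("inf")    # smallest value seen so far

    def update(self, v, delta):
        # If Y ~ Exp(1) then Y / delta ~ Exp(delta). The minimum over all
        # updates to v is Exp(x(v)), so index v attains the global minimum
        # with probability x(v) / sum_u x(u).
        h = self.rng.expovariate(1.0) / delta
        if h < self.h_star:
            self.v_star, self.h_star = v, h


def empirical_frequencies(stream, trials, seed=0):
    """Run independent samplers over the same stream and tally v_star."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        sampler = LinearSampler(rng)
        for v, delta in stream:
            sampler.update(v, delta)
        counts[sampler.v_star] = counts.get(sampler.v_star, 0) + 1
    return {v: c / trials for v, c in counts.items()}


# Illustrative stream; the accumulated vector is x = (3, 2, 5).
stream = [(0, 1.0), (1, 2.0), (0, 2.0), (2, 5.0)]
freqs = empirical_frequencies(stream, trials=20000)
```

The empirical frequencies should be close to $(0.3, 0.2, 0.5)$, i.e. $\mathbf{x}(v)/\sum_u \mathbf{x}(u)$. Handling a general $G\in\mathcal{G}$ requires transforming the exponential value so that the per-index minimum has rate $G(\mathbf{x}(v))$ rather than $\mathbf{x}(v)$, which is the role the summary attributes to the level functions.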

Figures (2)

  • Figure 1: The flat stochastic sampling circuit corresponding to the generic $G$-Sampler.
  • Figure 2: Circuit diagram for $G$-Edge-Sampler with $G(a,b) = \log(1+\sqrt{a}+\sqrt{b}) + 2(1-e^{-(a+b)})$. Depicted are two of the input gates ($u$ and $v$), the unique output gate, and all gates related to the sampling of edge $\{u,v\}$. The dice symbol indicates the random seed $U \sim \mathrm{Uniform}(0,1)$ used by each $G$-gate, as well as the "fresh" exponential random variables $Y\sim \mathrm{Exp}(1)$ generated by the input gates for each update and each output wire. The "$2x$"-gate is a deterministic scalar gate with $\alpha=2$.

Theorems & Definitions (20)

  • Definition 1: Approximate/Perfect/Truly Perfect $G$-samplers [JayaramW21, JayaramWZ22, Jayaram21-PhDthesis]
  • Theorem 1: $G$-Sampler
  • Theorem 2
  • Lemma 1: level functions
  • Definition 2: non-negative Lévy processes [ken1999levy]
  • Theorem 3: Lévy-Khintchine representation for non-negative Lévy processes. See Sato [ken1999levy]
  • Definition 3: Lévy induced level function
  • Proof of Lemma 1 (level functions)
  • Proof of Theorem 1 ($G$-Sampler)
  • Remark 1
  • ...and 10 more
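
The Lévy-Khintchine representation listed above as Theorem 3 can be checked numerically in one standard case. For a non-negative Lévy process with zero drift and Lévy measure $\nu$, the Laplace exponent takes the form $G(z) = \int_0^\infty (1-e^{-zr})\,\nu(dr)$; for the Gamma process, $\nu(dr) = r^{-1}e^{-r}\,dr$, which gives $G(z)=\log(1+z)$. These are textbook facts assumed here rather than taken from this page, and the paper's exact statement and normalization may differ. The sketch below uses a plain midpoint rule; the function name `gamma_levy_khintchine` and its parameters are hypothetical.

```python
import math


def gamma_levy_khintchine(z, upper=50.0, steps=200_000):
    """Midpoint-rule estimate of the integral over (0, upper) of
    (1 - exp(-z r)) * exp(-r) / r, truncating the negligible tail."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * width
        # -expm1(-z r) equals 1 - exp(-z r) and stays accurate for small r.
        total += -math.expm1(-z * r) * math.exp(-r) / r
    return total * width


estimate = gamma_levy_khintchine(2.0)
exact = math.log1p(2.0)  # log(1 + z) at z = 2
```

The two values should agree to several decimal places, since the integrand is bounded near $r=0$ (it tends to $z$) and decays exponentially.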