Universal Perfect Samplers for Incremental Streams

Seth Pettie; Dingyu Wang

TL;DR

This work addresses exact $G$-sampling on incremental streams for a broad class of $G$ corresponding to Laplace exponents of non-negative Lévy processes. It introduces Lévy-Khintchine–based level functions $\ell_G$ to realize perfect sampling, delivering a two-word memory $G$-sampler and a universal $\mathcal{G}$-sampler that uses $O(\log n)$ memory and can produce exact samples for any $G\in\mathcal{G}$ at query time. The approach extends to sampling sequences (with/without replacement) and to sampling edges in graphs via stochastic sampling circuits, with concrete level-function derivations for cases like $G(z)=z^{1/2}$, $G(z)=1-e^{-\tau z}$, and $G(z)=\log(1+z)$. By tying $G$ to Lévy processes, the paper achieves near-optimal space bounds and broad applicability, and it opens avenues for representing other stochastic objects within this framework. Overall, it provides a principled, memory-efficient toolkit for exact streaming sampling grounded in Lévy-Khintchine theory and stochastic circuits.
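
The link between $G$ and Lévy processes can be made concrete for the three example functions named above. The following is a sketch only, assuming the standard Lévy-Khintchine form for the Laplace exponent of a non-negative Lévy process with zero drift and Lévy measure $\nu$; the notation ($\nu$, $r$) is an assumption here and may differ from the paper's own statement.

```latex
\begin{align*}
G(z) &= \int_0^\infty \left(1 - e^{-zr}\right)\nu(dr)
  && \text{(pure-jump Laplace exponent)} \\
\nu(dr) = \tfrac{1}{2\sqrt{\pi}}\, r^{-3/2}\,dr
  &\;\Longrightarrow\; G(z) = z^{1/2}
  && \text{($1/2$-stable subordinator)} \\
\nu = \delta_\tau
  &\;\Longrightarrow\; G(z) = 1 - e^{-\tau z}
  && \text{(unit-rate Poisson process, jumps of size $\tau$)} \\
\nu(dr) = r^{-1} e^{-r}\,dr
  &\;\Longrightarrow\; G(z) = \log(1+z)
  && \text{(Gamma process, via the Frullani integral)}
\end{align*}
```

The first line uses $\int_0^\infty (1-e^{-zr})\, r^{-1-\alpha}\,dr = \frac{\Gamma(1-\alpha)}{\alpha} z^{\alpha}$ with $\alpha = 1/2$, which gives the normalizing constant $2\sqrt{\pi}$.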

Abstract

If $G : \mathbb{R}_+ \to \mathbb{R}_+$, the $G$-moment of a vector $\mathbf{x}\in\mathbb{R}_+^n$ is $G(\mathbf{x}) = \sum_{v\in[n]} G(\mathbf{x}(v))$, and the $G$-sampling problem is to select an index $v_*\in [n]$ according to its contribution to the $G$-moment, i.e., such that $\Pr(v_*=v) = G(\mathbf{x}(v))/G(\mathbf{x})$. Approximate $G$-samplers may introduce multiplicative and/or additive errors to this probability, and some have a non-trivial probability of failure. In this paper we focus on the exact $G$-sampling problem, where $G$ is selected from the class $\mathcal{G}$ of Laplace exponents of non-negative, one-dimensional Lévy processes. This class includes several well-studied families, such as $p$th moments $G(z)=z^p$, $p\in[0,1]$, logarithms $G(z)=\log(1+z)$, and Cohen and Geri's soft concave sublinear functions, which are used to approximate concave sublinear functions, including cap statistics. We develop $G$-samplers for a vector $\mathbf{x} \in \mathbb{R}_+^n$ that is presented as an incremental stream of positive updates. In particular:

* For any $G\in\mathcal{G}$, we give a very simple $G$-sampler that uses 2 words of memory and stores at all times a $v_*\in [n]$, such that $\Pr(v_*=v)$ is exactly $G(\mathbf{x}(v))/G(\mathbf{x})$.
* We give a "universal" $\mathcal{G}$-sampler that uses $O(\log n)$ words of memory w.h.p., and given any $G\in \mathcal{G}$ at query time, produces an exact $G$-sample.

With an overhead of a factor of $k$, both samplers can be used to $G$-sample a sequence of $k$ indices with or without replacement. Our sampling framework is simple and versatile, and can easily be generalized to sampling from more complex objects like graphs and hypergraphs.
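
To make the target distribution concrete, here is a minimal offline sketch in Python. It is not a streaming algorithm and not the paper's sampler: it simply materializes $\mathbf{x}$ from an incremental stream and evaluates $G(\mathbf{x}(v))/G(\mathbf{x})$ directly. The function names (`accumulate`, `g_moment`, `g_sampling_distribution`) and the dictionary `G_FUNCS` are hypothetical names chosen for illustration.

```python
import math

# A few members of the class G mentioned in the text.
# The keys and the default tau = 1.0 are illustrative assumptions.
G_FUNCS = {
    "identity": lambda z: z,                             # p-th moment, p = 1
    "sqrt": lambda z: math.sqrt(z),                      # p-th moment, p = 1/2
    "log1p": lambda z: math.log1p(z),                    # log(1 + z)
    "exp_cap": lambda z, tau=1.0: 1.0 - math.exp(-tau * z),  # 1 - e^{-tau z}
}


def accumulate(n, stream):
    """Build x in R_+^n from an incremental stream of (index, positive delta)."""
    x = [0.0] * n
    for v, delta in stream:
        if delta <= 0:
            raise ValueError("incremental streams carry positive updates only")
        x[v] += delta
    return x


def g_moment(x, G):
    """G(x) = sum over v of G(x(v))."""
    return sum(G(xv) for xv in x)


def g_sampling_distribution(x, G):
    """Exact target probabilities Pr(v* = v) = G(x(v)) / G(x)."""
    total = g_moment(x, G)
    if total <= 0:
        raise ValueError("G-moment is zero; the distribution is undefined")
    return [G(xv) / total for xv in x]


if __name__ == "__main__":
    x = accumulate(3, [(0, 1.0), (1, 3.0), (2, 4.0), (2, 2.0)])
    for name, G in G_FUNCS.items():
        print(name, g_sampling_distribution(x, G))
```

An exact $G$-sampler must reproduce these probabilities without storing $\mathbf{x}$; the offline computation is useful mainly as a reference when testing a streaming implementation.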

Paper Structure (12 sections, 5 theorems, 19 equations, 2 figures, 5 algorithms)


Introduction
Prior Work
$L_p$-Sampling from Turnstile Streams.
$G$-Sampling from Incremental Streams.
New Results
Organization
Lévy Processes and Lévy-Khintchine Representation
Proofs of Lemma 1 and Theorems 1 and 2
Deriving the Level Functions
Stochastic Sampling Circuits
Conclusion
Sampling Without Replacement

Key Result

Theorem 1

Fix any $G\in\mathcal{G}$. The generic $G$-Sampler stores a pair $(v_*,h_*)\in[n]\times \mathbb{R}_+$ such that at all times, $\mathbb{P}(v_*=v) = G(\mathbf{x}(v))/G(\mathbf{x})$, i.e., it is a truly perfect $G$-sampler with zero probability of failure.
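
The shape of such a two-word sampler can be illustrated in the one special case $G(z)=z$, where no level function is needed. The sketch below is not the paper's generic $G$-Sampler: it swaps in the classical exponential-race argument, drawing a fresh $\mathrm{Exp}(\Delta)$ variable per update and keeping the minimum. The minimum of independent exponentials lands on a given update with probability proportional to its rate, so $\Pr(v_*=v)=\mathbf{x}(v)/\sum_u \mathbf{x}(u)$. The class and function names are hypothetical.

```python
import random


class LinearSampler:
    """Two-word sampler for the special case G(z) = z only (illustration).

    State is the pair (v_star, h_star), mirroring the (v_*, h_*) pair in
    Theorem 1. Handling a general G in the class would need the paper's
    level functions, which are not reproduced here.
    """

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.v_star = None          # currently sampled index
        self.h_star = float("inf")  # smallest exponential seen so far

    def update(self, v, delta):
        if delta <= 0:
            raise ValueError("incremental streams carry positive updates only")
        # Fresh exponential with rate delta (mean 1/delta).
        h = self.rng.expovariate(delta)
        if h < self.h_star:
            self.v_star, self.h_star = v, h


def empirical_distribution(stream, n, trials, seed=0):
    """Run the sampler `trials` times and return observed frequencies."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(trials):
        sampler = LinearSampler(rng)
        for v, delta in stream:
            sampler.update(v, delta)
        counts[sampler.v_star] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    stream = [(0, 1.0), (1, 3.0), (2, 4.0), (2, 2.0)]  # x = (1, 3, 6)
    print(empirical_distribution(stream, n=3, trials=20000, seed=1))
```

With $\mathbf{x}=(1,3,6)$ the observed frequencies should be close to $(0.1, 0.3, 0.6)$, up to sampling noise. The sampler never fails and never stores $\mathbf{x}$, which is the behavior Theorem 1 asserts for every $G\in\mathcal{G}$.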

Figures (2)

Figure 1: The flat stochastic sampling circuit corresponding to the generic $G$-Sampler (Algorithm \ref{alg:G-sampler}).
Figure 2: Circuit diagram for $G$-Edge-Sampler (Algorithm \ref{alg:G-edge-sampler}) with $G(a,b) = \log(1+\sqrt{a}+\sqrt{b}) + 2(1-e^{-(a+b)})$. Depicted are two of the input gates ($u$ and $v$), the unique output gate, and all gates related to the sampling of edge $\{u,v\}$. The dice symbol indicates the random seed $U \sim \mathrm{Uniform}(0,1)$ used by each $G$-gate, as well as the "fresh" exponential random variables $Y\sim \mathrm{Exp}(1)$ generated by the input gates for each update and each output wire. The "$2x$"-gate is a deterministic scalar gate with $\alpha=2$.

Theorems & Definitions (20)

Definition 1: Approximate/Perfect/Truly Perfect $G$-samplers [JayaramW21, JayaramWZ22, Jayaram21-PhDthesis]
Theorem 1: $G$-Sampler
Theorem 2
Lemma 1: level functions
Definition 2: non-negative Lévy processes [ken1999levy]
Theorem 3: Lévy-Khintchine representation for non-negative Lévy processes. See Sato [ken1999levy]
Definition 3: Lévy induced level function
Proof of Lemma 1
Proof of Theorem 1 ($G$-Sampler)
Remark 1
...and 10 more
