Universal Perfect Samplers for Incremental Streams
Seth Pettie, Dingyu Wang
TL;DR
This work addresses exact $G$-sampling on incremental streams for a broad class of functions $G$: the Laplace exponents of non-negative Lévy processes. It introduces level functions $\ell_G$, derived from the Lévy–Khintchine representation, to realize perfect sampling. The result is a $G$-sampler that uses two words of memory, and a universal $\mathcal{G}$-sampler that uses $O(\log n)$ words of memory w.h.p. and can produce an exact sample for any $G\in\mathcal{G}$ chosen at query time. The approach extends to sampling sequences of indices (with or without replacement) and to sampling edges in graphs via stochastic sampling circuits, with concrete level-function derivations for cases such as $G(z)=z^{1/2}$, $G(z)=1-e^{-\tau z}$, and $G(z)=\log(1+z)$. By tying $G$ to Lévy processes, the paper achieves near-optimal space bounds and broad applicability, and it opens avenues for representing other stochastic objects within the same framework. Overall, it provides a principled, memory-efficient toolkit for exact streaming sampling grounded in Lévy–Khintchine theory and stochastic circuits.
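For reference, the functions named above all fit the standard Lévy–Khintchine representation of the Laplace exponent of a subordinator (a non-decreasing Lévy process $X$). The display below is a textbook fact, not a derivation taken from the paper; the symbols $\kappa$ (killing rate), $\gamma$ (drift), and $\nu$ (Lévy measure) are our notation.

```latex
\begin{aligned}
&\mathbb{E}\bigl[e^{-zX_t}\bigr] = e^{-tG(z)}, \qquad
G(z) = \kappa + \gamma z + \int_{(0,\infty)} \bigl(1 - e^{-zt}\bigr)\,\nu(dt) \quad (z>0), \qquad G(0)=0,\\
&\text{with } \kappa,\gamma \ge 0 \text{ and } \int_{(0,\infty)} \min(1,t)\,\nu(dt) < \infty.\\[4pt]
&G(z)=z^p,\ p\in(0,1):\quad \kappa=\gamma=0,\ \ \nu(dt) = \frac{p}{\Gamma(1-p)}\,t^{-1-p}\,dt
  \quad\bigl(p=\tfrac12:\ \nu(dt) = \tfrac{1}{2\sqrt{\pi}}\,t^{-3/2}\,dt\bigr)\\
&G(z)=z:\quad \gamma=1,\ \kappa=0,\ \nu=0 \qquad\qquad
 G(z)=1 \ (z>0),\ \text{i.e. } p=0:\quad \kappa=1,\ \gamma=0,\ \nu=0\\
&G(z)=1-e^{-\tau z}:\quad \kappa=\gamma=0,\ \ \nu=\delta_\tau \quad(\text{compound Poisson, jumps of size } \tau)\\
&G(z)=\log(1+z):\quad \kappa=\gamma=0,\ \ \nu(dt) = t^{-1}e^{-t}\,dt \quad(\text{gamma subordinator})
\end{aligned}
```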
Abstract
If $G : \mathbb{R}_+ \to \mathbb{R}_+$, the $G$-moment of a vector $\mathbf{x}\in\mathbb{R}_+^n$ is $G(\mathbf{x}) = \sum_{v\in[n]} G(\mathbf{x}(v))$, and the $G$-sampling problem is to select an index $v_*\in [n]$ according to its contribution to the $G$-moment, i.e., such that $\Pr(v_*=v) = G(\mathbf{x}(v))/G(\mathbf{x})$. Approximate $G$-samplers may introduce multiplicative and/or additive errors to this probability, and some have a non-trivial probability of failure. In this paper we focus on the exact $G$-sampling problem, where $G$ is selected from the class $\mathcal{G}$ of Laplace exponents of non-negative, one-dimensional Lévy processes. This class includes several well-studied families, such as $p$th moments $G(z)=z^p$ with $p\in[0,1]$, logarithms $G(z)=\log(1+z)$, and Cohen and Geri's soft concave sublinear functions, which are used to approximate concave sublinear functions, including cap statistics. We develop $G$-samplers for a vector $\mathbf{x} \in \mathbb{R}_+^n$ that is presented as an incremental stream of positive updates. In particular:

* For any $G\in\mathcal{G}$, we give a very simple $G$-sampler that uses 2 words of memory and maintains at all times an index $v_*\in [n]$ such that $\Pr(v_*=v)$ is exactly $G(\mathbf{x}(v))/G(\mathbf{x})$.
* We give a "universal" $\mathcal{G}$-sampler that uses $O(\log n)$ words of memory w.h.p. and, given any $G\in \mathcal{G}$ at query time, produces an exact $G$-sample.

With an overhead of a factor of $k$, both samplers can be used to $G$-sample a sequence of $k$ indices, with or without replacement. Our sampling framework is simple and versatile, and it generalizes easily to sampling from more complex objects such as graphs and hypergraphs.
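The Python sketch below illustrates how the identity $\mathbb{E}[e^{-zX_t}]=e^{-tG(z)}$ can yield an exact $G$-sample from an incremental stream while keeping only two values. It is our own illustration under stated assumptions, not the paper's algorithm or its level functions $\ell_G$: the class `RaceGSampler` and the helpers `drift_passage` and `compound_poisson_passage` are hypothetical names, and a `random.Random` re-seeded per index stands in for the idealized random hash functions a real streaming implementation would assume. Each update $(v,\Delta)$ draws a fresh $E\sim\mathrm{Exp}(1)$ and computes the first time $T$ at which a subordinator path $X^{(v)}$, fixed for index $v$, reaches level $E/\Delta$. Over all updates to $v$, the smallest level $\min_i E_i/\Delta_i$ is $\mathrm{Exp}(\mathbf{x}(v))$, and since first-passage time is non-decreasing in the level, the smallest $T$ for $v$ satisfies $\Pr(T>t)=\mathbb{E}[e^{-\mathbf{x}(v)X^{(v)}_t}]=e^{-tG(\mathbf{x}(v))}$. The index with the overall smallest $T$ therefore wins with probability $G(\mathbf{x}(v))/G(\mathbf{x})$.

```python
import math
import random
from typing import Callable, Hashable, Optional

# (path RNG for one index, level) -> first time that index's path reaches the level
Passage = Callable[[random.Random, float], float]


class RaceGSampler:
    """Hypothetical exact G-sampler via an exponential race (illustrative sketch).

    Between updates it keeps only (best_time, best_index); the per-index
    subordinator path is replayed from a seed instead of being stored.
    """

    def __init__(self, first_passage: Passage, seed: int = 0) -> None:
        self.first_passage = first_passage
        self.seed = seed
        self.rng = random.Random(seed)  # source of the fresh E ~ Exp(1) per update
        self.best_time = math.inf
        self.best_index: Optional[Hashable] = None

    def update(self, v: Hashable, delta: float) -> None:
        level = self.rng.expovariate(1.0) / delta  # Exp(delta)-distributed level
        # Re-seeding from (seed, v) replays the same path X^(v) on every update to v.
        path_rng = random.Random(f"{self.seed}:{v}")
        t = self.first_passage(path_rng, level)
        if t < self.best_time:
            self.best_time, self.best_index = t, v

    def sample(self) -> Optional[Hashable]:
        return self.best_index


def drift_passage(path_rng: random.Random, level: float) -> float:
    """G(z) = z: the path is X_t = t, so level lambda is reached at time lambda."""
    return level


def compound_poisson_passage(tau: float) -> Passage:
    """G(z) = 1 - exp(-tau z): X_t = tau * N_t with N a rate-1 Poisson process."""

    def passage(path_rng: random.Random, level: float) -> float:
        k = math.ceil(level / tau)  # number of jumps needed to reach the level
        # Arrival time of the k-th jump; the seeded RNG makes it consistent per index.
        return sum(path_rng.expovariate(1.0) for _ in range(k))

    return passage


if __name__ == "__main__":
    sampler = RaceGSampler(compound_poisson_passage(tau=0.5), seed=7)
    for v, delta in [(0, 1.0), (1, 1.0), (2, 2.0), (1, 2.0), (2, 4.0)]:
        sampler.update(v, delta)
    print(sampler.sample())  # an index in {0, 1, 2}
```

Swapping the first-passage routine changes $G$ without touching the sampler. With `drift_passage` ($G(z)=z$) the construction reduces to the familiar exponential-race weighted sampler; other members of $\mathcal{G}$ would need a way to simulate their subordinator's first-passage times consistently per index, which this sketch does not attempt.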
