Efficient Online Random Sampling via Randomness Recycling
Thomas L. Draper, Feras A. Saad
TL;DR
This work tackles online random sampling, where each X_i is drawn from a potentially evolving distribution P_i using a stream of unbiased coin tosses. The authors introduce randomness recycling, which maintains a global uniform state (Z,M) to reuse randomness across rounds, and prove an entropy-cost bound arbitrarily close to the Shannon limit using only O(log(d/ε)) auxiliary space. The main theorem shows that for any ε>0 and denominator bound d, there exists an online sampler whose expected entropy cost over n samples is within εn bits of the information-theoretic lower bound (that is, within ε bits per sample, amortized) while its memory stays bounded. The result extends to general discrete distributions and to sequences (P_1, P_2, …) generated by an arbitrary random process. The paper also develops practical recycling techniques that accelerate a range of samplers (uniform, inversion-based, alias, and DDG). It demonstrates runtime speedups for random permutations via the Fisher-Yates shuffle and a reduced entropy cost for discrete Gaussian sampling, and it is accompanied by a C library. Together, these results advance space- and entropy-efficient online sampling, with implications for randomized algorithms and probabilistic programming in resource-constrained settings.
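The idea of a global uniform state (Z,M) can be illustrated with a minimal Python sketch, assuming the common interpretation that Z is uniform on {0, …, M−1}. Fresh coin tosses extend the state via (Z,M) → (2Z+b, 2M). To draw X uniform on {0, …, n−1}, the state is split into Z mod n and Z div n, and the quotient is kept as leftover randomness for the next round instead of being discarded. The class name `UniformState`, the precision parameter `k`, and the refill rule M ≥ n·2^k are hypothetical choices for illustration, not the paper's algorithm or the API of its C library.

```python
import math
import random


class UniformState:
    """Hypothetical sketch: a recycled uniform state (Z, M), with Z uniform on {0, ..., M-1}."""

    def __init__(self, flip, k=32):
        self.flip = flip        # callable returning one unbiased coin toss (0 or 1)
        self.k = k              # assumed precision: refill until M >= n * 2**k
        self.z, self.m = 0, 1   # empty state: Z = 0 is uniform on {0}
        self.bits_used = 0      # number of coin tosses consumed so far

    def _refill(self, n):
        # Append fresh tosses: (Z, M) -> (2Z + b, 2M) keeps Z uniform on {0, ..., M-1}.
        while self.m < (n << self.k):
            self.z = 2 * self.z + self.flip()
            self.m *= 2
            self.bits_used += 1

    def uniform(self, n):
        """Return X uniform on {0, ..., n-1}, reusing leftover randomness in (Z, M)."""
        while True:
            self._refill(n)
            q = self.m // n
            if self.z < q * n:
                # Accept: given Z < q*n, (Z mod n, Z div n) are independent and uniform
                # on {0..n-1} and {0..q-1}; keep the quotient as the new state.
                x = self.z % n
                self.z, self.m = self.z // n, q
                return x
            # Reject (probability below 2**-k): given Z >= q*n, Z - q*n is uniform
            # on {0, ..., M - q*n - 1}, so the remainder is recycled too.
            self.z -= q * n
            self.m -= q * n


# Usage: 60000 fair die rolls from a seeded coin-toss stream.
rng = random.Random(2025)
state = UniformState(lambda: rng.getrandbits(1))
rolls = [state.uniform(6) for _ in range(60000)]
print(state.bits_used / len(rolls), math.log2(6))
```

Under these assumptions, the coin tosses consumed per roll should sit just above log2(6) ≈ 2.585 bits, since each accepted split loses less than about 2^-k bits and the leftover held in (Z,M) stays below roughly k + log2(n) + 1 bits.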
Abstract
This article studies the fundamental problem of using i.i.d. coin tosses from an entropy source to efficiently generate random variables $X_i \sim P_i$ $(i \ge 1)$, where $(P_1, P_2, \dots)$ is a random sequence of rational discrete probability distributions subject to an \textit{arbitrary} stochastic process. Our method achieves an amortized expected entropy cost within $\varepsilon > 0$ bits of the information-theoretically optimal Shannon lower bound using $O(\log(1/\varepsilon))$ space. This result holds both pointwise in terms of the Shannon information content conditioned on $X_i$ and $P_i$, and in expectation to obtain a rate of $\mathbb{E}[H(P_1) + \dots + H(P_n)]/n + \varepsilon$ bits per sample as $n \to \infty$ (where $H$ is the Shannon entropy). The combination of space, time, and entropy properties of our method improves upon the Knuth and Yao (1976) entropy-optimal algorithm and Han and Hoshi (1997) interval algorithm for online sampling, which require unbounded space. It also uses exponentially less space than the more specialized methods of Kozen and Soloviev (2022) and Shao and Wang (2025) that generate i.i.d. samples from a fixed distribution. Our online sampling algorithm rests on a powerful algorithmic technique called \textit{randomness recycling}, which reuses a fraction of the random information consumed by a probabilistic algorithm to reduce its amortized entropy cost. On the practical side, we develop randomness recycling techniques to accelerate a variety of prominent sampling algorithms. We show that randomness recycling enables state-of-the-art runtime performance on the Fisher-Yates shuffle when using a cryptographically secure pseudorandom number generator, and that it reduces the entropy cost of discrete Gaussian sampling. Accompanying the manuscript is a performant software library in the C programming language.
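As a hedged illustration of the Fisher-Yates claim, the sketch below (in Python, not the paper's C implementation) draws each swap index from a shared recycled state that persists across shuffles. It then compares the coin tosses consumed per shuffle against the Shannon bound log2(52!) ≈ 225.58 bits for a uniformly random permutation of 52 items. The dict-based `state`, the function name `recycled_shuffle`, and the refill parameter `k` are hypothetical; the split-and-keep-the-quotient rule is a standard recycling technique assumed here for illustration.

```python
import math
import random


def recycled_shuffle(items, flip, state, k=32):
    """Fisher-Yates shuffle drawing each swap index from a recycled uniform state.

    `state` is a hypothetical dict {"z": Z, "m": M, "bits": count} with Z uniform
    on {0, ..., M-1}; it persists across calls so leftover randomness is reused.
    """
    a = list(items)
    for i in range(len(a) - 1, 0, -1):
        n = i + 1                                   # draw j uniform on {0, ..., i}
        while True:
            while state["m"] < (n << k):            # refill with fresh coin tosses
                state["z"] = 2 * state["z"] + flip()
                state["m"] *= 2
                state["bits"] += 1
            q = state["m"] // n
            if state["z"] < q * n:                  # accept; keep the quotient as leftover
                j = state["z"] % n
                state["z"] //= n
                state["m"] = q
                break
            state["z"] -= q * n                     # reject, but keep the remainder uniform
            state["m"] -= q * n
        a[i], a[j] = a[j], a[i]
    return a


# Usage: 1000 shuffles of a 52-card deck sharing one recycled state.
rng = random.Random(7)
state = {"z": 0, "m": 1, "bits": 0}
decks = [recycled_shuffle(range(52), lambda: rng.getrandbits(1), state) for _ in range(1000)]
per_shuffle = state["bits"] / len(decks)
print(per_shuffle, math.log2(math.factorial(52)))
```

Because the leftover state carries over from one shuffle to the next, the overhead above log2(52!) is amortized across all shuffles; starting from a fresh state for every shuffle would instead discard roughly k bits of leftover randomness each time.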
