Efficient Online Random Sampling via Randomness Recycling

Thomas L. Draper, Feras A. Saad

TL;DR

This work tackles online random sampling, where each X_i is drawn from a potentially evolving distribution P_i using a stream of unbiased coin tosses. The authors introduce randomness recycling, which maintains a global uniform state (Z,M) so that randomness left over from one round is reused in later rounds, and prove an entropy-cost bound arbitrarily close to the Shannon limit with only O(log(d/ε)) auxiliary space. The main theorem shows that for any ε>0 and denominator bound d, there exists an online sampler whose entropy cost for n samples is within εn bits of the information-theoretic lower bound (that is, within ε bits per sample, amortized), while keeping memory bounded; this extends to general discrete distributions and random processes P. The paper also develops practical recycling techniques to accelerate a range of samplers (uniform, inversion-based, alias, DDG) and reports speedups for tasks such as random permutations and discrete Gaussian sampling, accompanied by a C library. Together, these results yield online samplers that are efficient in both space and entropy, which is relevant to randomized algorithms and probabilistic programming in resource-constrained settings.
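
To make the idea of a reusable uniform state (Z,M) concrete, the following is a minimal Python sketch of a discrete uniform sampler that recycles leftover randomness. It is an illustration under stated assumptions, not the paper's algorithm or its C library: the class name `RecyclingUniform`, the `state_bits` parameter, and the refill rule are all hypothetical choices. The invariant is that `z` is uniform on `range(m)`; a draw from `range(n)` splits `z` into an output and a smaller uniform state, and a rejection keeps the remainder instead of discarding it.

```python
import random


class RecyclingUniform:
    """Illustrative sampler (hypothetical names): z stays uniform on range(m)."""

    def __init__(self, coin, state_bits=32):
        self.coin = coin                    # callable returning one fair bit (0 or 1)
        self.threshold = 1 << state_bits    # refill until m reaches this size
        self.z, self.m = 0, 1               # trivial state: z is uniform on {0}
        self.bits_used = 0                  # number of fresh coin tosses consumed

    def _refill(self):
        # Appending a fresh fair bit doubles m and keeps z uniform on range(m).
        while self.m < self.threshold:
            self.z = 2 * self.z + self.coin()
            self.m *= 2
            self.bits_used += 1

    def sample(self, n):
        """Return an integer uniform on range(n), for 1 <= n <= 2**state_bits."""
        if not 1 <= n <= self.threshold:
            raise ValueError("n must satisfy 1 <= n <= 2**state_bits")
        while True:
            self._refill()
            q, r = divmod(self.m, n)
            if self.z < q * n:
                # z is uniform on range(q*n): z % n is the output and z // n
                # is an independent uniform on range(q), kept as the new state.
                self.z, x = divmod(self.z, n)
                self.m = q
                return x
            # Rejection: z - q*n is uniform on range(r), so it is recycled too.
            self.z -= q * n
            self.m = r


# Usage: roll a six-sided die many times from a seeded (non-cryptographic) bit source.
rng = random.Random(0)
sampler = RecyclingUniform(lambda: rng.getrandbits(1))
rolls = [sampler.sample(6) for _ in range(10_000)]
print(sampler.bits_used / len(rolls))  # compare with the Shannon bound log2(6), about 2.585
```

In this sketch the state occupies roughly `state_bits` bits, and a larger state makes rejections rarer, which mirrors the space versus entropy trade-off described above only qualitatively.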

Abstract

This article studies the fundamental problem of using i.i.d. coin tosses from an entropy source to efficiently generate random variables $X_i \sim P_i$ $(i \ge 1)$, where $(P_1, P_2, \dots)$ is a random sequence of rational discrete probability distributions subject to an \textit{arbitrary} stochastic process. Our method achieves an amortized expected entropy cost within $\varepsilon > 0$ bits of the information-theoretically optimal Shannon lower bound using $O(\log(1/\varepsilon))$ space. This result holds both pointwise in terms of the Shannon information content conditioned on $X_i$ and $P_i$, and in expectation to obtain a rate of $\mathbb{E}[H(P_1) + \dots + H(P_n)]/n + \varepsilon$ bits per sample as $n \to \infty$ (where $H$ is the Shannon entropy). The combination of space, time, and entropy properties of our method improves upon the Knuth and Yao (1976) entropy-optimal algorithm and Han and Hoshi (1997) interval algorithm for online sampling, which require unbounded space. It also uses exponentially less space than the more specialized methods of Kozen and Soloviev (2022) and Shao and Wang (2025) that generate i.i.d. samples from a fixed distribution. Our online sampling algorithm rests on a powerful algorithmic technique called \textit{randomness recycling}, which reuses a fraction of the random information consumed by a probabilistic algorithm to reduce its amortized entropy cost. On the practical side, we develop randomness recycling techniques to accelerate a variety of prominent sampling algorithms. We show that randomness recycling enables state-of-the-art runtime performance on the Fisher-Yates shuffle when using a cryptographically secure pseudorandom number generator, and that it reduces the entropy cost of discrete Gaussian sampling. Accompanying the manuscript is a performant software library in the C programming language.
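
The two benchmarks named in the abstract, the pointwise Shannon information content and the expected rate of $(H(P_1) + \dots + H(P_n))/n + \varepsilon$ bits per sample, can be computed for a fixed, known sequence of distributions with a few lines of Python. This is only a sketch of the quantities being compared against, assuming each distribution is given as a list of probabilities; the function names are hypothetical and nothing here reproduces the paper's sampler.

```python
from math import log2


def entropy_bits(p):
    """Shannon entropy H(p) in bits of a finite distribution given as a list."""
    return -sum(w * log2(w) for w in p if w > 0)


def information_content_bits(ps, xs):
    """Pointwise lower bound: sum over i of -log2 p_i(x_i) for the observed outputs."""
    return sum(-log2(p[x]) for p, x in zip(ps, xs))


def target_rate_bits(ps, eps):
    """Average entropy per sample plus the slack eps, in bits per sample."""
    return sum(entropy_bits(p) for p in ps) / len(ps) + eps


# Usage: a fair coin followed by a three-outcome distribution.
ps = [[0.5, 0.5], [0.25, 0.25, 0.5]]
print(entropy_bits(ps[0]), entropy_bits(ps[1]))
print(information_content_bits(ps, [0, 2]))
print(target_rate_bits(ps, eps=0.25))
```

When the distributions are themselves random, as in the setting of the abstract, the rate is an expectation over the process generating them; the sketch above covers only one realized sequence.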

Paper Structure

This paper contains 56 sections, 11 theorems, 55 equations, 10 figures, 2 tables, 15 algorithms.

Key Result

Theorem 1.5

For any $\varepsilon > 0$ and $d \ge 1$, there exists an online random sampling algorithm using a sequence $\bm{C}$ of i.i.d. coin tosses such that, for every distribution sequence $\bm{p} \in (\Delta\mathcal{X}_d)^{\mathbb{N}}$, the entropy cost of generating an output sequence $\bm{X} = (X_i \sim where $\lbrace X_1=x_1, \dots, X_n=x_n\rbrace$ is any positive probability event and $W_{d,\varepsi

Figures (10)

  • Figure 1.1: Online random sampling using randomness recycling. The sampling algorithm is dynamically given a random sequence $\bm{P} \coloneqq (P_i)_{i\ge 1}$ of probability distributions and access to i.i.d. coin tosses $\bm{C} \coloneqq (C_i)_{i \ge 1}$, and generates output sequence $\bm{X} = (X_i)_{i\ge1}$ such that $X_i \sim P_i$ for $i \ge 1$.
  • Figure 3.1: Information flow in \textsc{Inversion} (\ref{alg:inversion}) for sampling general distributions.
  • Figure 4.1: Benchmark comparison of entropy consumption and sampling time on a range of distribution sizes $n$, for three uniform samplers: the Fast Dice Roller of Lumbroso (2013), the method of Lemire (2019), and our uniform sampler with randomness recycling (\ref{alg:uniform}). Random bits are supplied by $256$-byte buffered requests to /dev/random.
  • Figure 4.2: Illustration of \ref{alg:uniform-lemire} to generate a uniform with range $n=6$ given a uniform random word $U$ of length $W=4$ bits, including the operations required for randomness recycling.
  • Figure 4.3: Benchmark comparison of entropy consumption and sampling time using various optimized sampling algorithms for discrete uniforms. \ref{alg:uniform-widening} and \ref{alg:uniform-brackett} are novel to this work.
  • ...and 5 more figures
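
The setting of Figure 4.2 (range $n=6$, word length $W=4$) can be explored with a small Python sketch of the multiply-and-shift rejection rule from Lemire's method. This shows only the baseline rule as it is commonly stated; the randomness recycling operations that the figure also illustrates are not reproduced here, and the function names `lemire_map` and `lemire_uniform` are hypothetical.

```python
def lemire_map(x, n, w):
    """Map a w-bit word x to range(n), or return None if x must be rejected."""
    if not (0 <= x < (1 << w) and 1 <= n <= (1 << w)):
        raise ValueError("need 0 <= x < 2**w and 1 <= n <= 2**w")
    m = x * n                      # product lies in range(n * 2**w)
    low = m & ((1 << w) - 1)       # low w bits of the product
    if low < (1 << w) % n:         # reject the words that would cause bias
        return None
    return m >> w                  # high bits give the output in range(n)


def lemire_uniform(n, w, next_word):
    """Draw uniform w-bit words from next_word() until one is accepted."""
    while True:
        out = lemire_map(next_word(), n, w)
        if out is not None:
            return out


# Usage: enumerate all 16 four-bit words for n = 6.
table = [lemire_map(x, 6, 4) for x in range(16)]
print(table)
```

Enumerating the 16 words shows that 4 are rejected and each of the 6 outputs is produced by exactly 2 of the remaining 12, which is why the accepted outputs are uniform.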

Theorems & Definitions (37)

  • Remark 1.1
  • Remark 1.2
  • Remark 1.3
  • Remark 1.4
  • Theorem 1.5
  • Conjecture 1.5
  • Corollary 1.6
  • Remark 1.7
  • Definition 2.1
  • Definition 2.2
  • ...and 27 more