Greedy Poisson Rejection Sampling

Gergely Flamich

TL;DR

Greedy Poisson Rejection Sampling (GPRS) reframes one-shot channel simulation as a greedy search for the first arrival of a Poisson process under the graph of a stretched density ratio. The paper proves measure-theoretic correctness, derives runtime and codelength bounds, and shows optimal or near-optimal performance for one-dimensional problems with a unimodal density ratio, with an overall codelength bound of $H[X \mid \Pi] \le I[X;Y] + \log_2(I[X;Y]+1) + 6$. It introduces parallel and branch-and-bound variants (PGPRS and GPRS_sac) that exploit problem structure to accelerate sampling and reduce coding costs, significantly outperforming A* coding on targeted tasks. The work provides a theory-grounded toolkit for efficient one-shot channel simulation, has practical implications for neural data compression and privacy applications, and is accompanied by an open-source implementation.
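
To get a feel for the size of the codelength bound quoted above, the following minimal sketch evaluates $I + \log_2(I + 1) + 6$ for a few values of the mutual information. The function name `codelength_bound_bits` is hypothetical, and the sketch assumes $I[X;Y]$ is measured in bits, which the base-2 logarithm suggests but the summary does not state explicitly.

```python
import math


def codelength_bound_bits(mutual_information_bits: float) -> float:
    """Evaluate I + log2(I + 1) + 6 for a mutual information I >= 0.

    Assumption: I is given in bits, matching the base-2 logarithm in the bound.
    """
    if mutual_information_bits < 0:
        raise ValueError("mutual information must be non-negative")
    return mutual_information_bits + math.log2(mutual_information_bits + 1.0) + 6.0


# Print the bound for a few illustrative (made-up) values of I.
for info in (0.0, 1.0, 7.0, 100.0):
    bound = codelength_bound_bits(info)
    print(f"I = {info:6.1f} bits -> bound = {bound:8.3f} bits")
```

The overhead beyond $I[X;Y]$ grows only logarithmically, so for large $I[X;Y]$ the bound is dominated by the mutual information itself.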

Abstract

One-shot channel simulation is a fundamental data compression problem concerned with encoding a single sample from a target distribution $Q$ using a coding distribution $P$ using as few bits as possible on average. Algorithms that solve this problem find applications in neural data compression and differential privacy and can serve as a more efficient alternative to quantization-based methods. Sadly, existing solutions are too slow or have limited applicability, preventing widespread adoption. In this paper, we conclusively solve one-shot channel simulation for one-dimensional problems where the target-proposal density ratio is unimodal by describing an algorithm with optimal runtime. We achieve this by constructing a rejection sampling procedure equivalent to greedily searching over the points of a Poisson process. Hence, we call our algorithm greedy Poisson rejection sampling (GPRS) and analyze the correctness and time complexity of several of its variants. Finally, we empirically verify our theorems, demonstrating that GPRS significantly outperforms the current state-of-the-art method, A* coding. Our code is available at https://github.com/gergely-flamich/greedy-poisson-rejection-sampling.

Paper Structure (28 sections, 14 theorems, 167 equations, 5 figures, 7 algorithms)

Introduction
Background
Poisson Processes
Channel Simulation
Greedy Poisson Rejection Sampling
Speeding up the greedy search
Experiments
Related Work
Discussion and Future Work
Acknowledgements
Measure-theoretic Construction of Greedy Poisson Rejection Sampling
Analysis of Greedy Poisson Rejection Sampling
The Expected First Arrival Time
The Expectation and Variance of the Runtime
The Fractional Moments of the Index
...and 13 more sections

Key Result

Theorem 3.1

Let $Q$ and $P$ be the target and proposal distributions for the sequential GPRS algorithm, respectively, and $r = dQ/dP$ their density ratio. Let $N$ denote the number of samples simulated by the algorithm before it terminates. Then, [...], where $\mathbb{V}[\cdot]$ denotes the variance of a random variable.
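
To make the quantities $Q$, $P$, $r$ and $N$ concrete, here is a minimal sketch that counts the proposals drawn by a standard rejection sampler. It is a baseline for comparison, not GPRS itself, and it does not reproduce the statement of the theorem. It assumes the Gaussian pair from Figure 1, $Q = \mathcal{N}(1, 0.25^2)$ and $P = \mathcal{N}(0, 1)$; the names `log_density_ratio` and `rejection_sample` are hypothetical.

```python
import math
import random
from typing import Tuple

MU_Q, SIGMA_Q = 1.0, 0.25  # assumed target Q = N(1, 0.25^2); proposal P = N(0, 1)


def log_density_ratio(x: float) -> float:
    """log r(x) = log q(x) - log p(x); the shared -0.5*log(2*pi) terms cancel."""
    log_q = -math.log(SIGMA_Q) - 0.5 * ((x - MU_Q) / SIGMA_Q) ** 2
    log_p = -0.5 * x ** 2
    return log_q - log_p


# Closed-form log of sup_x r(x) for this Gaussian pair (finite because SIGMA_Q < 1).
# The maximum is attained at x = MU_Q / (1 - SIGMA_Q^2).
LOG_SUP_R = -math.log(SIGMA_Q) + MU_Q ** 2 / (2.0 * (1.0 - SIGMA_Q ** 2))


def rejection_sample(rng: random.Random) -> Tuple[float, int]:
    """Standard rejection sampling: return a sample from Q and the proposal count N."""
    n = 0
    while True:
        n += 1
        x = rng.gauss(0.0, 1.0)  # proposal from P
        log_u = math.log(1.0 - rng.random())  # log of a uniform draw on (0, 1]
        if log_u < log_density_ratio(x) - LOG_SUP_R:  # accept w.p. r(x) / sup r
            return x, n


rng = random.Random(0)
draws = [rejection_sample(rng) for _ in range(2000)]
mean_x = sum(x for x, _ in draws) / len(draws)
mean_n = sum(n for _, n in draws) / len(draws)
print(f"sup r = {math.exp(LOG_SUP_R):.3f}, mean sample = {mean_x:.3f}, mean N = {mean_n:.3f}")
```

For this baseline, $N$ is geometrically distributed with mean $\sup_x r(x) = 4 e^{8/15} \approx 6.82$, so the empirical mean of $N$ should land close to that value.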

Figures (5)

Figure 1: Illustration of three GPRS procedures for a Gaussian target $Q = \mathcal{N}(1, 0.25^2)$ and Gaussian proposal distribution $P = \mathcal{N}(0, 1)$, with the time axis truncated to the first $17$ units. All three variants find the first arrival of the same $(1, P)$-Poisson process $\Pi$ under the graph of $\varphi = \sigma \circ r$, indicated by the thick dashed black line in each plot. Here, $r = dQ/dP$ is the target-proposal density ratio, and $\sigma$ is the stretch function. Left: GPRS sequentially searching through the points of $\Pi$. The green circle shows the first point of $\Pi$ that falls under $\varphi$, and is accepted. All other points are rejected, as indicated by red crosses (✕). In practice, GPRS does not simulate points of $\Pi$ that arrive after the accepted arrival. Middle: Parallelized GPRS searching through two independent $(1/2, P)$-Poisson processes $\Pi_1$ and $\Pi_2$ in parallel. Blue points are arrivals in $\Pi_1$ and orange points are arrivals in $\Pi_2$. Crosses (✕) indicate points rejected by each thread, and circles indicate accepted points. In the end, the algorithm accepts the earliest arrival across all processes, which in this case is marked by the blue circle. Right: Branch-and-bound GPRS, when $\varphi$ is unimodal. The shaded red areas are never searched or simulated by the algorithm since, given the first two rejections, we know points in those regions cannot fall under $\varphi$.
Figure 2: Branch-and-bound GPRS with splitting function.
Figure : Generating a $(\lambda, P_{X \mid T})$-Poisson process.
Figure : Standard rejection sampler.
Figure : Parallel GPRS with $J$ available threads.
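
The Poisson processes referred to in these captions can be simulated with a short routine. The sketch below is an illustration under stated assumptions, not the paper's algorithm: it treats a $(\lambda, P)$-Poisson process as arrival times with independent $\mathrm{Exp}(\lambda)$ gaps, each paired with an independent mark drawn from $P$, and it ignores the time-dependent case $P_{X \mid T}$. The names `poisson_process_points` and `sample_standard_normal` are hypothetical, and the stretch function $\sigma$ is deliberately left out.

```python
import heapq
import random
from typing import Callable, Iterator, Tuple


def poisson_process_points(
    rate: float,
    sample_mark: Callable[[random.Random], float],
    rng: random.Random,
    max_time: float,
) -> Iterator[Tuple[float, float]]:
    """Yield (arrival_time, mark) pairs in increasing time order up to max_time."""
    t = 0.0
    while True:
        t += rng.expovariate(rate)  # Exp(rate) gap between consecutive arrivals
        if t > max_time:
            return
        yield t, sample_mark(rng)  # mark drawn independently of the arrival time


def sample_standard_normal(rng: random.Random) -> float:
    """Draw a mark from the assumed proposal P = N(0, 1)."""
    return rng.gauss(0.0, 1.0)


# A (1, P)-Poisson process on the first 17 time units, as in Figure 1.
points = list(poisson_process_points(1.0, sample_standard_normal, random.Random(0), 17.0))

# Two independent (1/2, P)-processes; merging them by arrival time gives
# a process with total rate 1 (superposition of independent Poisson processes).
pi_1 = poisson_process_points(0.5, sample_standard_normal, random.Random(1), 17.0)
pi_2 = poisson_process_points(0.5, sample_standard_normal, random.Random(2), 17.0)
merged = list(heapq.merge(pi_1, pi_2))

print(f"{len(points)} arrivals in the single process, {len(merged)} in the merged one")
```

A sequential search in the spirit of the left panel of Figure 1 would walk through `points` in order and stop at the first pair whose arrival time lies below $\varphi$ evaluated at the mark; that step needs the stretch function and is therefore not shown here.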

Theorems & Definitions (21)

Theorem 3.1: Expected Runtime
Theorem 3.2: Fractional Moments of the Index
Theorem 3.3: Expected Codelength
Theorem 3.4: Expected runtime of parallelized GPRS
Theorem 3.5: Expected codelength of parallelized GPRS
Theorem 3.6: Expected Runtime of GPRS with binary search
Theorem 3.7: Expected Codelength of GPRS with binary search
Lemma B.1
proof
Lemma D.1
...and 11 more
