Greedy Poisson Rejection Sampling

Gergely Flamich

TL;DR

Greedy Poisson Rejection Sampling (GPRS) reframes one-shot channel simulation as a greedy search for the first arrival of a Poisson process under a stretched density-ratio graph. The authors prove measure-theoretic correctness, derive runtime and codelength bounds, and show optimal or near-optimal performance for one-dimensional unimodal density-ratio problems, with an overall bound on codelength of $H[X \mid \Pi] \le I[X;Y] + \log_2(I[X;Y]+1) + 6$. They introduce parallel and branch-and-bound variants (PGPRS and GPRS_sac) that exploit problem structure to accelerate sampling and reduce coding costs, significantly outperforming A* coding on targeted tasks. The work provides a versatile, theory-grounded toolkit for efficient one-shot channel simulation, has practical implications for neural data compression and privacy applications, and is complemented by an open-source implementation.
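
To get a feel for the codelength bound quoted above, the short Python sketch below evaluates its right-hand side for a few values of the mutual information. It assumes $I[X;Y]$ is measured in bits, which matches the base-2 logarithm in the bound; the function name `gprs_codelength_bound` is illustrative and is not part of the authors' code.

```python
import math


def gprs_codelength_bound(mutual_information_bits: float) -> float:
    """Evaluate I + log2(I + 1) + 6, the right-hand side of the bound, in bits.

    Assumption: the mutual information I[X;Y] is given in bits.
    """
    if mutual_information_bits < 0:
        raise ValueError("mutual information must be non-negative")
    return (
        mutual_information_bits
        + math.log2(mutual_information_bits + 1.0)
        + 6.0
    )


# Show how the overhead beyond I[X;Y] behaves as I[X;Y] grows.
for bits in (1.0, 10.0, 100.0):
    bound = gprs_codelength_bound(bits)
    print(f"I = {bits:6.1f} bits -> bound = {bound:8.3f} bits "
          f"(overhead {bound - bits:.3f} bits)")
```

The overhead term $\log_2(I[X;Y]+1)+6$ grows only logarithmically, so it shrinks relative to $I[X;Y]$ as the mutual information increases.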

Abstract

One-shot channel simulation is a fundamental data compression problem concerned with encoding a single sample from a target distribution $Q$ using a coding distribution $P$ using as few bits as possible on average. Algorithms that solve this problem find applications in neural data compression and differential privacy and can serve as a more efficient alternative to quantization-based methods. Sadly, existing solutions are too slow or have limited applicability, preventing widespread adoption. In this paper, we conclusively solve one-shot channel simulation for one-dimensional problems where the target-proposal density ratio is unimodal by describing an algorithm with optimal runtime. We achieve this by constructing a rejection sampling procedure equivalent to greedily searching over the points of a Poisson process. Hence, we call our algorithm greedy Poisson rejection sampling (GPRS) and analyze the correctness and time complexity of several of its variants. Finally, we empirically verify our theorems, demonstrating that GPRS significantly outperforms the current state-of-the-art method, A* coding. Our code is available at https://github.com/gergely-flamich/greedy-poisson-rejection-sampling.

Paper Structure (28 sections, 14 theorems, 167 equations, 5 figures, 7 algorithms)

Key Result

Theorem 3.1

Let $Q$ and $P$ be the target and proposal distributions for GPRS (\ref{alg:global_gprs}), respectively, and $r = dQ/dP$ their density ratio. Let $N$ denote the number of samples simulated by the algorithm before it terminates. Then, where $\mathbb{V}[\cdot]$ denotes the variance of a random variable.

Figures (5)

  • Figure 1: Illustration of three GPRS procedures for a Gaussian target $Q = \mathcal{N}(1, 0.25^2)$ and Gaussian proposal distribution $P = \mathcal{N}(0, 1)$, with the time axis truncated to the first $17$ units. All three variants find the first arrival of the same $(1, P)$-Poisson process $\Pi$ under the graph of $\varphi = \sigma \circ r$, indicated by the thick dashed black line in each plot. Here, $r = dQ/dP$ is the target-proposal density ratio, and $\sigma$ is given by \ref{eq:stretch_fn}. Left: GPRS (\ref{alg:global_gprs}) sequentially searching through the points of $\Pi$. The green circle (○) shows the first point of $\Pi$ that falls under $\varphi$, and is accepted. All other points are rejected, as indicated by red crosses (✕). In practice, \ref{alg:global_gprs} does not simulate points of $\Pi$ that arrive after the accepted arrival. Middle: Parallelized GPRS (\ref{alg:parallel_gprs}) searching through two independent $(1/2, P)$-Poisson processes $\Pi_1$ and $\Pi_2$ in parallel. Blue points are arrivals in $\Pi_1$ and orange points are arrivals in $\Pi_2$. Crosses (✕) indicate points rejected, and circles (○) indicate points accepted, by each thread. In the end, the algorithm accepts the earliest arrival across all processes, which in this case is marked by the blue circle (○). Right: Branch-and-bound GPRS (\ref{alg:gprs_sac}), when $\varphi$ is unimodal. The shaded red areas are never searched or simulated by the algorithm since, given the first two rejections, we know points in those regions cannot fall under $\varphi$.
  • Figure 2: Branch-and-bound GPRS with splitting function.
  • Figure: Generating a $(\lambda, P_{X \mid T})$-Poisson process.
  • Figure: Standard rejection sampler.
  • Figure: Parallel GPRS with $J$ available threads.
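
The sequential search described in the Figure 1 caption (simulate the arrivals of a $(1, P)$-Poisson process in time order and accept the first one that falls under the graph of $\varphi$) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a toy pair, $P = \mathrm{Uniform}(0, 1)$ and a target $Q$ with density ratio $r(x) = 2x$, instead of the Gaussians in the figure. For that pair, requiring the accepted point to be $Q$-distributed gives $\varphi(x) = 2x/(1-x)$; this closed form was worked out by hand for the example and is not taken from the paper. The names `phi` and `gprs_sample` are hypothetical.

```python
import random


def phi(x: float) -> float:
    """Stretched density ratio for the toy pair (an assumption of this sketch).

    P = Uniform(0, 1), dQ/dP(x) = 2x, for which phi(x) = 2x / (1 - x).
    """
    return 2.0 * x / (1.0 - x)


def gprs_sample(rng: random.Random) -> tuple[float, int]:
    """Return the accepted mark and the index of the accepted arrival.

    Arrival times have i.i.d. Exp(1) gaps and marks are i.i.d. draws from P,
    which together simulate a (1, P)-Poisson process in time order.
    """
    time = 0.0
    index = 0
    while True:
        index += 1
        time += rng.expovariate(1.0)  # next arrival time
        x = rng.random()              # mark drawn from P, lies in [0, 1)
        if time < phi(x):             # first arrival under the graph of phi
            return x, index


rng = random.Random(0)
samples = [gprs_sample(rng)[0] for _ in range(20000)]
mean_x = sum(samples) / len(samples)
mean_x2 = sum(s * s for s in samples) / len(samples)
print(f"empirical E[X] = {mean_x:.3f}, empirical E[X^2] = {mean_x2:.3f}")
```

Under the toy target with density $2x$ on $[0, 1]$, the exact moments are $E[X] = 2/3$ and $E[X^2] = 1/2$, so the printed estimates give a quick sanity check that the accepted points follow $Q$. The returned index is the quantity a channel simulation scheme would encode.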

Theorems & Definitions (21)

  • Theorem 3.1: Expected Runtime
  • Theorem 3.2: Fractional Moments of the Index
  • Theorem 3.3: Expected Codelength
  • Theorem 3.4: Expected runtime of parallelized GPRS
  • Theorem 3.5: Expected codelength of parallelized GPRS
  • Theorem 3.6: Expected Runtime of GPRS with binary search
  • Theorem 3.7: Expected Codelength of GPRS with binary search
  • Lemma B.1
  • proof
  • Lemma D.1
  • ...and 11 more
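
The parallelized variant covered by Theorems 3.4 and 3.5 is described in the Figure 1 caption as $J$ threads, each searching an independent $(1/J, P)$-Poisson process, with the earliest accepted arrival across all threads returned at the end. The Python sketch below emulates this in lock-step inside a single thread, so it illustrates the logic rather than any speed-up. It reuses an assumed toy pair, $P = \mathrm{Uniform}(0, 1)$ with density ratio $r(x) = 2x$, for which $\varphi(x) = 2x/(1-x)$ was derived by hand. The early-stopping rule (a thread halts once its clock passes the best accepted time so far) is a design choice of this sketch, and the names `phi` and `parallel_gprs_sample` are hypothetical.

```python
import math
import random


def phi(x: float) -> float:
    """Stretched density ratio for the assumed toy pair: 2x / (1 - x)."""
    return 2.0 * x / (1.0 - x)


def parallel_gprs_sample(num_threads: int, rng: random.Random) -> tuple[float, float]:
    """Emulate num_threads searchers, each over a (1/num_threads, P)-process.

    Returns (mark, time) of the earliest arrival under phi across all processes.
    """
    rate = 1.0 / num_threads
    clocks = [0.0] * num_threads
    active = set(range(num_threads))
    best_time, best_x = math.inf, None
    while active:
        for j in list(active):
            clocks[j] += rng.expovariate(rate)  # next arrival in process j
            if clocks[j] >= best_time:
                active.discard(j)  # process j can no longer beat the candidate
                continue
            x = rng.random()       # mark drawn from P
            if clocks[j] < phi(x):
                best_time, best_x = clocks[j], x  # new earliest accepted arrival
                active.discard(j)
    return best_x, best_time


rng = random.Random(1)
parallel_samples = [parallel_gprs_sample(4, rng)[0] for _ in range(20000)]
parallel_mean = sum(parallel_samples) / len(parallel_samples)
print(f"empirical E[X] with 4 emulated threads = {parallel_mean:.3f}")
```

Because the superposition of $J$ independent $(1/J, P)$-Poisson processes is a $(1, P)$-Poisson process, the earliest accepted arrival has the same distribution as in the sequential search, and the empirical mean should again be close to $2/3$ for this toy target.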