Instance Dependent Testing of Samplers using Interval Conditioning

Rishiraj Bhattacharyya, Sourav Chakraborty, Yash Pote, Uddalok Sarkar, Sayantan Sen

TL;DR

The paper tackles the problem of verifying samplers over discrete, possibly infinite domains by introducing testers with instance-dependent efficiency in the interval conditioning model. It develops a framework that convolves the sampled distribution with a Triangular distribution to simulate continuous interval conditioning, and applies the Tootsie Pop Algorithm to estimate probability masses, yielding the distance testers toltest and ERtoltest. The authors instantiate these ideas in Lachesis, a practical tester for inverse transform samplers, and report empirical speedups of up to 1000x over prior testers designed for worst-case efficiency. The result is sampler verification whose cost adapts to the instance being tested and that extends to samplers over the natural numbers.
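As a rough illustration of the convolution step, the sketch below smooths an integer-valued distribution by adding independent triangular noise, which produces a continuous distribution whose density interpolates the original probabilities. The noise parameters (support $[-1, 1]$, mode $0$) and the names `smoothed_density` and `sample_smoothed` are assumptions made for this example; the summary does not state the exact construction used in the paper.

```python
import math
import random


def smoothed_density(pmf, y):
    """Density at y of X + T, with X ~ pmf on the integers and
    T ~ Triangular(-1, 0, 1) independent of X (assumed parameters)."""
    # The triangular kernel 1 - |t| vanishes for |t| >= 1, so only the two
    # integers floor(y) and floor(y) + 1 can contribute.
    lo = math.floor(y)
    return sum(
        pmf.get(k, 0.0) * max(0.0, 1.0 - abs(y - k)) for k in (lo, lo + 1)
    )


def sample_smoothed(pmf, rng):
    """Draw one sample of X + T using a finite-support pmf given as a dict."""
    values = list(pmf)
    weights = [pmf[v] for v in values]
    x = rng.choices(values, weights=weights)[0]
    # random.triangular takes (low, high, mode) in that order.
    return x + rng.triangular(-1.0, 1.0, 0.0)
```

Under these assumptions the smoothed density at an integer $k$ equals $P(k)$, and between integers it is the linear interpolation of the neighbouring probabilities, which is why mass estimates for the continuous distribution carry information about the discrete one.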

Abstract

Sampling algorithms play a pivotal role in probabilistic AI. However, verifying if a sampler program indeed samples from the claimed distribution is a notoriously hard problem. Provably correct testers like Barbarik, Teq, Flash, CubeProbe for testing of different kinds of samplers were proposed only in the last few years. All these testers focus on the worst-case efficiency, and do not support verification of samplers over infinite domains, a case occurring frequently in Astronomy, Finance, Network Security, etc. In this work, we design the first tester of samplers with instance-dependent efficiency, allowing us to test samplers over natural numbers. Our tests are developed via a novel distance estimation algorithm between an unknown and a known probability distribution using an interval conditioning framework. The core technical contribution is a new connection with probability mass estimation of a continuous distribution. The practical gains are also substantial: our experiments establish up to 1000x speedup over state-of-the-art testers.

Paper Structure

This paper contains 38 sections, 28 theorems, 84 equations, 8 figures, 3 tables, 6 algorithms.

Key Result

Theorem 4

Let $P$ be an unknown distribution and $Q$ a known distribution over $\mathbb{Z}$. Given access to $\mathsf{ICOND}(P)$, accuracy parameters $\varepsilon, \eta \in (0,1)$ with $\eta > \varepsilon$, and confidence parameter $\delta \in (0,1)$, the algorithms $\mathsf{toltes

Figures (8)

  • Figure 1: An inverse transform sampler for the geometric distribution.
  • Figure 2: $\mathsf{ICOND}$ implementation for a standard geometric sampler. Variables introduced specifically to support interval conditioning are highlighted.
  • Figure 3: Performance comparison of $\mathsf{Lachesis}$ and $\mathsf{CubeProbe}$ on Binomial samplers. The y-axis shows the number of samples drawn from the sampler; the x-axis shows the domain size of the distribution.
  • Figure 4: Binomial and Poisson samplers implemented in GSL and NumPy.
  • Figure 5: Flawed implementations of the Binomial sampler (critical flaws are highlighted).
  • ...and 3 more figures
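To make the subject of Figures 1 and 2 concrete, here is a minimal sketch of an inverse transform sampler for the geometric distribution on $\{1, 2, \dots\}$, together with one way to condition it on an interval $[a, b]$. It is not the code shown in the paper's figures: the function names, the parameterisation by success probability `p`, and the choice to restrict the uniform draw to the slice of the survival function covering $[a, b]$ are all assumptions of this example.

```python
import math
import random


def survival(k, p):
    """P(X > k) for a geometric X on {1, 2, ...} with success probability p."""
    return (1.0 - p) ** k


def sample_geometric(p, rng):
    """Inverse transform: map a uniform draw through the inverse CDF."""
    s = 1.0 - rng.random()  # uniform on (0, 1], plays the role of 1 - u
    return max(1, math.ceil(math.log(s) / math.log1p(-p)))


def sample_geometric_icond(p, a, b, rng):
    """Sample X conditioned on a <= X <= b (hypothetical ICOND sketch)."""
    if not (0.0 < p < 1.0 and 1 <= a <= b):
        raise ValueError("need 0 < p < 1 and 1 <= a <= b")
    s_hi = survival(a - 1, p)  # P(X >= a)
    s_lo = survival(b, p)      # P(X > b)
    # Restrict the uniform draw to (s_lo, s_hi], the slice that maps to [a, b].
    s = s_lo + (s_hi - s_lo) * (1.0 - rng.random())
    k = math.ceil(math.log(s) / math.log1p(-p))
    # Clamp to absorb boundary and floating-point rounding effects.
    return min(max(k, a), b)
```

Because the conditioning only rescales the uniform draw, each conditional sample costs the same as an unconditional one, with no rejection loop. The sketch does not guard against intervals whose probability underflows to zero in floating point.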

Theorems & Definitions (90)

  • Definition 1: $\ell_{\infty}$ and $\mathsf{d_{TV}}$-distance
  • Definition 2: $\mathsf{tilt}$
  • Definition 3: Interval Conditioning
  • Theorem 4
  • Corollary 4
  • Theorem 5: Correctness of $\mathsf{ICOND}^{\mathsf{Cont}}$
  • Lemma 5: Correctness of
  • Claim 5: Implementation
  • Definition 6: Uniform distribution
  • Definition 7: Poisson random variable
  • ...and 80 more
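For readers unfamiliar with the two distances named in Definition 1, the sketch below computes them for distributions with finite support, using the standard definitions $\ell_\infty(P, Q) = \max_x |P(x) - Q(x)|$ and $\mathsf{d_{TV}}(P, Q) = \frac{1}{2} \sum_x |P(x) - Q(x)|$. The paper's own statement of Definition 1 is not reproduced in this summary, so treating it as these standard forms is an assumption, as are the function names and the dict-based representation.

```python
def linf_distance(p, q):
    """Largest pointwise gap between two pmfs given as dicts."""
    support = set(p) | set(q)
    return max(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def tv_distance(p, q):
    """Total variation distance: half the L1 gap between two pmfs."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


if __name__ == "__main__":
    p = {0: 0.5, 1: 0.5}
    q = {0: 0.25, 1: 0.25, 2: 0.5}
    print(linf_distance(p, q), tv_distance(p, q))
```

For distributions over an infinite domain such as $\mathbb{Z}$, these exact sums are not directly computable, which is part of what makes estimating the distance from samples the interesting problem.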