Table of Contents
Fetching ...

Guaranteed Bounds on Posterior Distributions of Discrete Probabilistic Programs with Loops

Fabian Zaiser, Andrzej S. Murawski, C. -H. Luke Ong

TL;DR

The paper presents two automated methods for bounding posterior distributions of discrete probabilistic programs with loops and conditioning: residual mass semantics, which yields simple, fast lower and upper bounds by loop unrolling, and geometric bound semantics, which leverages contraction invariants and eventually geometric distributions (EGDs) to obtain exponential tail bounds and moment bounds. The core novelty lies in the contraction invariants and the EGD-based semantics, which enable bounding not just probabilities but also moments and tails, with a decidable constraint system to synthesize invariants. Both approaches are proven sound and convergent as loop unrolling increases, and they are fully automated in the Diabolo tool, which evaluates a wide range of benchmarks and compares favorably to prior methods like GuBPI and Polar. The results demonstrate practical applicability, showing tight bounds on many benchmarks and providing automatic, rigorous checks for approximate inference in probabilistic programs. The work advances guaranteed bounds as a scalable alternative to exact inference and sampling, with meaningful theoretical guarantees and real-world utility.

Abstract

We study the problem of bounding the posterior distribution of discrete probabilistic programs with unbounded support, loops, and conditioning. Loops pose the main difficulty in this setting: even if exact Bayesian inference is possible, the state of the art requires user-provided loop invariant templates. By contrast, we aim to find guaranteed bounds, which sandwich the true distribution. They are fully automated, applicable to more programs and provide more provable guarantees than approximate sampling-based inference. Since lower bounds can be obtained by unrolling loops, the main challenge is upper bounds, and we attack it in two ways. The first is called residual mass semantics, which is a flat bound based on the residual probability mass of a loop. The approach is simple, efficient, and has provable guarantees. The main novelty of our work is the second approach, called geometric bound semantics. It operates on a novel family of distributions, called eventually geometric distributions (EGDs), and can bound the distribution of loops with a new form of loop invariants called contraction invariants. The invariant synthesis problem reduces to a system of polynomial inequality constraints, which is a decidable problem with automated solvers. If a solution exists, it yields an exponentially decreasing bound on the whole distribution, and can therefore bound moments and tail asymptotics as well, not just probabilities as in the first approach. Both semantics enjoy desirable theoretical properties. In particular, we prove soundness and convergence, i.e. the bounds converge to the exact posterior as loops are unrolled further. On the practical side, we describe Diabolo, a fully-automated implementation of both semantics, and evaluate them on a variety of benchmarks from the literature, demonstrating their general applicability and the utility of the resulting bounds.

Guaranteed Bounds on Posterior Distributions of Discrete Probabilistic Programs with Loops

TL;DR

The paper presents two automated methods for bounding posterior distributions of discrete probabilistic programs with loops and conditioning: residual mass semantics, which yields simple, fast lower and upper bounds by loop unrolling, and geometric bound semantics, which leverages contraction invariants and eventually geometric distributions (EGDs) to obtain exponential tail bounds and moment bounds. The core novelty lies in the contraction invariants and the EGD-based semantics, which enable bounding not just probabilities but also moments and tails, with a decidable constraint system to synthesize invariants. Both approaches are proven sound and convergent as loop unrolling increases, and they are fully automated in the Diabolo tool, which evaluates a wide range of benchmarks and compares favorably to prior methods like GuBPI and Polar. The results demonstrate practical applicability, showing tight bounds on many benchmarks and providing automatic, rigorous checks for approximate inference in probabilistic programs. The work advances guaranteed bounds as a scalable alternative to exact inference and sampling, with meaningful theoretical guarantees and real-world utility.

Abstract

We study the problem of bounding the posterior distribution of discrete probabilistic programs with unbounded support, loops, and conditioning. Loops pose the main difficulty in this setting: even if exact Bayesian inference is possible, the state of the art requires user-provided loop invariant templates. By contrast, we aim to find guaranteed bounds, which sandwich the true distribution. They are fully automated, applicable to more programs and provide more provable guarantees than approximate sampling-based inference. Since lower bounds can be obtained by unrolling loops, the main challenge is upper bounds, and we attack it in two ways. The first is called residual mass semantics, which is a flat bound based on the residual probability mass of a loop. The approach is simple, efficient, and has provable guarantees. The main novelty of our work is the second approach, called geometric bound semantics. It operates on a novel family of distributions, called eventually geometric distributions (EGDs), and can bound the distribution of loops with a new form of loop invariants called contraction invariants. The invariant synthesis problem reduces to a system of polynomial inequality constraints, which is a decidable problem with automated solvers. If a solution exists, it yields an exponentially decreasing bound on the whole distribution, and can therefore bound moments and tail asymptotics as well, not just probabilities as in the first approach. Both semantics enjoy desirable theoretical properties. In particular, we prove soundness and convergence, i.e. the bounds converge to the exact posterior as loops are unrolled further. On the practical side, we describe Diabolo, a fully-automated implementation of both semantics, and evaluate them on a variety of benchmarks from the literature, demonstrating their general applicability and the utility of the resulting bounds.

Paper Structure

This paper contains 95 sections, 33 theorems, 68 equations, 8 figures, 6 tables.

Key Result

Lemma 2.2

For any program $P$, the transformation $\llbracket P \rrbracket_{\lightning}$ is linear and $\omega$-continuous, so in particular monotonic. Also, the total measure does not increase: $\llbracket P \rrbracket_{\lightning}(\mu)(\mathbb N^n_{\lightning}) \le \mu(\mathbb N^n_{\lightning})$ for all $\m

Figures (8)

  • Figure 1: A probabilistic program with a loop and conditioning
  • Figure 2: Histogram of samples from two inference algorithms (importance sampling and Pyro's HMC), and the guaranteed bounds from BeutnerOZ22. The bounds show that Pyro's HMC produces wrong results. (Source: BeutnerOZ22)
  • Figure 3: Standard semantics for probabilistic programs
  • Figure 4: Geometric bound semantics for events
  • Figure 5: Geometric bound semantics for statements
  • ...and 3 more figures

Theorems & Definitions (48)

  • Example 1.1
  • Definition 2.1
  • Lemma 2.2
  • Lemma 3.0
  • Theorem 3.1: Soundness of lower bounds
  • Theorem 3.2: Convergence of lower bounds
  • Theorem 3.3: Soundness of the residual mass semantics
  • Theorem 3.4: Convergence of the residual mass semantics
  • Example 3.5: Residual mass semantics
  • Example 4.1
  • ...and 38 more