Table of Contents
Fetching ...

Parallelizable Feynman-Kac Models for Universal Probabilistic Programming

Michele Boreale, Luisa Collodi

Abstract

We study provably correct and efficient instantiations of Sequential Monte Carlo (SMC) inference in the context of formal operational semantics of Probabilistic Programs (PPs). We focus on universal PPs featuring sampling from arbitrary measures and conditioning/reweighting in unbounded loops. We first equip Probabilistic Program Graphs (PPGs), an automata-theoretic description format of PPs, with an expectation-based semantics over infinite execution traces, which also incorporates trace weights. We then prove a finite approximation theorem that provides bounds to this semantics based on expectations taken over finite, fixed-length traces. This enables us to frame our semantics within a Feynman-Kac (FK) model, and ensures the consistency of the Particle Filtering (PF) algorithm, an instance of SMC, with respect to our semantics. Building on these results, we introduce VPF, a vectorized version of the PF algorithm tailored to PPGs and our semantics. Experiments conducted with a proof-of-concept implementation of VPF show very promising results compared to state-of-the-art PP inference tools.

Parallelizable Feynman-Kac Models for Universal Probabilistic Programming

Abstract

We study provably correct and efficient instantiations of Sequential Monte Carlo (SMC) inference in the context of formal operational semantics of Probabilistic Programs (PPs). We focus on universal PPs featuring sampling from arbitrary measures and conditioning/reweighting in unbounded loops. We first equip Probabilistic Program Graphs (PPGs), an automata-theoretic description format of PPs, with an expectation-based semantics over infinite execution traces, which also incorporates trace weights. We then prove a finite approximation theorem that provides bounds to this semantics based on expectations taken over finite, fixed-length traces. This enables us to frame our semantics within a Feynman-Kac (FK) model, and ensures the consistency of the Particle Filtering (PF) algorithm, an instance of SMC, with respect to our semantics. Building on these results, we introduce VPF, a vectorized version of the PF algorithm tailored to PPGs and our semantics. Experiments conducted with a proof-of-concept implementation of VPF show very promising results compared to state-of-the-art PP inference tools.
Paper Structure (28 sections, 14 theorems, 35 equations, 5 figures, 2 algorithms)

This paper contains 28 sections, 14 theorems, 35 equations, 5 figures, 2 algorithms.

Key Result

Theorem 2.1

Let $t\geq 1$ be an integer. Let $\mu^1$ be a probability measure on $\Omega$ and $K_2,...,K_t$ be $t-1$ (not necessarily distinct) Markov kernels from $\Omega$ to $\Omega$. Then there is a unique probability measure $\mu^t$ defined on $(\Omega^t,\mathcal{F}^t)$ such that for every $A_1\times \cdots

Figures (5)

  • Figure 1: Left. The PPG of Example \ref{['ex:simple0']} and a corresponding pseudo-code. The $\text{\rmnil}$ node (2) is distinguished with a double border. Constant 1 predicates and score functions, and the identity function are not displayed in transitions. The score function $\gamma$ decorates node 3, that is $\hbox{$\mathbf{sc}$}(3)=\gamma$. Right. The PPG for the drunk man and mouse random walk of Example \ref{['ex:dm1']} and a corresponding pseudo-code. The score function $\gamma$ decorates node 1, that is $\hbox{$\mathbf{sc}$}(1)=\gamma$.
  • Figure 2: For $N=10^5,10^6$, scatterplots of the log-ratios of execution times, $\log_{10}(\mathrm{time}_{{tool}}/\mathrm{time}_{\mathrm{VPF}})$, based on data points in Table \ref{['tab:table1']}. From left to right: $N=10^5$, $tool=$ CorePPL; $N=10^5$, $tool=$ WebPPL-smc; $N=10^6$, $tool=$ CorePPL; $N=10^6$, $tool=$ WebPPL-smc. For $N=10^6$, the vast majority of the data points lie above the x-axis, indicating superior performance of VPF across different examples. An enlarged version of these plots is reported in Appendix \ref{['app:air']}.
  • Figure 3: Execution times (in seconds) for the RW2 program, as a function of the probability $\lambda$ of conditioning on external data for $N=10^4$ (left), $N=10^5$ (center) and $N=10^6$ (right). webPPL missing from the right-most plot due to time-out. Execution times of VPF are basically insensitive to $\lambda$.
  • Figure 4: Enlarged version of plots in Fig. \ref{['fig:scatterplot']}.
  • Figure :

Theorems & Definitions (42)

  • Definition 2.1: Markov kernel
  • Theorem 2.1: product of measures, Ash,Th.2.6.7
  • Example 1
  • Definition 3.1: PPG
  • Example 2
  • Example 3: Of mice and drunk men
  • Remark 1
  • Definition 3.2: PPG Markov kernel
  • Lemma 3.1
  • Definition 4.1: probability measure induced by $S$
  • ...and 32 more