Optimising Density Computations in Probabilistic Programs via Automatic Loop Vectorisation
Sangho Lim, Hyoungjin Lim, Wonyeol Lee, Xavier Rival, Hongseok Yang
TL;DR
This work tackles the high cost of probabilistic inference in PPLs by introducing an automatic loop vectorisation technique that uses speculative parallel execution coupled with a fixed-point correctness check to preserve semantics. It formalises the approach as a translation from a score-computing language to a vectorised target language with lifted types and antichain-based state representations, and proves soundness of the translation. The authors implement the method on Pyro/PyTorch and demonstrate performance improvements across diverse models and inference schemes (SVI, MAP, MCMC), with speedups of $1.1$–$6\times$ over an existing vectorisation baseline and reduced GPU memory usage in many cases. While the method accelerates density/gradient computations, it does not address sample generation costs and incurs fixed-point overhead, suggesting a potential hybrid approach for dependence-heavy loops. Overall, the paper contributes a formal vectorisation framework, a correctness proof, and a practical Pyro-based implementation that extends automatic tensor-based acceleration to complex, nested, and data-dependent loops in probabilistic programs.
Abstract
Probabilistic programming languages (PPLs) are a popular tool for high-level modelling across many fields. They provide a range of algorithms for probabilistic inference, which analyse models by learning their parameters from a dataset or estimating their posterior distributions. However, probabilistic inference is known to be very costly. One of the bottlenecks of probabilistic inference stems from the iteration over entries of a large dataset or a long series of random samples. Vectorisation can mitigate this cost, but manual vectorisation is error-prone, and existing automatic techniques are often ad-hoc and limited, unable to handle general repetition structures, such as nested loops and loops with data-dependent control flow, without significant user intervention. To address this bottleneck, we propose a sound and effective method for automatically vectorising loops in probabilistic programs. Our method achieves high throughput using speculative parallel execution of loop iterations, while preserving the semantics of the original loop through a fixed-point check. We formalise our method as a translation from an imperative PPL into a lower-level target language with primitives geared towards vectorisation. We implemented our method for the Pyro PPL and evaluated it on a range of probabilistic models. Our experiments show significant performance gains against an existing vectorisation baseline, achieving $1.1$--$6\times$ speedups and reducing GPU memory usage in many cases. Unlike the baseline, which is limited to a subset of models, our method effectively handled all the tested models.
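To make the idea of speculative execution with a fixed-point check concrete, the following is a minimal sketch in plain Python. It is not the paper's translation or its Pyro implementation: the model, the helper names (`step`, `sequential_score`, `speculative_score`), and the use of list comprehensions as a stand-in for batched tensor operations are all assumptions made for illustration. Every iteration is evaluated from a guessed incoming state, the guesses are refreshed from the results, and the process repeats until the guesses stop changing, at which point the summed score agrees with the sequential loop.

```python
import math


def normal_logpdf(x, mean, std):
    """Log-density of Normal(mean, std) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def step(prev_state, obs):
    """One loop iteration (hypothetical model): returns (next_state, score)."""
    # Data-dependent control flow: the state changes only on some iterations.
    if obs > prev_state:
        next_state = 0.5 * prev_state + 0.5 * obs
    else:
        next_state = prev_state
    return next_state, normal_logpdf(obs, prev_state, 1.0)


def sequential_score(init, data):
    """Reference semantics: run the loop one iteration at a time."""
    state, total = init, 0.0
    for obs in data:
        state, score = step(state, obs)
        total += score
    return total


def speculative_score(init, data):
    """Evaluate all iterations from guessed states until a fixed point."""
    n = len(data)
    # guess[i] is the speculated state entering iteration i.
    guess = [init] * n
    rounds = 0
    while True:
        rounds += 1
        # Each iteration depends only on its own guess, so this comprehension
        # stands in for a single batched (vectorised) operation.
        results = [step(guess[i], data[i]) for i in range(n)]
        # Iteration i + 1 should start from the state produced by iteration i.
        new_guess = ([init] + [results[i][0] for i in range(n - 1)])[:n]
        if new_guess == guess:
            # Fixed point: every guess matches what the sequential loop
            # would have produced, so the scores are the true scores.
            return sum(score for _, score in results), rounds
        guess = new_guess


data = [0.3, 1.2, 0.7, 2.5, 2.0]
total, rounds = speculative_score(0.0, data)
print(total, sequential_score(0.0, data), rounds)
```

In this sketch, each round makes at least one more guess correct, so the loop finishes in at most `len(data)` rounds. When iterations do not actually influence one another, the first round already confirms the initial guesses; long dependence chains need more rounds, which is the fixed-point overhead noted in the TL;DR.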
