Beyond Bayesian Model Averaging over Paths in Probabilistic Programs with Stochastic Support

Tim Reichelt, Luke Ong, Tom Rainforth

TL;DR

Probabilistic programs with stochastic support induce posteriors that implicitly perform Bayesian model averaging (BMA) over the posteriors of path-specific sub-programs (straight-line programs, SLPs), which makes predictions sensitive to unstable path weights. The authors propose more robust path-weighting methods based on stacking and PAC-Bayes, implemented as post-processing steps that reweight the local SLP posteriors without re-running inference. They formulate a stacking objective, show how to apply it as a post-processing step, and add PAC-Bayes regularization to prevent weight collapse, supported by experiments on synthetic and real datasets. The results show improved predictive performance and robustness to model misspecification and imperfect inference, offering a practical, low-cost enhancement for probabilistic programming system (PPS) inference engines such as Pyro.

Abstract

The posterior in probabilistic programs with stochastic support decomposes as a weighted sum of the local posterior distributions associated with each possible program path. We show that making predictions with this full posterior implicitly performs a Bayesian model averaging (BMA) over paths. This is potentially problematic, as BMA weights can be unstable due to model misspecification or inference approximations, leading to sub-optimal predictions in turn. To remedy this issue, we propose alternative mechanisms for path weighting: one based on stacking and one based on ideas from PAC-Bayes. We show how both can be implemented as a cheap post-processing step on top of existing inference engines. In our experiments, we find them to be more robust and lead to better predictions compared to the default BMA weights.
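To make the contrast between the two weighting schemes concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each path $k$ already comes with an estimated log marginal likelihood and with log predictive densities $\log p_k(\tilde{y}_n)$ on held-out points, as an existing inference engine might provide. BMA weights are the normalized marginal likelihoods, while the stacking weights here maximize the standard log-score objective $\sum_n \log \sum_k w_k \, p_k(\tilde{y}_n)$ over the simplex. The function names, the toy two-path setup, the hand-picked log evidences, and the EM-style fixed-point optimizer are all illustrative assumptions; the paper's exact objective and optimizer may differ, and the PAC-Bayes regularizer is not included.

```python
import numpy as np


def _logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))) along the given axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    out = a_max + np.log(np.sum(np.exp(a - a_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def bma_weights(log_evidence):
    """Normalize per-path log marginal likelihood estimates into BMA weights."""
    log_evidence = np.asarray(log_evidence, dtype=float)
    return np.exp(log_evidence - _logsumexp(log_evidence))


def stacking_weights(log_pred, n_iter=500):
    """Maximize sum_n log sum_k w_k * p_k(y_n) over the probability simplex.

    log_pred[n, k] is the log predictive density of held-out point n under
    path k. The update is the EM step for mixture proportions with fixed
    components (an assumed optimizer, chosen for simplicity).
    """
    log_pred = np.asarray(log_pred, dtype=float)
    n_points, n_paths = log_pred.shape
    log_w = np.full(n_paths, -np.log(n_paths))  # start from uniform weights
    for _ in range(n_iter):
        log_joint = log_pred + log_w
        # Responsibility of each path for each held-out point.
        log_resp = log_joint - _logsumexp(log_joint, axis=1)[:, None]
        # New weight of a path is its average responsibility.
        log_w = _logsumexp(log_resp, axis=0) - np.log(n_points)
    return np.exp(log_w)


def log_score(log_pred, weights):
    """Mean log predictive density of the weighted mixture over paths."""
    with np.errstate(divide="ignore"):
        log_w = np.log(np.asarray(weights, dtype=float))
    return float(np.mean(_logsumexp(np.asarray(log_pred) + log_w, axis=1)))


def normal_logpdf(y, mean):
    """Log density of a unit-variance normal."""
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * (y - mean) ** 2


# Toy setup (hypothetical): two paths with predictives N(0, 1) and N(3, 1),
# while the held-out data come from an equal mixture of both, so each path
# on its own is misspecified.
rng = np.random.default_rng(0)
y_heldout = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(3.0, 1.0, 50)])
log_pred = np.stack([normal_logpdf(y_heldout, 0.0),
                     normal_logpdf(y_heldout, 3.0)], axis=1)

# Hand-picked log evidence estimates; a gap of a few nats is enough for
# the BMA weights to collapse onto a single path.
w_bma = bma_weights([-120.0, -128.5])
w_stack = stacking_weights(log_pred)

print("BMA weights:     ", w_bma)
print("Stacking weights:", w_stack)
print("Log score (BMA):     ", log_score(log_pred, w_bma))
print("Log score (stacking):", log_score(log_pred, w_stack))
```

Because both functions only consume quantities computed from the per-path posteriors, swapping BMA weights for stacked weights in this sketch requires no additional inference, which mirrors the post-processing view described in the abstract.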

Paper Structure

This paper contains 36 sections, 1 theorem, 46 equations, 18 figures, 4 tables, 1 algorithm.

Key Result

Theorem G.1

For all $q(\theta)$ absolutely continuous with respect to $r(\theta)$, $\tilde{y}_{\ell} \sim p_{\text{true}}(\tilde{y})$ i.i.d., $\beta \in (0, \infty)$, $L, M \in \mathbb{N}$, and $p(\tilde{y} \mid \theta) \in (0, \infty)$ for all $\{\theta \in \Theta \mid p_{\text{true}}(\tilde{y}) > 0\} \times \{\ldots\}$: [...] and furthermore (unconditionally) [...], where $\widetilde{\mathcal{P}}_{M,L}(q; r, \beta)$ is as in Eq. [...].

Figures (18)

  • Figure 1: Behaviour of the BMA and stacked weights in the models as described in Sec. \ref{sec:misspecification}.
  • Figure 2: SLP weights for the problem in Sec. \ref{sec:subset_reg}. Each dot represents the weight of the corresponding SLP in the model. Results are computed over 10 generated datasets.
  • Figure 3: SLP weights for Sec. \ref{sec:radon}. X-tick labels indicate the different modelling choices for $\alpha$ and $\beta$; the pattern is "$\alpha$ model choice, $\beta$ model choice" with P = pooling, NP = no pooling, H = hierarchical, and G = group-level predictor.
  • Figure 4: Impact of regularization parameter $\beta$ on predictive performance in the different models (higher is better). Plotted are mean and standard deviation.
  • Figure 5: Example Pyro program with stochastic support.
  • ...and 13 more figures

Theorems & Definitions (1)

  • Theorem G.1 (Morningstar et al., 2022)