Predictively Oriented Posteriors

Yann McLatchie, Badr-Eddine Cherief-Abdellatif, David T. Frazier, Jeremias Knoblauch

TL;DR

Predictively Oriented Posteriors (PrO) recast Bayesian uncertainty as a property of predictive capability rather than of parameter estimation. By optimising a predictive scoring rule directly while regularising toward a prior over model parameters, PrO posteriors yield a mixing distribution $Q_n$ whose predictive distribution $P_{Q_n} = \int P_\theta \, \mathrm{d}Q_n(\theta)$ achieves superior calibration under misspecification. The theory shows that PrO posteriors dominate classical and generalised Bayes posteriors in predictive risk when the model is misspecified, while retaining Bayes-like performance when the model is well specified; they converge to a predictively optimal model average $Q^\star$ and admit a stable uncertainty interpretation tied to the degree of model misspecification. Computation via Wasserstein gradient flows enables practical sampling from PrO posteriors, and empirical illustrations on binary classification, river flow, and housing data highlight improved predictive performance and richer uncertainty representations. Overall, PrO posteriors offer a principled, adaptive framework that balances interpretability, predictive accuracy, and uncertainty quantification in the presence of model misspecification.
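A mixing distribution $Q$ over parameters induces the model average $P_Q = \int P_\theta \, \mathrm{d}Q(\theta)$. When $Q$ is approximated by $N$ equally weighted particles $\theta_1, \dots, \theta_N$, this average is simply the finite mixture $\frac{1}{N}\sum_j P_{\theta_j}$. The sketch below illustrates this under the assumption of a Gaussian location model $P_\theta = N(\theta, \sigma^2)$; the function names are hypothetical and not taken from the paper.

```python
import math
import random


def gaussian_pdf(x: float, mean: float, sd: float = 1.0) -> float:
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_predictive_pdf(x: float, particles: list, sd: float = 1.0) -> float:
    """Density of P_Q at x, with Q approximated by equally weighted particles:
    p_Q(x) = (1/N) * sum_j p_{theta_j}(x)."""
    return sum(gaussian_pdf(x, theta, sd) for theta in particles) / len(particles)


def sample_predictive(particles: list, size: int, sd: float = 1.0, seed: int = 0) -> list:
    """Draw from P_Q by composition: first theta ~ Q, then x ~ N(theta, sd^2)."""
    rng = random.Random(seed)
    return [rng.gauss(rng.choice(particles), sd) for _ in range(size)]


# Particles split across two locations give a bimodal predictive;
# particles collapsed onto one location recover a single model P_theta.
spread = [-3.0, -3.0, 3.0, 3.0]
collapsed = [0.0, 0.0, 0.0, 0.0]
print(mixture_predictive_pdf(0.0, spread), mixture_predictive_pdf(3.0, spread))
print(mixture_predictive_pdf(0.0, collapsed))
```

Collapsed particles mimic the concentrating behaviour of Bayes and Gibbs posteriors described in the abstract, whereas spread particles represent a genuine model average.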

Abstract

We advocate for a new statistical principle that combines the most desirable aspects of both parameter inference and density estimation. This leads us to the predictively oriented (PrO) posterior, which expresses uncertainty as a consequence of predictive ability. Doing so leads to inferences which predictively dominate both classical and generalised Bayes posterior predictive distributions: up to logarithmic factors, PrO posteriors converge to the predictively optimal model average. Whereas classical and generalised Bayes posteriors only achieve this rate if the model can recover the data-generating process, PrO posteriors adapt to the level of model misspecification. This means that they concentrate around the true model in the same way as Bayes and Gibbs posteriors if the model can recover the data-generating distribution, but do not concentrate in the presence of non-trivial forms of model misspecification. Instead, they stabilise towards a predictively optimal posterior whose degree of irreducible uncertainty admits an interpretation as the degree of model misspecification, in sharp contrast to how Bayesian uncertainty and its existing extensions behave. Lastly, we show that PrO posteriors can be sampled by evolving particles according to mean-field Langevin dynamics, and verify the practical significance of our theoretical developments on a number of numerical examples.
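The abstract states that PrO posteriors can be sampled by evolving particles under mean-field Langevin dynamics. The following is a minimal sketch of that idea under loudly stated assumptions, not the authors' implementation: the objective is assumed to be $F(Q) = -\sum_i \log p_Q(x_i) + \mathrm{KL}(Q \,\|\, \pi)$ (the paper's exact score, scaling, and regularisation may differ), the model is $N(\theta, 1)$, the prior is $N(0, 10^2)$, and time is discretised with a plain Euler-Maruyama step. The function name `pro_langevin` and all parameter values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def pro_langevin(data, n_particles=20, step=1e-3, n_steps=1000, prior_sd=10.0, seed=0):
    """Euler-Maruyama discretisation of interacting-particle mean-field Langevin
    dynamics for the ASSUMED objective
        F(Q) = -sum_i log p_Q(x_i) + KL(Q || prior),
    with model p_theta = N(theta, 1) and prior N(0, prior_sd^2).
    Each particle follows
        d theta = [grad_theta sum_i p_theta(x_i) / p_Q(x_i) + grad log prior(theta)] dt
                  + sqrt(2) dW,
    where p_Q is the equally weighted mixture over the current particles."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 2.0) for _ in range(n_particles)]  # arbitrary initialisation
    noise_sd = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        # dens[j][i] = p_{theta_j}(x_i)
        dens = [[normal_pdf(x, th) for x in data] for th in particles]
        # p_Q(x_i) under the empirical particle measure, floored to avoid 0/0.
        p_q = [max(sum(row[i] for row in dens) / n_particles, 1e-300)
               for i in range(len(data))]
        updated = []
        for j, th in enumerate(particles):
            # For N(theta, 1): d/dtheta p_theta(x) = p_theta(x) * (x - theta).
            drift = sum(dens[j][i] * (x - th) / p_q[i] for i, x in enumerate(data))
            drift -= th / prior_sd ** 2  # gradient of the log prior density
            updated.append(th + step * drift + noise_sd * rng.gauss(0.0, 1.0))
        particles = updated
    return particles


# Bimodal synthetic data that no single N(theta, 1) can fit.
rng = random.Random(1)
data = [rng.gauss(-3.0, 1.0) for _ in range(20)] + [rng.gauss(3.0, 1.0) for _ in range(20)]
particles = pro_langevin(data)
print(sorted(round(p, 2) for p in particles))
```

The drift weights each data point by $p_\theta(x_i)/p_Q(x_i)$, so a particle is pulled toward observations that the current mixture under-explains. For comparison, replacing that ratio by 1 gives the drift $\sum_i (x_i - \theta)$ plus the prior term, which is ordinary Langevin dynamics for the Bayes posterior of this model; under it, all particles concentrate near the sample mean.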

Paper Structure

This paper contains 44 sections, 11 theorems, 72 equations, 15 figures.

Key Result

Theorem 1

Suppose Assumption ass:convex is satisfied and that either Assumption ass:entropy or Assumption ass:Global holds. Then, under regularity conditions on the prior, the following holds for $n$ sufficiently large:

Figures (15)

  • Figure 1: The Gibbs and PrO posterior distributions under different forms of model misspecification. The coefficients $\theta_1, \theta_2$ are shown as black crosses, with $\theta_1 = \theta_2$ under trivial misspecification.
  • Figure 2: Mean and standard error of the Gibbs posterior's predictive loss relative to that of PrO posteriors, evaluated on a test set using the negative log likelihood (NLL), the squared maximum mean discrepancy ($\operatorname{MMD}^2$), and the continuous ranked probability score (CRPS), for different forms of misspecification.
  • Figure 3: Illustration of the key ideas introduced in the definition of model misspecification (def:misspecification). Panel (a) depicts trivial model misspecification: relative to the chosen score $\mathcal{S}$, a single element of $\mathcal{M}_{\Theta}$ provides the best fit for $P_0$, and there is no benefit to averaging. In contrast, Panel (b) illustrates non-trivial model misspecification: constructing $Q^\star := \operatorname{argmin}_{Q}\mathcal{S}(P_Q, P_0)$, so that $P_{Q^\star}$ is a convex combination of $P_{\theta_1}$ and $P_{\theta_2}$, yields a better predictive distribution for $P_0$ than any single point in $\mathcal{M}_{\Theta}$. Panel (c) provides pictorial intuition for convex recoverability: the special case where $P_{Q^\star}$ and $P_0$ coincide.
  • Figure 4: Palmer penguins example. We plot both the kernel density estimate (KDE) and the PrO posterior predictive overlaid with the data (left, middle), as well as the PrO posterior overlaid with the average measurements for each of the three penguin species in the data set (right). For inference with the PrO posterior, we use a bivariate Gaussian model. The results show that the PrO posterior not only correctly identifies the three penguin species in the data (right), but also provides a predictive distribution that fits the observed data as well as a kernel density estimator. For further details, see the appendix on the penguins example (sec:penguins-supp).
  • Figure 5: Golf putting data. Points represent the proportion of successful putts made by professional golfers as a function of the distance from the hole. The first row shows the predictive mean and one-standard-deviation intervals of the Bayes, Martingale, and PrO posteriors. The second row shows the marginals of the corresponding parameter posteriors.
  • ...and 10 more figures
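Figures 2 and 3 contrast single-model fits with model averages under scores such as the NLL and CRPS. As a hedged numerical illustration of Panel (b) of Figure 3, and not a reproduction of any experiment in the paper, the sketch below scores a single Gaussian $N(\theta, 1)$ fitted by the sample mean against an equal-weight mixture of two unit-variance Gaussians on synthetic bimodal data. The mixture locations are set to the means of the two halves of the data, a hypothetical oracle choice rather than the optimiser $Q^\star$. The CRPS uses the identity $\mathrm{CRPS}(P, y) = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$ in closed form for Gaussian mixtures; $\operatorname{MMD}^2$ is omitted, scores are computed in-sample for brevity, and all function names are illustrative.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def abs_normal_mean(m, s):
    """E|Z| for Z ~ N(m, s^2), i.e. the mean of a folded normal."""
    return (s * math.sqrt(2.0 / math.pi) * math.exp(-m * m / (2.0 * s * s))
            + m * math.erf(m / (s * math.sqrt(2.0))))


def mixture_nll(means, weights, y, sd=1.0):
    """Negative log predictive density of an equal-variance Gaussian mixture at y."""
    return -math.log(sum(w * normal_pdf(y, m, sd) for m, w in zip(means, weights)))


def mixture_crps(means, weights, y, sd=1.0):
    """CRPS(P, y) = E|X - y| - 0.5 E|X - X'| with X, X' ~ P independent."""
    term1 = sum(w * abs_normal_mean(m - y, sd) for m, w in zip(means, weights))
    term2 = sum(wk * wl * abs_normal_mean(mk - ml, sd * math.sqrt(2.0))
                for mk, wk in zip(means, weights)
                for ml, wl in zip(means, weights))
    return term1 - 0.5 * term2


def mean(xs):
    return sum(xs) / len(xs)


# Synthetic bimodal data: no single N(theta, 1) matches it (non-trivial misspecification).
rng = random.Random(0)
left = [rng.gauss(-3.0, 1.0) for _ in range(100)]
right = [rng.gauss(3.0, 1.0) for _ in range(100)]
data = left + right

single = ([mean(data)], [1.0])                     # NLL-optimal single N(theta, 1)
mixture = ([mean(left), mean(right)], [0.5, 0.5])  # hypothetical two-point Q

for name, (mus, ws) in [("single", single), ("mixture", mixture)]:
    nll = mean([mixture_nll(mus, ws, y) for y in data])
    crps = mean([mixture_crps(mus, ws, y) for y in data])
    print(f"{name:8s} NLL={nll:.3f} CRPS={crps:.3f}")
```

Under both scores the two-component average should fit this data better than the best single model, which is the situation Panel (b) depicts; with unimodal Gaussian data the advantage would disappear, matching Panel (a).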

Theorems & Definitions (22)

  • Definition 1: Scoring Rule
  • Definition 2
  • Theorem 1
  • Corollary 1
  • Lemma 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Corollary 2
  • Corollary 3
  • ...and 12 more