Predictively Oriented Posteriors
Yann McLatchie, Badr-Eddine Cherief-Abdellatif, David T. Frazier, Jeremias Knoblauch
TL;DR
Predictively oriented (PrO) posteriors recast Bayesian uncertainty as a property of predictive ability rather than parameter estimation. By optimising a predictive scoring rule directly while regularising toward a prior over model parameters, the PrO posterior $Q_n$ acts as a mixing distribution over the model family $\theta \mapsto P_\theta$, giving predictive distributions of the form $P_Q = \int P_\theta \, \mathrm{d}Q(\theta)$ with superior calibration under misspecification. The theory shows that PrO posterior predictives dominate those of classical and generalised Bayes posteriors when the model is misspecified, while retaining Bayes-like performance when the model is well specified; up to logarithmic factors, they converge to a predictively optimal model average $Q^*$, whose irreducible uncertainty can be interpreted as the degree of model misspecification. Sampling is carried out by evolving particles under mean field Langevin dynamics (a Wasserstein gradient flow), and empirical illustrations on binary classification, river flow, and housing data highlight improved predictive performance and richer uncertainty representations. Overall, PrO posteriors offer a principled, adaptive framework that balances interpretability, predictive accuracy, and uncertainty quantification under model misspecification.
Abstract
We advocate for a new statistical principle that combines the most desirable aspects of both parameter inference and density estimation. This leads us to the predictively oriented (PrO) posterior, which expresses uncertainty as a consequence of predictive ability. Doing so leads to inferences which predictively dominate both classical and generalised Bayes posterior predictive distributions: up to logarithmic factors, PrO posteriors converge to the predictively optimal model average. Whereas classical and generalised Bayes posteriors only achieve this rate if the model can recover the data-generating process, PrO posteriors adapt to the level of model misspecification. This means that they concentrate around the true model in the same way as Bayes and Gibbs posteriors if the model can recover the data-generating distribution, but do not concentrate in the presence of non-trivial forms of model misspecification. Instead, they stabilise towards a predictively optimal posterior whose degree of irreducible uncertainty admits an interpretation as the degree of model misspecification, in sharp contrast to how Bayesian uncertainty and its existing extensions behave. Lastly, we show that PrO posteriors can be sampled from by evolving particles based on mean field Langevin dynamics, and verify the practical significance of our theoretical developments on a number of numerical examples.
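To make the particle-based sampling idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the log score as the predictive scoring rule, a unit learning rate, a unit-variance Gaussian location model $P_\theta = N(\theta, 1)$, and a $N(0, 3^2)$ prior; all function names, the step size, and the toy bimodal data are hypothetical choices made for illustration. Under these assumptions the objective is $\sum_i -\log P_Q(x_i) + \mathrm{KL}(Q \,\|\, \Pi)$, and an Euler discretisation of the corresponding mean field Langevin dynamics moves each particle along the gradient of the first variation of that objective plus Gaussian noise.

```python
import numpy as np

def log_normal_pdf(x, theta):
    # log N(x_i; theta_j, 1) for every data point (rows) and particle (columns)
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * (x[:, None] - theta[None, :]) ** 2

def log_mean_exp(logp):
    # Numerically stable log of the row-wise mean of exp(logp)
    m = logp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(logp - m).mean(axis=1, keepdims=True)))[:, 0]

def mixture_log_pred(x, theta):
    # log P_Q(x_i), with Q approximated by the empirical measure of the particles
    return log_mean_exp(log_normal_pdf(x, theta))

def mfld_step(theta, x, step, prior_sd, rng):
    # One Euler step of the (assumed) mean field Langevin dynamics
    logp = log_normal_pdf(x, theta)                  # shape (n, N)
    log_pq = log_mean_exp(logp)                      # shape (n,)
    w = np.exp(logp - log_pq[:, None])               # p(x_i | theta_j) / P_Q(x_i)
    score_drift = (w * (x[:, None] - theta[None, :])).sum(axis=0)
    prior_drift = -theta / prior_sd**2               # gradient of the log prior
    noise = rng.standard_normal(theta.shape)         # entropy term of the KL
    return theta + step * (score_drift + prior_drift) + np.sqrt(2.0 * step) * noise

rng = np.random.default_rng(0)
n, n_particles, prior_sd = 200, 50, 3.0

# Toy misspecified setting: bimodal data, unimodal unit-variance model
x = rng.choice([-2.0, 2.0], size=n) + rng.standard_normal(n)

theta = prior_sd * rng.standard_normal(n_particles)  # initialise from the prior
for _ in range(2000):
    theta = mfld_step(theta, x, step=0.1 / n, prior_sd=prior_sd, rng=rng)

# In-sample mean log score: particle mixture vs. plug-in model at the sample mean
pro_score = mixture_log_pred(x, theta).mean()
plugin_score = log_normal_pdf(x, np.array([x.mean()])).mean()
print(f"mixture predictive: {pro_score:.3f}, plug-in predictive: {plugin_score:.3f}")
```

In this toy setting no single $\theta$ can reproduce the two modes, so the particles are not expected to collapse onto one value; the spread that remains is the kind of irreducible uncertainty the abstract attributes to model misspecification. The step size is scaled by $1/n$ only to keep the explicit Euler scheme stable, which is a practical choice rather than something taken from the paper.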
