Predictive performance of power posteriors
Yann McLatchie, Edwin Fong, David T. Frazier, Jeremias Knoblauch
TL;DR
This paper investigates whether tempering the likelihood with a temperature $\tau$ improves posterior predictions. The authors prove that, under mild concentration conditions, the power posterior predictive $p_n^{(\tau)}(\cdot\mid y_{1:n})$ converges to the plug-in predictive and is therefore asymptotically independent of $\tau$: in moderate-to-large samples the choice of temperature does not affect predictions, although small-sample gains can occur. They derive uniform total variation (TV) and Kullback-Leibler (KL) bounds, discuss cross-validation for selecting $\tau$, and illustrate the results with normal-location, beta-binomial, and misspecified regression examples. The results emphasise that predictive performance is driven by the data and by model misspecification rather than by parameter uncertainty, so tempering provides limited large-sample benefit, with several caveats for finite samples and generalised Bayes formulations. The authors also connect these insights to calibration issues and outline avenues for extending the theory to coarsened posteriors and hierarchical models, including Bayesian neural networks.
Abstract
We analyse the impact of using tempered likelihoods in the production of posterior predictions. While the choice of temperature has an impact on predictive performance in small samples, we formally show that in moderate-to-large samples, tempering does not impact posterior predictions.
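The claim can be checked by hand in a conjugate normal-location model. The sketch below is illustrative only and is not taken from the paper: it assumes $y_i \sim N(\theta, \sigma^2)$ with known $\sigma$, a prior $\theta \sim N(\mu_0, s_0^2)$, and the standard power posterior $\pi_n^{(\tau)}(\theta \mid y_{1:n}) \propto \pi(\theta) \prod_{i=1}^n p(y_i \mid \theta)^{\tau}$. Under these assumptions the posterior precision is $1/s_0^2 + \tau n/\sigma^2$ and the predictive is again normal, so the KL divergence between predictives at two temperatures is available in closed form. The function names, the temperatures 0.5 and 2, and the simulated data are all hypothetical choices made for illustration.

```python
import math
import random


def tempered_predictive(y, tau, sigma=1.0, mu0=0.0, s0=1.0):
    """Mean and variance of the posterior predictive for a normal-location
    model with known sigma, under a power posterior with temperature tau
    (illustrative conjugate setup, not the paper's own code)."""
    n = len(y)
    # Tempering scales the likelihood's contribution to the precision by tau.
    precision = 1.0 / s0**2 + tau * n / sigma**2
    post_mean = (mu0 / s0**2 + tau * sum(y) / sigma**2) / precision
    # Predictive variance = observation noise + posterior variance of theta.
    return post_mean, sigma**2 + 1.0 / precision


def kl_normal(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for univariate normals."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def kl_between_temperatures(n, tau_a=0.5, tau_b=2.0, seed=0):
    """KL between the predictives at two temperatures on n simulated points."""
    rng = random.Random(seed)
    y = [rng.gauss(1.0, 1.0) for _ in range(n)]  # hypothetical data, true mean 1
    return kl_normal(*tempered_predictive(y, tau_a),
                     *tempered_predictive(y, tau_b))


# The gap between the two predictives shrinks as the sample size grows.
for n in (5, 50, 5000):
    print(n, kl_between_temperatures(n))
```

In this toy model the temperature only enters through the posterior variance $1/(1/s_0^2 + \tau n/\sigma^2)$ and the shrinkage of the posterior mean, both of which vanish relative to the fixed observation noise $\sigma^2$ as $n$ grows, which is consistent with the abstract's statement about moderate-to-large samples.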
