Predictive performance of power posteriors

Yann McLatchie; Edwin Fong; David T. Frazier; Jeremias Knoblauch

TL;DR

This paper investigates whether tempering the likelihood with a temperature $\tau$ improves posterior predictions. The authors prove that, under mild concentration conditions, the power posterior predictive $p_n^{(\tau)}(\cdot\mid y_{1:n})$ converges to the plug-in predictive and is asymptotically independent of $\tau$: the choice of temperature can matter in small samples but not in moderate-to-large ones. They derive uniform total variation (TV) and Kullback-Leibler (KL) bounds, discuss cross-validation for selecting $\tau$, and illustrate the results with normal location, beta-binomial, and misspecified regression examples. The results indicate that predictive performance is driven by the data and by model misspecification rather than by parameter uncertainty, so tempering offers limited large-sample benefit, with caveats for finite samples and generalised Bayes formulations. The authors also connect these findings to calibration issues and outline extensions of the theory to coarsened posteriors and hierarchical models, including Bayesian neural networks.
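
For orientation, the objects named above can be written out using the standard definition of a power posterior. This is a sketch under assumed notation: the prior $\pi(\theta)$, the likelihood $p(y\mid\theta)$, and the symbol $\pi_n^{(\tau)}$ do not appear in this summary and may differ from the paper's own.

$$
\pi_n^{(\tau)}(\theta \mid y_{1:n}) \;\propto\; \pi(\theta) \prod_{i=1}^{n} p(y_i \mid \theta)^{\tau},
\qquad
p_n^{(\tau)}(\tilde{y} \mid y_{1:n}) \;=\; \int p(\tilde{y} \mid \theta)\, \pi_n^{(\tau)}(\theta \mid y_{1:n})\, \mathrm{d}\theta .
$$

Setting $\tau = 1$ recovers the usual Bayesian posterior and posterior predictive.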

Abstract

We analyse the impact of using tempered likelihoods in the production of posterior predictions. While the choice of temperature has an impact on predictive performance in small samples, we formally show that in moderate-to-large samples, tempering does not impact posterior predictions.
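
The claim can be checked by hand in a conjugate setting. Below is a minimal sketch, not the authors' code, for a normal location model with known variance and a standard normal prior. The function names, the true mean of 1.0, and the pair of temperatures compared are all illustrative assumptions. It computes the closed-form power posterior predictive and the KL divergence between the predictives obtained under two different temperatures.

```python
import numpy as np


def power_posterior_predictive(y, tau, sigma2=1.0, prior_mean=0.0, prior_var=1.0):
    """Mean and variance of the Gaussian power posterior predictive.

    Model (assumed for illustration): y_i ~ N(theta, sigma2) with sigma2 known,
    prior theta ~ N(prior_mean, prior_var), likelihood raised to the power tau.
    """
    n = len(y)
    precision = 1.0 / prior_var + n * tau / sigma2
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + tau * np.sum(y) / sigma2)
    # Predictive variance = observation noise + remaining parameter uncertainty.
    return post_mean, sigma2 + post_var


def gaussian_kl(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for univariate Gaussians."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def kl_between_temperatures(n, tau_a=0.5, tau_b=2.0, seed=0):
    """KL between the predictives under tau_a and tau_b on one simulated dataset."""
    rng = np.random.default_rng(seed)
    y = rng.normal(loc=1.0, scale=1.0, size=n)  # hypothetical true mean of 1.0
    m_a, v_a = power_posterior_predictive(y, tau_a)
    m_b, v_b = power_posterior_predictive(y, tau_b)
    return gaussian_kl(m_a, v_a, m_b, v_b)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, kl_between_temperatures(n))
```

In this toy setting the divergence between the two predictives shrinks as $n$ grows, because $\tau$ only enters through the posterior precision $1/\text{prior\_var} + n\tau/\sigma^2$, whose contribution to the predictive vanishes relative to the observation noise.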

Paper Structure (21 sections, 5 theorems, 31 equations, 10 figures)

Introduction
A predictive view on power posteriors
Choosing the temperature predictively
Normal location example
The temperature is eventually inconsequential to predictive accuracy
Technical results
Interpretation
Applicability to generalised Bayes
Cross-validation and the Kullback-Leibler divergence
Additional numerical experiments
Discussion
Technical results
Notation
Main results
Additional results
...and 6 more sections

Key Result

Lemma 1

Under assumptions (ass:lipz) and (ass:concentration), for any $0<\underline{\tau}< \overline{\tau} < \infty$ and $\tau\in[\underline{\tau}, \overline{\tau}]$, with $\mathbb{P}$-probability at least $1-2\max\left\{\varepsilon_n+\exp(-Cn\tau\varepsilon_n^2/M_{\varepsilon_n}), \nu_n\right\}$.

Figures (10)

Figure 1: $\sqrt{n}$-scaled total variation between the power posterior predictive $p_n^{(\tau)}(\cdot \mid y_{1:n})$ of a normal location model and the true predictive $q^{\star}_n(\cdot\mid y_{1:n})$. Grey curves correspond to individual dataset replicates, dotted black lines to $5\%$ and $95\%$ quantiles, and solid black curves to expectation.
Figure 2: Histograms of $\mathrm{elpd}(\tau_{\mathrm{CV}}^\star)$ and $\tau_{\mathrm{CV}}^\star$ in a normal location model with standard normal prior.
Figure 3: Total variation between the true predictive $q^{\star}_n(\cdot\mid y_{1:n})$ and the power posterior predictive $p_n^{(\tau)}(\cdot\mid y_{1:n})$ in (a) the beta-binomial experiment and (b) a linear regression experiment. Grey curves correspond to individual dataset replicates, and dotted lines to $5\%$ and $95\%$ quantiles.
Figure B.1: Normal location example. The grey curves correspond to individual dataset replicates, dotted black lines to $5\%$ and $95\%$ quantiles, and solid black curves to expectation.
Figure B.2: Normal location example. Lines correspond to the scaled risk of (eq:normal-location-risk) across different values of $\tau$.
...and 5 more figures

Theorems & Definitions (5)

Lemma 1
Lemma 2
Lemma 3
Lemma 4
Lemma 5
