Predictive performance of power posteriors

Yann McLatchie, Edwin Fong, David T. Frazier, Jeremias Knoblauch

TL;DR

This paper investigates whether tempering the likelihood with a temperature $\tau$ improves posterior predictions. The authors prove that, under mild concentration conditions, the power posterior predictive $p_n^{(\tau)}(\cdot\mid y_{1:n})$ converges to the plug-in predictive and is therefore asymptotically insensitive to $\tau$: in moderate-to-large samples the choice of temperature does not matter, although it can yield gains in small samples. They derive uniform total variation (TV) and Kullback-Leibler (KL) bounds, discuss cross-validation for selecting $\tau$, and illustrate the results with normal-location, beta-binomial, and misspecified regression examples. The results emphasise that predictive performance is driven by the data and by model misspecification rather than by parameter uncertainty, so tempering offers limited benefit in large samples, with caveats for finite samples and generalised Bayes formulations. They also connect these insights to calibration issues and outline avenues for extending the theory to coarsened posteriors and hierarchical models, including Bayesian neural networks.
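
For reference, the objects named in the summary can be written out as follows. This is the standard power posterior construction for independent observations, not a quotation from the paper; the symbols $\pi$ for the prior and $p_\theta$ for the model density are notational assumptions made here.

$$
\pi_n^{(\tau)}(\theta\mid y_{1:n}) \propto \pi(\theta)\prod_{i=1}^{n} p_\theta(y_i)^{\tau},
\qquad
p_n^{(\tau)}(\cdot\mid y_{1:n}) = \int p_\theta(\cdot)\,\pi_n^{(\tau)}(\mathrm{d}\theta\mid y_{1:n}).
$$

Setting $\tau=1$ recovers the usual Bayesian posterior and posterior predictive. Values $\tau<1$ downweight the likelihood relative to the prior, and values $\tau>1$ upweight it.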

Abstract

We analyse the impact of using tempered likelihoods in the production of posterior predictions. While the choice of temperature has an impact on predictive performance in small samples, we formally show that in moderate-to-large samples, tempering does not impact posterior predictions.
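
The claim in the abstract can be checked numerically in the simplest conjugate setting. The sketch below is not the authors' code. It assumes a normal location model with known unit noise variance and a standard normal prior, for which the power posterior predictive is available in closed form. The function names, the two temperatures, the simulated data and the Riemann-sum approximation of total variation are all illustrative choices made here.

```python
import math
import random


def power_posterior_predictive(y, tau, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Mean and variance of the power posterior predictive in a conjugate
    normal location model (known noise variance is an assumption here)."""
    n = len(y)
    # Tempering multiplies the likelihood's contribution to the precision by tau.
    post_prec = 1.0 / prior_var + tau * n / noise_var
    post_mean = (prior_mean / prior_var + tau * sum(y) / noise_var) / post_prec
    # Predictive variance = sampling noise + remaining posterior uncertainty.
    return post_mean, noise_var + 1.0 / post_prec


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def tv_normal(m1, v1, m2, v2, grid=20001, width=12.0):
    """Total variation distance between two normals, 0.5 * integral |p - q|,
    approximated by a Riemann sum on a uniform grid."""
    sd = math.sqrt(max(v1, v2))
    lo = min(m1, m2) - width * sd
    hi = max(m1, m2) + width * sd
    h = (hi - lo) / (grid - 1)
    total = sum(
        abs(normal_pdf(lo + i * h, m1, v1) - normal_pdf(lo + i * h, m2, v2))
        for i in range(grid)
    )
    return 0.5 * total * h


def tv_between_temperatures(n, tau_a=0.5, tau_b=2.0, true_mean=1.0, seed=0):
    """TV between the predictives at two temperatures on one simulated dataset."""
    rng = random.Random(seed)
    y = [rng.gauss(true_mean, 1.0) for _ in range(n)]
    return tv_normal(*power_posterior_predictive(y, tau_a),
                     *power_posterior_predictive(y, tau_b))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, tv_between_temperatures(n))
```

In this model the two predictive means and variances each differ by a term of order $1/n$, so the distance between the predictives at different temperatures shrinks as the sample grows. This is the small-sample versus large-sample contrast the abstract describes, here for one simulated dataset only.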

Paper Structure (21 sections, 5 theorems, 31 equations, 10 figures)

Key Result

Lemma 1

Under Assumptions \ref{ass:lipz} and \ref{ass:concentration}, for any $0<\underline{\tau}< \overline{\tau} < \infty$ and $\tau\in[\underline{\tau}, \overline{\tau}]$, with $\mathbb{P}$-probability at least $1-2\max\left\{\varepsilon_n+\exp(-Cn\tau\varepsilon_n^2/M_{\varepsilon_n}), \nu_n\right\}$.

Figures (10)

  • Figure 1: $\sqrt{n}$-scaled total variation between the power posterior predictive $p_n^{(\tau)}(\cdot \mid y_{1:n})$ of a normal location model and the true predictive $q^{\star}_n(\cdot\mid y_{1:n})$. Grey curves correspond to individual dataset replicates, dotted black lines to $5\%$ and $95\%$ quantiles, and solid black curves to expectation.
  • Figure 2: Histograms of $\mathrm{elpd}(\tau_{\mathrm{CV}}^\star)$ and $\tau_{\mathrm{CV}}^\star$ in a normal location model with standard normal prior.
  • Figure 3: Total variation between the true predictive $q^{\star}_n(\cdot\mid y_{1:n})$ and the power posterior predictive $p_n^{(\tau)}(\cdot\mid y_{1:n})$ in (a) the beta-binomial experiment and (b) a linear regression experiment. Grey curves correspond to individual dataset replicates, and dotted lines to $5\%$ and $95\%$ quantiles.
  • Figure B.1: Normal location example. The grey curves correspond to individual dataset replicates, dotted black lines to $5\%$ and $95\%$ quantiles, and solid black curves to expectation.
  • Figure B.2: Normal location example. Lines correspond to the scaled risk of \ref{eq:normal-location-risk} across different values of $\tau$.
  • ...and 5 more figures

Theorems & Definitions (5)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5