Bridge Sampling Diagnostics

Giorgio Micaletto, Aki Vehtari

Abstract

In Bayesian statistics, the marginal likelihood is used for model selection and averaging, yet it is often challenging to compute accurately for complex models. Approaches such as bridge sampling, while effective, can yield highly variable estimates. We show how to estimate the Monte Carlo standard error (MCSE) for bridge sampling, and how to diagnose the reliability of the MCSE estimates using Pareto-$\hat{k}$ and block reshuffling diagnostics, without repeatedly re-running full posterior inference. We demonstrate the behavior on increasingly difficult simulated posteriors and on many real posteriors from the posteriordb database.
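To make the estimator discussed in the abstract concrete, the following is a minimal sketch of the standard iterative bridge sampling scheme with the optimal bridge function, computed in log space. It is an illustration under stated assumptions, not the implementation used in the paper: the toy target (an unnormalized standard normal with known normalizing constant $\sqrt{2\pi}$), the i.i.d. draws standing in for MCMC output, the moment-matched normal proposal, and all function and variable names are hypothetical choices made for this example.

```python
import numpy as np


def log_normal_pdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def bridge_log_marginal(log_q, log_g, post_draws, prop_draws,
                        tol=1e-10, max_iter=1000):
    """Iterative bridge sampling estimate of the log normalizing constant.

    log_q: log of the unnormalized posterior density.
    log_g: log of the (normalized) proposal density.
    post_draws / prop_draws: draws from the posterior / from the proposal.
    """
    n1, n2 = len(post_draws), len(prop_draws)
    log_s1 = np.log(n1 / (n1 + n2))
    log_s2 = np.log(n2 / (n1 + n2))
    # Log density ratios q/g evaluated at posterior and proposal draws.
    log_l1 = log_q(post_draws) - log_g(post_draws)
    log_l2 = log_q(prop_draws) - log_g(prop_draws)

    log_r = 0.0  # initial guess for the log marginal likelihood
    for _ in range(max_iter):
        # Numerator terms use proposal draws, denominator terms posterior draws.
        num = log_l2 - np.logaddexp(log_s1 + log_l2, log_s2 + log_r)
        den = -np.logaddexp(log_s1 + log_l1, log_s2 + log_r)
        new_log_r = ((np.logaddexp.reduce(num) - np.log(n2))
                     - (np.logaddexp.reduce(den) - np.log(n1)))
        converged = abs(new_log_r - log_r) < tol
        log_r = new_log_r
        if converged:
            break
    return log_r


# Toy problem (assumption): q(theta) = exp(-theta^2 / 2), so Z = sqrt(2 * pi).
rng = np.random.default_rng(0)
draws = rng.normal(0.0, 1.0, size=20_000)  # stand-in for MCMC draws

# Fit the proposal on the first half, run the estimator on the second half.
fit_draws, est_draws = draws[:10_000], draws[10_000:]
mu_hat, sd_hat = fit_draws.mean(), fit_draws.std(ddof=1)
prop_draws = rng.normal(mu_hat, sd_hat, size=10_000)

log_ml_hat = bridge_log_marginal(
    log_q=lambda t: -0.5 * t ** 2,
    log_g=lambda t: log_normal_pdf(t, mu_hat, sd_hat),
    post_draws=est_draws,
    prop_draws=prop_draws,
)
true_log_ml = 0.5 * np.log(2 * np.pi)
print(log_ml_hat, true_log_ml)
```

In this toy case the proposal overlaps the target almost perfectly, so the estimate is very stable. The diagnostics described in the abstract are aimed at the harder situation in which the overlap is poor and the numerator or denominator terms become heavy-tailed.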

Paper Structure

This paper contains 13 sections, 26 equations, 6 figures.

Figures (6)

  • Figure 1: Intuition of the bridge sampling estimator. It relies on the ratio of the unnormalized posterior $\widetilde{\pi}(\mathrm{\boldsymbol{\theta}})$ (solid line) and the proposal density $g(\mathrm{\boldsymbol{\theta}})$ (dashed line), and its stability depends on the overlap (shaded region) between the two distributions.
  • Figure 2: The top plot shows the standard deviation of the log marginal likelihood estimate (in log scale) from repeated runs of MCMC and bridge sampling (solid line), the standard deviation of the log marginal likelihood estimate from repeated runs of bridge sampling with block reshuffling (long-dashed line), and the MCSE estimate from a single bridge sampling estimate (dashed line). The bottom plot shows the standard deviation (normalized by the median marginal likelihood given the number of covariates) of the marginal likelihood estimate (in log scale) from repeated runs of MCMC and bridge sampling (solid line) and the standard deviation of the log marginal likelihood estimate from repeated runs of bridge sampling with block reshuffling (long-dashed line).
  • Figure 3: Pareto-$\widehat{k}$ diagnostic in the linear regression example with an increasing number of covariates, for the distribution of terms in the numerator and denominator (as labeled in the plot). The plot shows the average over 100 independent MCMC runs of the estimated $\widehat{k}$ using the default number of tail draws (135) and only 10 tail draws, with solid and long-dashed lines, respectively.
  • Figure 4: posteriordb posteriors + Birthdays: Estimated MCSE vs. standard deviation of log marginal likelihood estimates from repeated MCMC runs. The MCSE estimate has a small bias except for a few posteriors with the highest variability.
  • Figure 5: posteriordb posteriors + Birthdays: Standard deviation of log marginal likelihood estimates from block reshuffling vs. standard deviation of log marginal likelihood estimates from 100 repeated MCMC runs. Block reshuffling underestimates the variability when it is small, but has a small bias when the variability is high.
  • ...and 1 more figure