Marginal Likelihoods from Monte Carlo Markov Chains
Alan Heavens, Yabebal Fantaye, Arrykrishna Mootoovaloo, Hans Eggers, Zafiirah Hosenie, Steve Kroon, Elena Sellentin
TL;DR
The paper addresses the challenge of computing the marginal likelihood $E$ (the Bayesian evidence) from MCMC samples in high-dimensional parameter spaces. It introduces a Bayesian density-estimation approach that uses $k$th nearest-neighbour distances, measured with the Mahalanobis metric, to infer the unknown constant $E$ relating the chain density to the unnormalised posterior, with an extension to importance-sampled chains via weights $w_\beta$. The authors derive explicit expressions for the posterior of $E$, including the MAP estimator $E_{\rm MAP}$ and its fractional uncertainty, and show that $k=1$ with pre-whitening (Mahalanobis distance) performs robustly up to about 10–20 dimensions for chains of length around $10^5$. The method extracts the Bayesian evidence directly from standard MCMC outputs, enabling model comparison without a separate marginal-likelihood computation, and is accompanied by open-source code on GitHub for practical use in fields such as cosmology.
Abstract
In this paper, we present a method for computing the marginal likelihood, also known as the model likelihood or Bayesian evidence, from Markov Chain Monte Carlo (MCMC) chains or other sampled posterior distributions. To do this, one needs to estimate the density of points in parameter space, which can be challenging in high numbers of dimensions. Here we present a Bayesian analysis in which we obtain the posterior for the marginal likelihood from $k$th nearest-neighbour distances in parameter space, measured with the Mahalanobis distance metric, under the assumption that the points in the chain (thinned if required) are independent. We generalise the algorithm to apply to importance-sampled chains, where each point is assigned a weight. We illustrate this with an idealised posterior of known form with an analytic marginal likelihood, and show that for chains of length $\sim 10^5$ points, the technique is effective for parameter spaces with up to $\sim 20$ dimensions. We also argue that $k=1$ is the optimal choice, and discuss failure modes for the algorithm. In a companion paper (Heavens et al. 2017) we apply the technique to the main MCMC chains from the 2015 Planck analysis of cosmic background radiation data, to infer that quantitatively the simplest 6-parameter flat $\Lambda$CDM standard model of cosmology is preferred over all extensions considered.
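To make the idea concrete, here is a minimal Python sketch of a nearest-neighbour evidence estimate. It is not the paper's estimator or the authors' released code. It assumes unweighted, independent, distinct samples and uses a simplified maximum-likelihood form under a locally Poisson point density: if the chain density is $n(\theta) = (N/E)\,\tilde{p}(\theta)$, with $\tilde{p}$ the unnormalised posterior, then $n V_k$ has mean $k$ for the volume $V_k$ of the ball reaching the $k$th neighbour, which suggests $E \approx \frac{1}{k}\sum_\alpha \tilde{p}_\alpha V_{k,\alpha}$. The function name `log_evidence_knn` and its arguments are hypothetical, and the brute-force distance matrix is only meant for a few thousand points.

```python
import math
import numpy as np


def log_evidence_knn(samples, log_unnorm_post, k=1):
    """Illustrative sketch (not the paper's exact estimator).

    samples:          (N, d) array of independent, distinct posterior samples
    log_unnorm_post:  (N,) array of ln(likelihood * prior) at each sample
    Returns an estimate of ln E.
    """
    samples = np.asarray(samples, dtype=float)
    logp = np.asarray(log_unnorm_post, dtype=float)
    n_pts, dim = samples.shape

    # Pre-whiten: y = L^{-1} (x - mean) with cov = L L^T, so Euclidean
    # distance in y equals Mahalanobis distance in x.
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    chol = np.linalg.cholesky(cov)
    white = np.linalg.solve(chol, (samples - samples.mean(axis=0)).T).T

    # Brute-force squared pairwise distances (O(N^2) memory).
    sq = np.sum(white**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (white @ white.T)
    d2 = np.maximum(d2, 0.0)          # guard against round-off negatives
    np.fill_diagonal(d2, np.inf)      # exclude each point from its own search
    kth_d2 = np.partition(d2, k - 1, axis=1)[:, k - 1]

    # ln volume of the d-ball of radius D_k in whitened space, converted
    # back to original parameter space by adding ln|det L|.
    log_det_chol = np.sum(np.log(np.diag(chol)))
    log_vol = (0.5 * dim * math.log(math.pi)
               - math.lgamma(1.0 + 0.5 * dim)
               + 0.5 * dim * np.log(kth_d2)
               + log_det_chol)

    # ln E = ln( (1/k) * sum_alpha p~_alpha V_alpha ), via log-sum-exp.
    terms = logp + log_vol
    peak = np.max(terms)
    return peak + math.log(np.sum(np.exp(terms - peak))) - math.log(k)


# Usage on a toy posterior with a known answer: p~(x) = exp(-|x|^2 / 2)
# in 2 dimensions, for which E = 2*pi exactly.
rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 2))
est = log_evidence_knn(x, -0.5 * np.sum(x**2, axis=1), k=1)
print(est, math.log(2.0 * math.pi))
```

On this toy problem the estimate should land near $\ln 2\pi$, with scatter of a few per cent in $E$ at this chain length. For chains of $10^5$ points a tree-based neighbour search would replace the dense distance matrix, and weighted or repeated points would need the importance-sampling treatment that the paper describes and this sketch omits.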
