
Functional Estimation of the Marginal Likelihood

Omiros Papaspiliopoulos, Timothée Stumpf-Fétizon, Jonathan Weare

TL;DR

This paper develops a functional estimator for the marginal likelihood in models with high-dimensional latent parameters and low-dimensional hyperparameters, built on the EMUS umbrella-sampling framework with $p(y|\lambda)=\int p(y|\theta,\lambda)p(\theta|\lambda)\,d\theta$. A key idea is to extend grid-based EMUS to the full hyperparameter domain via a kernel-like function $f(\lambda_i,\lambda)$ so that $u(\lambda)=\sum_\ell u_\ell f(\lambda_\ell,\lambda)$, and to estimate $\hat{u}(\lambda)=\sum_\ell \hat{u}_\ell \hat{f}(\lambda_\ell,\lambda)$ with $\hat{u}(\lambda_\ell)=\hat{u}_\ell$, enabling cheap evaluation on a fine grid and gradient-based optimization. The authors prove two consistency results (fixed-grid and dense-grid), leveraging uniform laws of large numbers and, in the dense-grid case, additional smoothness and irreducibility conditions; they also relate EMUS to Gibbs sampling, Vardi, bridge sampling, and SMC, highlighting robustness to spectral gaps. Through numerical experiments on Gaussian process regression, Gaussian process classification, and crossed random effect models, the paper demonstrates accurate functional marginal-likelihood estimates and provides practical guidance on grid design, sampling allocation, and optimal design strategies.

Abstract

We propose a framework for computing, optimizing and integrating with respect to a smooth marginal likelihood in statistical models that involve high-dimensional parameters/latent variables and continuous low-dimensional hyperparameters. The method requires samples from the posterior distribution of the parameters for different values of the hyperparameters on a simulation grid and returns inference on the marginal likelihood defined everywhere on its domain, and on its functionals. We show how the method relates to many of the methods that have been used in this context, including sequential Monte Carlo, Gibbs sampling, Monte Carlo maximum likelihood, and umbrella sampling. We establish the consistency of the proposed estimators as the sampling effort increases, both when the simulation grid is kept fixed and when it becomes dense in the domain. We showcase the approach on Gaussian process regression and classification and crossed effect models.

Paper Structure (32 sections, 31 theorems, 209 equations, 9 figures)

This paper contains 32 sections, 31 theorems, 209 equations, 9 figures.

Key Result

Proposition 1

Let $\boldsymbol{z}$ be the vector of normalizing constants, and $\boldsymbol{F}$ the stochastic matrix with elements $F_{ij}$. Then $\boldsymbol{z}$ is in detailed balance with $\boldsymbol{F}$: $z_i F_{ij} = z_j F_{ji}$ for all $i, j$, and therefore it solves the eigenvector problem $\boldsymbol{z}^\top \boldsymbol{F} = \boldsymbol{z}^\top$.
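
The step from detailed balance to the eigenvector problem is short: summing $z_i F_{ij} = z_j F_{ji}$ over $i$ and using that each row of $\boldsymbol{F}$ sums to one gives $\sum_i z_i F_{ij} = z_j$. The following is a minimal numerical sketch of that fact in Python/NumPy. It is a toy illustration, not the paper's construction: the symmetric matrix `S`, and the way `F` and `z` are derived from it, are assumptions chosen only so that detailed balance holds by design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not from the paper): a symmetric, strictly
# positive matrix S standing in for pairwise "overlaps" between grid points.
A = rng.random((5, 5)) + 0.1
S = A + A.T

# Define z as the row sums of S and F as the row-normalized S.
# Then z_i * F_ij = S_ij, which is symmetric, so detailed balance holds.
z = S.sum(axis=1)
F = S / z[:, None]

# F is row-stochastic and z is in detailed balance with F.
flux = z[:, None] * F
assert np.allclose(F.sum(axis=1), 1.0)
assert np.allclose(flux, flux.T)

# Consequence: z is a left eigenvector of F with eigenvalue 1.
assert np.allclose(z @ F, z)

# Recover z (up to scale) from F alone via the leading left eigenvector.
eigvals, eigvecs = np.linalg.eig(F.T)
k = np.argmax(eigvals.real)
z_hat = np.real(eigvecs[:, k])
z_hat = z_hat / z_hat.sum()  # fixes both the scale and the sign

assert np.allclose(z_hat, z / z.sum())
```

Because the eigenvector is only determined up to a multiplicative constant, the sketch compares normalized vectors; this mirrors the fact that normalizing constants recovered this way are identified only up to a common scale.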

Figures (9)

  • Figure 1: The data with the generating regression function $\mathbb{E}[y|x]$.
  • Figure 2: Typical estimates of $u(\lambda)$ according to functional EMUS (left) and extrapolated griddy Gibbs (right, see text for how this is obtained), both on a $17 \times 17$ simulation grid and a $33 \times 33$ evaluation grid.
  • Figure 3: True profiles for $\tau_{1},\tau_{2}$ (solid black line), and 75%-intervals of functional EMUS estimator (blue) and griddy Gibbs (orange) over 128 runs of each method.
  • Figure 4: Total number of samples $N$ vs. error, defined as $\mathbb{E}\,\big\| \widehat{\boldsymbol{u}}^{e} / \|\widehat{\boldsymbol{u}}^{e}\|_{1} - \boldsymbol{u}^{e} / \|\boldsymbol{u}^{e}\|_{1} \big\|_{2}$, where $\widehat{\boldsymbol{u}}^{e}$ and $\boldsymbol{u}^{e}$ are the evaluations of $\widehat{u}$ and $u$ on $\Lambda^e$, and $\Lambda^e$ is a $33 \times 33$ grid. The error is approximated by averaging over 128 estimates. The blue line corresponds to the regime with a fixed $17 \times 17$ grid and increasing $N$, the green line to the regime with a fixed $5 \times 5$ grid and increasing $N$, and the orange line to increasingly fine grids with $16$ samples per grid point. The grids of the orange line span from $5 \times 5$ on the left to $33 \times 33$ (same as $\Lambda^e$) on the right. The slope of the dashed line corresponds to the asymptotic Monte Carlo rate of $1/\sqrt{N}$.
  • Figure 5: An expensive high-precision estimate of $u(\lambda)$ by functional EMUS (left, 8192 MCMC samples per grid point), and a typical estimate (right, 256 samples per grid point), both on a $17 \times 17$ simulation grid and a $33 \times 33$ evaluation grid.
  • ...and 4 more figures

Theorems & Definitions (63)

  • Proposition 1
  • Remark 1
  • Remark 2
  • Theorem 1
  • Proposition 2
  • proof
  • Proposition 3
  • Proposition 4
  • proof
  • Proposition 5
  • ...and 53 more