Automatic Laplace Collapsed Sampling: Scalable Marginalisation of Latent Parameters via Automatic Differentiation

Toby Lovick, David Yallup, Will Handley

Abstract

We present Automatic Laplace Collapsed Sampling (ALCS), a general framework for marginalising latent parameters in Bayesian models using automatic differentiation, which we combine with nested sampling to explore the hyperparameter space robustly and efficiently. At each nested sampling likelihood evaluation, ALCS collapses the high-dimensional latent variables $z$ to a scalar contribution via maximum a posteriori (MAP) optimisation and a Laplace approximation, both computed using autodiff. This reduces the effective dimension from $d_\theta + d_z$ to just $d_\theta$, making Bayesian evidence computation tractable in high-dimensional settings without hand-derived gradients or Hessians, and with minimal model-specific engineering. The MAP optimisation and Hessian evaluation are parallelised across live points on GPU hardware, making the method practical at scale. We also show that automatic differentiation enables local approximations beyond the Laplace (Gaussian), to parametric families such as the Student-$t$, which improves evidence estimates for heavy-tailed latents. We validate ALCS on a suite of benchmarks spanning hierarchical, time-series, and discrete-likelihood models and establish where the Gaussian approximation holds. The framework also enables a post-hoc importance-sampling ESS diagnostic that localises approximation failures across hyperparameter space without expensive joint sampling.
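The collapse step described in the abstract admits a compact autodiff implementation. The sketch below is a minimal illustration under stated assumptions, not the released ALCS code: `log_joint` is a hypothetical stand-in for the model's joint log-density $\log p(x, z \mid \theta)$, the MAP is found with BFGS, and the Laplace correction uses the autodiff Hessian at the optimum.

```python
# Minimal sketch (illustrative assumptions, not the released ALCS code) of a
# Laplace-collapsed log-likelihood. `log_joint` is a hypothetical stand-in for
# the model's joint log-density log p(x, z | theta).
import jax
import jax.numpy as jnp
from jax.scipy.optimize import minimize
from jax.scipy.stats import norm

def log_joint(theta, z, x):
    # Toy joint: observations centred on latents, latents with scale exp(theta).
    return (jnp.sum(norm.logpdf(x, loc=z, scale=1.0))
            + jnp.sum(norm.logpdf(z, loc=0.0, scale=jnp.exp(theta))))

def collapsed_loglike(theta, x, z_init):
    """log p(x | theta) ~= log p(x, z_hat | theta) + (d_z/2) log 2pi - (1/2) log|H|."""
    nlp = lambda z: -log_joint(theta, z, x)
    z_hat = minimize(nlp, z_init, method="BFGS").x   # MAP of the latents via autodiff BFGS
    H = jax.hessian(nlp)(z_hat)                      # autodiff Hessian at the MAP
    _, logdet = jnp.linalg.slogdet(H)
    d_z = z_hat.size
    return (log_joint(theta, z_hat, x)
            + 0.5 * d_z * jnp.log(2.0 * jnp.pi) - 0.5 * logdet)

# Parallelising over nested-sampling live points is then a single vmap over theta:
batched_loglike = jax.vmap(collapsed_loglike, in_axes=(0, None, None))
```

The `vmap` in the last line mirrors the GPU parallelisation over live points mentioned in the abstract: each live point's MAP optimisation and Hessian are evaluated in a single batched call.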

Paper Structure

This paper contains 92 sections, 39 equations, 7 figures, 6 tables, 2 algorithms.

Figures (7)

  • Figure 1: Supernova cosmology scaling for $\Lambda$CDM and $w$CDM. (a--b) Test 1: wall time and evidence error $\delta\log\mathcal{Z}$ vs $N$ at fixed $d_{z,\text{block}}=2$. (c--d) Test 2: wall time (log--log) and $\delta\log\mathcal{Z}$ vs total latent dimension $D$ at fixed $N=100$; the $D^3$ full-NS reference is projected. Top axis in (c) shows latents per object.
  • Figure 2: Student-$t$ extension ($\nu_\text{true}=5$, $N_\text{obj}\in\{10,20,50,100,150\}$). (a) Evidence error $\delta\log\mathcal{Z}$ for Gaussian (grey) and Student-$t$ (blue) ALCS. (b) IS ESS$/K$ (median over $M{=}200$ posterior samples). See Table~\ref{tab:student} for numerical values.
  • Figure 3: Latent conditional $p(z_j\mid\theta,x_j)$ for $\theta\in\{-1,0,+1,+2\}$. At $\theta<0$ the posterior is near-Gaussian; as $\theta$ increases $\tanh(z_j)$ saturates and flat shoulders develop.
  • Figure 4: Tanh funnel ($J=10$). (a) Pointwise evidence error vs $\theta$: near zero for $\theta<0$, growing to ${\sim}20$ nats by $\theta=3$. (b) IS ESS$/K$ at each $\theta$ ($K{=}5000$): ${\approx}1$ for $\theta<0$, dropping to ${\ll}0.01$ for $\theta>0$, localising the approximation failure (the ESS$/K$ computation is sketched after this list).
  • Figure 5: IRT $\mu_\text{ability}$ posterior: ALCS (blue) vs full joint NUTS (orange, 501D, 4 chains). The marginalised posterior is unbiased despite ESS$/K=0.10$.
  • ...and 2 more figures
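The ESS$/K$ quantity reported in Figures 2 and 4 is a standard importance-sampling effective sample size: $K$ draws from the local approximation are weighted against the unnormalised latent conditional. The sketch below assumes a Gaussian (Laplace) proposal built from the MAP $\hat z$ and Hessian $H$; the function and argument names are illustrative, not the authors' API.

```python
# Sketch of a post-hoc importance-sampling ESS/K diagnostic, assuming a Gaussian
# (Laplace) proposal N(z_hat, H^{-1}). Names are illustrative, not the authors' API.
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp
from jax.scipy.stats import multivariate_normal

def is_ess_fraction(key, theta, x, z_hat, H, log_joint, K=5000):
    """Return ESS/K for q = N(z_hat, H^{-1}) against the unnormalised p(z | theta, x)."""
    cov = jnp.linalg.inv(H)
    zs = jax.random.multivariate_normal(key, z_hat, cov, shape=(K,))
    log_q = multivariate_normal.logpdf(zs, z_hat, cov)
    log_p = jax.vmap(lambda z: log_joint(theta, z, x))(zs)   # unnormalised target
    log_w = log_p - log_q                                    # log importance weights
    log_w = log_w - logsumexp(log_w)                         # normalise the weights
    ess = 1.0 / jnp.sum(jnp.exp(2.0 * log_w))                # ESS = 1 / sum(w_k^2)
    return ess / K
```

An ESS$/K$ near 1 indicates the local approximation matches the true conditional, while values far below 1 flag the regions of hyperparameter space where it fails, as in Figure 4(b).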