
Sacred and Profane: from the Involutive Theory of MCMC to Helpful Hamiltonian Hacks

Nathan E. Glatt-Holtz, Andrew J. Holbrook, Justin A. Krometis, Cecilia F. Mondaini, Ami Sheth

Abstract

In the first edition of this Handbook, two remarkable chapters consider seemingly distinct yet deeply connected subjects ...

Paper Structure

This paper contains 12 sections, 1 theorem, 38 equations, 4 figures, 3 algorithms.

Key Result

Theorem 2.1

Fix $p \geq 1$ and take ... Suppose that ... Then, under P1 and P2, Algorithm alg:main:master is unbiased with respect to $\mu$. More precisely, under these conditions, the kernel $P$ defined in eq:main:master:ker is reversible with respect to $\mu$, namely $P(\mathbf{q},d\tilde{\mathbf{q}}) \mu(d\mathbf{q}) = P(\tilde{\mathbf{q}},d\mathbf{q}) \mu(d\tilde{\mathbf{q}})$.
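To make the reversibility identity concrete, here is a finite-state sketch of an involutive Metropolis-Hastings step. It is an illustrative analogue built on our own assumptions, not the paper's alg:main:master or its conditions P1 and P2: the function `involutive_kernel`, the cycle involution, and the numbers in `mu` and `rho` are all hypothetical. The sketch draws an auxiliary variable $v \sim \rho$, applies an involution $\Phi(q, v) = (q', v')$, accepts with probability $\min\{1, \mu(q')\rho(v') / (\mu(q)\rho(v))\}$, and then checks that $\mu(q) P(q, q')$ is symmetric, which is the discrete form of the identity in Theorem 2.1.

```python
import numpy as np

def involutive_kernel(mu, rho, involution):
    """Transition matrix of one involutive Metropolis-Hastings step on a finite space.

    mu: target probabilities over states q (length n).
    rho: distribution of the auxiliary variable v (length m).
    involution: map (q, v) -> (q', v') that is its own inverse.
    """
    n, m = len(mu), len(rho)
    P = np.zeros((n, n))
    for q in range(n):
        for v in range(m):
            q_new, v_new = involution(q, v)
            # Extended target mu(q) * rho(v); on a finite space there is no Jacobian term.
            accept = min(1.0, mu[q_new] * rho[v_new] / (mu[q] * rho[v]))
            P[q, q_new] += rho[v] * accept
            P[q, q] += rho[v] * (1.0 - accept)  # rejected mass stays put
    return P

# Hypothetical example: a cycle of n states; v in {0, 1} selects a step of +1 or -1,
# and the involution takes one step and swaps the direction.
n = 5
steps = (+1, -1)

def cycle_involution(q, v):
    return (q + steps[v]) % n, 1 - v

mu = np.array([0.1, 0.3, 0.2, 0.25, 0.15])
rho = np.array([0.6, 0.4])
P = involutive_kernel(mu, rho, cycle_involution)

flow = mu[:, None] * P        # flow[q, q'] = mu(q) P(q, q')
print(np.allclose(flow, flow.T))  # prints True: detailed balance holds
```

Any flow from $q$ to $q'$ equals $\min\{\mu(q)\rho(v), \mu(q')\rho(v')\}$, which is symmetric precisely because $\Phi$ maps $(q', v')$ back to $(q, v)$.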

Figures (4)

  • Figure 1: Inferring 1,892 random effects of a 44-dimensional continuous-time Markov chain (CTMC) model (eq:glm) within a larger phylogenetic CTMC model applied to the spread of SARS-CoV-2 between $D=44$ global regions. The surrogate-trajectory HMC algorithm uses a first-order approximation (eq:grad:series) to the derivative of the matrix exponential. After searching over a grid of leapfrog step counts and target acceptance rates, we find that a relatively small number of leapfrog steps (8) and a target acceptance rate of 0.7 are optimal for this problem. Because the exact gradient has $\mathcal{O}(D^7)$ time complexity in this setting, it is difficult to obtain effective sample sizes for classical HMC.
  • Figure 2: Effective sample sizes per hour for distances between the inferred latent locations of 1,370 observed H1N1 influenza viruses. Although we perform inference over all 1,370 viral locations, we show performance results only for a representative subset of 4,950 distances. Full HMC requires the entire Bayesian multidimensional scaling (BMDS) gradient (eq:grad:HMC), whereas surrogate-trajectory HMC uses a banded subset of the gradient with 50 bands. The cost savings of surrogate-trajectory HMC fail to make up for its decreased sampling performance.
  • Figure 3: Key performance results for the advection-diffusion example. All algorithms use tuning parameters that maximize mean squared jumping distance per second (MSJD/s). The left plot shows the absolute value of the autocorrelation per second, computed with the StatsBase Julia package bezanson2017julia. The right plot shows MSJD/s. The mpCN results are synthetic insofar as they assume efficient parallelization of the target evaluations across all 64 proposals.
  • Figure 4: Two-dimensional posterior density plots of the first four Fourier components of the fluid flow $\mathbf{v}$ for the advection-diffusion example (eq:ad:eqn). Left: the "true" posterior from borggaard2020bayesian. Middle: HMC after 200,000 samples visits only a subset of the posterior modes. Right: NNgHMC with the small neural network visits all modes after the computational equivalent of 200,000 HMC samples.
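Figures 1, 2, and 4 compare classical HMC with variants whose leapfrog trajectories use a cheaper gradient (a series approximation, a banded gradient, or a small neural network). The sketch below assumes one simple reading of that idea: the trajectory is driven by a surrogate gradient, while the accept/reject test uses the exact potential. Each leapfrog kick and drift preserves volume, and the momentum flip makes the composed map an involution for any position-dependent force, so the exact target stays invariant. The names `surrogate_hmc_step` and `proposal_map`, the Gaussian target, and the 0.9-scaled surrogate gradient are all hypothetical; only the 8 leapfrog steps echo Figure 1.

```python
import numpy as np

def leapfrog(q, p, grad_U, step, n_steps):
    """Leapfrog integrator driven by a supplied (exact or surrogate) gradient of U."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * step * grad_U(q)
    for i in range(n_steps):
        q += step * p
        if i < n_steps - 1:
            p -= step * grad_U(q)
    p -= 0.5 * step * grad_U(q)
    return q, p

def proposal_map(q, p, grad_U, step, n_steps):
    """Leapfrog followed by a momentum flip: an involution for any position-dependent force."""
    q1, p1 = leapfrog(q, p, grad_U, step, n_steps)
    return q1, -p1

def surrogate_hmc_step(q, U, surrogate_grad_U, step, n_steps, rng):
    """One HMC step: trajectory from the surrogate gradient, accept/reject with the
    exact potential U and a standard Gaussian momentum."""
    p0 = rng.standard_normal(q.shape)
    q1, p1 = proposal_map(q, p0, surrogate_grad_U, step, n_steps)
    h0 = U(q) + 0.5 * p0 @ p0
    h1 = U(q1) + 0.5 * p1 @ p1
    return q1 if np.log(rng.uniform()) < h0 - h1 else q

# Hypothetical example: exact target N(0, I) in 2D with a deliberately inexact surrogate gradient.
U = lambda q: 0.5 * q @ q
surrogate_grad = lambda q: 0.9 * q

rng = np.random.default_rng(1)
q = np.zeros(2)
samples = np.empty((10_000, 2))
for t in range(len(samples)):
    q = surrogate_hmc_step(q, U, surrogate_grad, step=0.3, n_steps=8, rng=rng)
    samples[t] = q
print(samples.mean(axis=0), samples.var(axis=0))  # close to [0, 0] and [1, 1]
```

Despite the biased surrogate, the sample moments match the exact target; a poorer surrogate would show up as a lower acceptance rate rather than as bias.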
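Figure 1 relies on an approximation (eq:grad:series) to the derivative of the matrix exponential, which is not reproduced here. As a hedged stand-in, the sketch below computes the exact directional derivative of $e^{Q}$ in direction $E$ from the upper-right block of $\exp\begin{pmatrix} Q & E \\ 0 & Q \end{pmatrix}$, cross-checked against `scipy.linalg.expm_frechet`, and compares it with a hypothetical symmetrized surrogate $\tfrac12(e^{Q}E + Ee^{Q})$. That surrogate matches the exact series through first order in $Q$, so its error shrinks like $t^2$ when $Q$ is replaced by $tQ$. The surrogate is our illustrative choice rather than necessarily the paper's formula, and the 4-state rate matrix is random.

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

def expm_derivative_exact(Q, E):
    """Directional derivative of expm at Q in direction E, read off the
    upper-right block of expm([[Q, E], [0, Q]])."""
    n = Q.shape[0]
    block = np.zeros((2 * n, 2 * n))
    block[:n, :n] = Q
    block[:n, n:] = E
    block[n:, n:] = Q
    return expm(block)[:n, n:]

def expm_derivative_first_order(Q, E):
    """Hypothetical cheap surrogate 0.5 * (expm(Q) E + E expm(Q)); it agrees with the
    exact derivative through first order in Q and is exact when Q and E commute."""
    P = expm(Q)
    return 0.5 * (P @ E + E @ P)

# Hypothetical 4-state CTMC rate matrix: positive off-diagonal rates, rows summing to zero.
rng = np.random.default_rng(0)
D = 4
Q = rng.uniform(0.1, 1.0, size=(D, D))
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))

# Direction that raises the 0 -> 1 rate and lowers Q[0, 0] so row 0 still sums to zero.
E = np.zeros((D, D))
E[0, 1], E[0, 0] = 1.0, -1.0

exact = expm_derivative_exact(Q, E)
approx = expm_derivative_first_order(Q, E)

def approx_error(t):
    """Largest entrywise error of the surrogate at the scaled rate matrix t * Q."""
    return np.abs(expm_derivative_exact(t * Q, E) - expm_derivative_first_order(t * Q, E)).max()

# Halving t should shrink the error roughly fourfold, consistent with an O(t^2) error.
print(approx_error(0.004) / approx_error(0.002))
```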
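Figure 2's surrogate keeps only a banded subset of the BMDS gradient. The sketch below assumes a simplified, untruncated Gaussian likelihood $-\frac{1}{2\sigma^2}\sum_{i<j}\bigl(d_{ij} - \|\mathbf{x}_i - \mathbf{x}_j\|\bigr)^2$ and reads "bands" as keeping only pairs with $0 < |i-j| \le$ `bands`. Both are assumptions, not the paper's eq:grad:HMC, and `bmds_loglik`, `bmds_grad`, and the random data are hypothetical. The full gradient $\nabla_{\mathbf{x}_i}\ell = \sum_{j \ne i} \frac{d_{ij} - \delta_{ij}}{\sigma^2 \delta_{ij}}(\mathbf{x}_i - \mathbf{x}_j)$ is checked against central finite differences.

```python
import numpy as np

def bmds_loglik(X, D, sigma=1.0):
    """Simplified (untruncated) Gaussian BMDS log-likelihood over all pairs i < j."""
    delta = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(X.shape[0], k=1)
    return -0.5 * np.sum((D[iu] - delta[iu]) ** 2) / sigma**2

def bmds_grad(X, D, sigma=1.0, bands=None):
    """Gradient of bmds_loglik with respect to the latent locations X (n x p).
    If bands is given, keep only pairs with 0 < |i - j| <= bands."""
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]          # diff[i, j] = x_i - x_j
    delta = np.linalg.norm(diff, axis=-1)
    offset = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    mask = offset > 0
    if bands is not None:
        mask &= offset <= bands
    safe = np.where(mask, delta, 1.0)             # avoid dividing by zero on the diagonal
    w = np.where(mask, (D - delta) / (sigma**2 * safe), 0.0)
    return np.einsum("ij,ijk->ik", w, diff)       # sum_j w[i, j] * (x_i - x_j)

# Hypothetical data: random latent locations, distances generated from other random points.
rng = np.random.default_rng(2)
n, p = 6, 2
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, p))
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)

full = bmds_grad(X, D)
banded = bmds_grad(X, D, bands=2)   # drops pairs with |i - j| > 2

# Central finite-difference check of the full gradient.
eps = 1e-6
fd = np.zeros_like(X)
for i in range(n):
    for k in range(p):
        shift = np.zeros_like(X)
        shift[i, k] = eps
        fd[i, k] = (bmds_loglik(X + shift, D) - bmds_loglik(X - shift, D)) / (2 * eps)
print(np.abs(fd - full).max())
```

The banded gradient touches about `n * bands` pairs instead of all `n * (n - 1) / 2`, which is the cost saving that Figure 2 weighs against sampling performance.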
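Figure 3 ranks samplers by MSJD/s and by autocorrelation. Minimal sketches of both diagnostics follow. The function names are hypothetical, `msjd_per_second` assumes that MSJD/s means total squared jumping distance divided by wall-clock seconds, and `autocor` follows the demeaned, lag-0-normalized convention that we believe StatsBase.jl's `autocor` uses by default.

```python
import numpy as np

def msjd(chain):
    """Mean squared jumping distance of a chain stored as an (N, d) array."""
    jumps = np.diff(chain, axis=0)
    return np.mean(np.sum(jumps**2, axis=1))

def msjd_per_second(chain, seconds):
    """Assumed definition: total squared jumping distance divided by wall-clock time."""
    jumps = np.diff(chain, axis=0)
    return np.sum(jumps**2) / seconds

def autocor(x, lag):
    """Lag-k sample autocorrelation: demean, then normalize by the lag-0 sum of squares."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    if lag == 0:
        return 1.0
    return float(np.dot(z[:-lag], z[lag:]) / np.dot(z, z))

chain = np.array([[0.0], [1.0], [3.0]])
print(msjd(chain))                  # 2.5: squared jumps are 1 and 4
print(msjd_per_second(chain, 2.0))  # 2.5: total of 5 over 2 seconds
print(autocor([1, 2, 3, 4], 1))     # 0.25
```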

Theorems & Definitions (1)

  • Theorem 2.1