Zigzag path connects two Monte Carlo samplers: Hamiltonian counterpart to a piecewise deterministic Markov process

Akihiko Nishimura, Zhenyu Zhang, Marc A. Suchard

Abstract

Zigzag and other piecewise deterministic Markov process samplers have attracted significant interest for their non-reversibility and other appealing properties for Bayesian posterior computation. Hamiltonian Monte Carlo is another state-of-the-art sampler, exploiting fictitious momentum to guide Markov chains through complex target distributions. We establish an important connection between the zigzag sampler and a variant of Hamiltonian Monte Carlo based on Laplace-distributed momentum. The position and velocity components of the corresponding Hamiltonian dynamics travel along a zigzag path paralleling the Markovian zigzag process; however, the dynamics is non-Markovian in this position-velocity space as the momentum component encodes non-immediate pasts. This information is partially lost during a momentum refreshment step, in which we preserve its direction but re-sample its magnitude. In the limit of increasingly frequent momentum refreshments, we prove that Hamiltonian zigzag converges strongly to its Markovian counterpart. This theoretical insight suggests that, when retaining full momentum information, Hamiltonian zigzag can better explore target distributions with highly correlated parameters by suppressing the diffusive behavior of Markovian zigzag. We corroborate this intuition by comparing the performance of the two zigzag cousins on high-dimensional truncated multivariate Gaussians, including an 11,235-dimensional target arising from a Bayesian phylogenetic multivariate probit model of HIV data.
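
To make the dynamics described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a one-dimensional standard Gaussian target with potential $U(x) = x^2 / 2$ and kinetic energy $|p|$, so that Hamilton's equations read $dx/dt = \mathrm{sign}(p)$ and $dp/dt = -x$. The function names, the fixed integration time `duration`, and the starting point are illustrative assumptions. The refreshment step keeps the sign of the momentum and redraws its magnitude from a unit exponential, as the abstract describes.

```python
import math
import random


def potential(x):
    # Assumed target: standard Gaussian, U(x) = x^2 / 2.
    return 0.5 * x * x


def hamiltonian_zigzag_1d(x, p, duration):
    """Integrate dx/dt = sign(p), dp/dt = -x exactly for `duration` time units."""
    if x == 0.0 and p == 0.0:
        raise ValueError("velocity is undefined at x = 0, p = 0")
    v = 1.0 if p >= 0 else -1.0
    remaining = duration
    while True:
        # Along a linear segment, |p| evolves as |p| - a*t - t^2/2 with a = v*x.
        # The velocity flips when this quantity reaches zero.
        a = v * x
        t_event = max(0.0, -a + math.sqrt(a * a + 2.0 * abs(p)))
        if t_event >= remaining:
            # No further flip within the remaining time: finish the segment.
            p = p - x * remaining - 0.5 * v * remaining ** 2
            x = x + v * remaining
            return x, p
        # Move to the flip point, where the momentum is zero, and reverse velocity.
        x = x + v * t_event
        p = 0.0
        v = -v
        remaining -= t_event


def sample_standard_gaussian(n_samples, duration=1.5, seed=0):
    """Alternate magnitude-only momentum refreshment with Hamiltonian zigzag."""
    rng = random.Random(seed)
    x, p = 0.5, rng.expovariate(1.0)  # arbitrary illustrative starting state
    samples = []
    for _ in range(n_samples):
        # Refreshment: preserve the direction of p, re-sample |p| ~ Exp(1).
        sign = 1.0 if p >= 0 else -1.0
        p = sign * rng.expovariate(1.0)
        x, p = hamiltonian_zigzag_1d(x, p, duration)
        samples.append(x)
    return samples
```

Because each segment is solved in closed form, the total energy $U(x) + |p|$ is conserved up to floating-point error, and running the dynamics again from $(x, -p)$ for the same duration returns to the starting position. Both properties are convenient sanity checks for any such integrator.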

Paper Structure

This paper contains 27 sections, 7 theorems, 106 equations, 11 figures, 9 tables, 7 algorithms.

Key Result

Theorem 2.1

Suppose that $U(\bm{x})$ is twice continuously differentiable and that the sets $\{ \bm{x} : \partial_i U(\bm{x}) = 0 \}$ comprise differentiable manifolds of dimension at most $d - 1$. Then Eq eq:hamilton_under_laplace_momentum defines a unique dynamics on $\mathbb{R}^{2d}$ away from a set of Lebes

Figures (11)

  • Figure 1: Trajectories of the first two position coordinates of Hamiltonian zigzag without momentum refreshment (left) and Markovian zigzag (right). The target is a $2^{10} = 1{,}024$-dimensional Gaussian, corresponding to a stationary lag-one auto-regressive process with auto-correlation $0.99$ and unit marginal variances. Both dynamics are simulated for $10^5$ linear segments, starting from the same position $x_i = -1$ for all $i$ and the same random velocity. The line segment colors change from darkest to lightest as the dynamics evolve.
  • Figure 2: Squared distance $\| \bm{x}(t) - \bm{x}_0 \|^2$ of the two zigzag dynamics from the initial position, plotted as a function of the number of velocity change events. The experimental setup is identical to that of Figure 1. The dashed line indicates the expected squared distance between the initial position and an independent sample from the target, as a benchmark of the distance traveled by an efficient transition kernel.
  • Figure 3: Comparison of Markovian (left) and Hamiltonian (right) zigzag trajectories under the one-dimensional potential $U(x)$. Neither zigzag is affected by velocity switch events while going down the potential energy hill, during which the velocity and gradient point in opposite directions and the relation $v(t) \partial U(x(t)) < 0$ holds. During this time, Hamiltonian zigzag stores up kinetic energy converted from potential energy, while Markovian zigzag remains memory-less. Once the trajectories reach the potential energy minimum $U_{\min}$ at time $t_{\min} := \mathop{\mathrm{argmin}}\limits_{t \, \geq \, 0} U(x_0 + t v_0)$ and start going "uphill," the accumulated momentum $| p(t_{\min}) | = | p_0 | + U(x_0) - U_{\min}$ keeps Hamiltonian zigzag traveling in the same direction longer than Markovian zigzag. The last statement technically holds only "on average" due to randomness in the realized values of $| p_0 | \overset{d}{=} - \log u$.
  • Figure 4: Traceplot of the zigzag samples from the $1{,}024$-dimensional compound symmetric posterior \eqref{eq:compound_symmetry_covariance} with $\rho = 0.99$, projected onto the principal component $\bm{u} = (1, \ldots, 1) / \sqrt{d}$ via a map $\bm{x} \to \langle \bm{x}, \bm{u} \rangle$. The horizontal axis is scaled to represent the number of velocity switch events.
  • Figure 5: Example paths of latent biological traits following the phylogenetic diffusion. The traits of two distinct organisms evolve together until a branching event. Beyond that point, the traits evolve independently but with the same diffusion covariance induced by a shared bio-molecular mechanism.
  • ...and 6 more figures
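
The comparison in the Figure 3 caption can be worked through numerically. The sketch below is an illustration under stated assumptions, not code from the paper: it takes $U(x) = x^2 / 2$, so $U_{\min} = 0$, and uses the standard zigzag event rate $\max\{0, v \, \partial U(x)\}$ for the Markovian process. Both function names are hypothetical. Each function returns the time of the first velocity switch given the same exponential draw, used as the event threshold $-\log u$ for Markovian zigzag and as the initial momentum magnitude $|p_0|$ for Hamiltonian zigzag.

```python
import math


def markovian_switch_time(x, v, e):
    """First switch time of Markovian zigzag for U(x) = x^2 / 2.

    Solves  integral_0^t max(0, a + s) ds = e  with a = v * x,
    where e is a unit exponential draw (e = -log u).
    """
    a = v * x
    if a >= 0:
        # Already going uphill: the rate is positive from the start.
        return -a + math.sqrt(a * a + 2.0 * e)
    # Going downhill: the rate is zero until the minimum is reached at t = -a.
    return -a + math.sqrt(2.0 * e)


def hamiltonian_switch_time(x, v, abs_p):
    """First switch time of Hamiltonian zigzag for U(x) = x^2 / 2.

    The momentum magnitude evolves as |p0| - a*t - t^2/2, so a downhill
    stretch (a < 0) adds U(x0) - U_min = a^2 / 2 to the momentum budget.
    """
    a = v * x
    return -a + math.sqrt(a * a + 2.0 * abs_p)
```

For example, starting downhill at `x = -2.0`, `v = 1.0` with a shared draw of `0.5`, the Markovian process switches at time $2 + 1 = 3$, while the Hamiltonian one continues until $2 + \sqrt{5}$. Starting uphill, where no momentum can be accumulated, the two switch times coincide.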

Theorems & Definitions (13)

  • Theorem 2.1
  • Corollary 2.2
  • Theorem 3.1: Weak convergence
  • Theorem 3.2: Strong convergence
  • proof : Proof of existence and uniqueness
  • proof : Proof of time-reversibility and symplecticity
  • Lemma S1.1
  • proof
  • Theorem S2.1
  • proof
  • ...and 3 more