
Smoothing Out Sticking Points: Sampling from Discrete-Continuous Mixtures with Dynamical Monte Carlo by Mapping Discrete Mass into a Latent Universe

Andrew Chin, Akihiko Nishimura

TL;DR

Theoretical and empirical findings suggest that alternative sticky samplers for spike-and-slab Bayesian models, including one based on Hamiltonian Monte Carlo, are at least as efficient as the original sticky approach.

Abstract

Combining a continuous "slab" density with discrete "spike" mass at zero, spike-and-slab priors provide important tools for inducing sparsity and carrying out variable selection in Bayesian models. However, the presence of discrete mass makes posterior inference challenging. "Sticky" extensions to piecewise-deterministic Markov process samplers have shown promising performance, where sampling from the spike is achieved by the process sticking there for an exponentially distributed duration. As it turns out, the sampler remains valid when the exponential sticking time is replaced with its expectation. We justify this by mapping the spike to a continuous density over a latent universe, allowing the sampler to be reinterpreted as traversing this universe while being stuck in the original space. This perspective opens up an array of possibilities to carry out posterior computation under spike-and-slab type priors. Notably, it enables us to construct sticky samplers using other dynamics-based paradigms such as Hamiltonian Monte Carlo; in fact, the original sticky process can be established as a partial position-momentum refreshment limit of our Hamiltonian sticky sampler. Our theoretical and empirical findings suggest that these alternatives are at least as efficient as the original sticky approach.

Paper Structure

This paper contains 16 sections, 3 theorems, 31 equations, 7 figures, 2 tables.

Key Result

Theorem 2.1

For a spike-and-slab posterior with twice continuously differentiable density part, the latent sticky sampler with the partial position refreshment converges strongly to the original sticky sampler; i.e., from the same initial state, we can create a sequence of the position-velocities trajectories $

Figures (7)

  • Figure 1: Constructing a continuous density representation of the spike-and-slab prior by spreading the spike mass over a latent universe and inserting it in the middle of the slab density. In other words, we introduce a latent parameter $\tilde{x}_i$ with the continuous density which, when collapsing the latent universe to 0, recovers the spike-and-slab prior on the original parameter $x_i$. A corresponding posterior on $\tilde{x}_i$ also has a continuous density as long as the likelihood is a continuous function of $x_i$. Sampling from the discrete-mixture posterior thus reduces to first sampling from the latent continuous density and then mapping the samples back to the original space.
  • Figure 2: Left: Bivariate product of spike-and-slab priors with standard normal slab, with the spike masses shown in blue. Middle: Latent continuous density representation of the prior. Right: Posterior density in the latent parameter space.
  • Figure 3: Example process starting unstuck with $x < 0$ and $v=1$, getting stuck at the spike, unsticking from there, bouncing once against the gradient, sticking again, and unsticking one last time. In this example, the refreshed process unsticks before the original zig-zag for both sticks. The effect of $\kappa'_r$ applied to $(x^r, v^r)$ is shown in orange, where $x^r(T) \neq x^r \circ \kappa'_r(T)$ since $\kappa'_r(T) \neq T$. The blue and green shaded regions represent time spent stuck for the original and refreshed processes, respectively; see Figure 4 for a visual illustration of how $\kappa'_r$ aligns the stuck times of the two samplers.
  • Figure 4: Time dilation functions $\kappa'_r$ and $\kappa_r$ for an interval $[0,T]$. The dilation to align sticking times occurs in the regions where the blue and green bands overlap. Outside of these regions there is no dilation until the final segment between $S_2+\varsigma_2$ and $T$, at which point $\kappa_r$ "catches up" so as to satisfy $\kappa_r(T) = T$ as required of a dilation function by the Skorokhod metric.
  • Figure 5: The process from Figure 3 with $\kappa_r$ applied to $(x^r, v^r)$. The velocities are still identical, but the positions are only identical up until the last unsticking event, at which point they diverge up to a maximum discrepancy of $\sum_n\delta_n$.
  • ...and 2 more figures
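
The construction described in the Figure 1 caption can be sketched for a single coordinate. This is a minimal illustration under stated assumptions, not the paper's implementation: the slab is standard normal, the spike mass `W = 0.3` is a hypothetical value, and the latent plateau is given the slab's height at zero so that the latent density is continuous (its width then follows from requiring the plateau's area to equal `W`). A plain random-walk Metropolis sampler stands in for the dynamics-based samplers discussed in the paper, and all function names are illustrative.

```python
import math
import random

W = 0.3  # hypothetical spike mass P(x = 0); an assumption for illustration


def slab_pdf(x):
    """Standard normal slab density (assumed slab for this sketch)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# Plateau height matches the weighted slab at zero, so the latent density
# is continuous; the width is chosen so the plateau's area equals W.
HEIGHT = (1.0 - W) * slab_pdf(0.0)
WIDTH = W / HEIGHT


def collapse(x_tilde):
    """Map a latent point back to the original space by collapsing the plateau to 0."""
    if x_tilde < 0.0:
        return x_tilde
    if x_tilde <= WIDTH:
        return 0.0
    return x_tilde - WIDTH


def latent_pdf(x_tilde):
    """Continuous latent density: the weighted slab with a flat plateau inserted at zero."""
    return (1.0 - W) * slab_pdf(collapse(x_tilde))


def sample_latent(n_steps, step=2.0, seed=0):
    """Random-walk Metropolis on the latent density (a stand-in sampler)."""
    rng = random.Random(seed)
    x = 0.0
    out = []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step, step)
        # Symmetric proposal, so the acceptance ratio is just the density ratio.
        if rng.random() < latent_pdf(proposal) / latent_pdf(x):
            x = proposal
        out.append(x)
    return out


# Sample in the latent space, then map back to the original space.
draws = [collapse(x) for x in sample_latent(100_000)]
frac_zero = sum(d == 0.0 for d in draws) / len(draws)
print(f"fraction of exact zeros: {frac_zero:.3f} (spike mass W = {W})")
```

After collapsing, the fraction of draws exactly equal to zero should be close to `W`, while the nonzero draws follow the slab. A likelihood that is continuous in $x_i$ could be folded in by multiplying `latent_pdf` by the likelihood evaluated at `collapse(x_tilde)`.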

Theorems & Definitions (7)

  • Theorem 2.1
  • Theorem 2.2
  • proof : (One-dimensional case)
  • Lemma A.1
  • proof
  • proof : (Multidimensional case)
  • proof