
Stochastic Gradient Piecewise Deterministic Monte Carlo Samplers

Paul Fearnhead, Sebastiano Grazzi, Chris Nemeth, Gareth O. Roberts

TL;DR

This work proposes approximate simulation of PDMPs with sub-sampling for scalable sampling from posterior distributions, based on an Euler approximation to the true PDMP dynamics. The resulting class of algorithms, called stochastic-gradient PDMPs, has similar efficiency to, but is more robust than, stochastic gradient Langevin dynamics.

Abstract

Recent work has suggested using Monte Carlo methods based on piecewise deterministic Markov processes (PDMPs) to sample from target distributions of interest. PDMPs are non-reversible continuous-time processes endowed with momentum, and hence can mix better than standard reversible MCMC samplers. Furthermore, they can incorporate exact sub-sampling schemes which only require access to a single (randomly selected) data point at each iteration, yet without introducing bias to the algorithm's stationary distribution. However, the range of models for which PDMPs can be used, particularly with sub-sampling, is limited. We propose approximate simulation of PDMPs with sub-sampling for scalable sampling from posterior distributions. The approximation takes the form of an Euler approximation to the true PDMP dynamics, and involves using an estimate of the gradient of the log-posterior based on a data sub-sample. We thus call this class of algorithms stochastic-gradient PDMPs. Importantly, the trajectories of stochastic-gradient PDMPs are continuous and can leverage recent ideas for sampling from measures with continuous and atomic components. We show these methods are easy to implement, present results on their approximation error and demonstrate numerically that this class of algorithms has similar efficiency to, but is more robust than, stochastic gradient Langevin dynamics.
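The abstract describes the core idea: discretize the PDMP dynamics with an Euler scheme of step-size $h$, replacing the exact gradient of the log-posterior with an unbiased estimate from a data sub-sample. As a rough illustration, here is a minimal sketch of what such a scheme could look like for a Zig-Zag-style process (constant unit velocities, per-coordinate velocity flips). The function name `sg_zigzag` and its interface are our own illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sg_zigzag(grad_est, x0, n_steps, h, rng=None):
    """Illustrative sketch of a stochastic-gradient Zig-Zag sampler.

    grad_est(x) should return an (ideally unbiased) estimate of the
    gradient of the negative log-posterior, e.g. computed from a
    randomly selected data sub-sample.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    v = rng.choice([-1.0, 1.0], size=x.shape)    # unit velocities
    traj = np.empty((n_steps, x.size))
    for t in range(n_steps):
        x += h * v                               # deterministic linear drift
        g = grad_est(x)                          # stochastic gradient estimate
        rate = np.maximum(0.0, v * g)            # per-coordinate event rate
        # Euler-style event simulation: flip each velocity with the
        # probability of at least one event in a window of length h.
        flip = rng.random(x.size) < 1.0 - np.exp(-h * rate)
        v[flip] *= -1.0
        traj[t] = x
    return traj
```

Note the trajectory is continuous in $x$ (only the velocity jumps), which is the property the paper exploits for sampling from measures with continuous and atomic components. Passing the full-data gradient for `grad_est` recovers an Euler approximation of the exact Zig-Zag process.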


Paper Structure

This paper contains 22 sections, 1 theorem, 34 equations, 13 figures, 3 tables, 4 algorithms.

Key Result

Proposition 3.1

Let $\overline{\mathcal{P}}_t(z,\cdot)$ and $\mathcal{P}_t(z,\cdot)$ be the transition kernels of the stochastic-gradient PDMP process (Algorithm alg:sg-pdmp) and of the true underlying PDMP process, respectively. Assume the PDMP processes have bounded velocities, i.e. $\|v\| < C_0$ for some constant $C_0$.

Figures (13)

  • Figure 1: Top panels: traces of SBPS and SBPS with pre-conditioning (PSBPS) (left), and of our Euler approximation of the bouncy particle sampler, SG-BPS (right), with step-size $h = 10^{-4}$. Bottom panel: auto-correlation function of the first coordinate for the three algorithms. All algorithms were run with the same CPU cost, and their output thinned to give 10,000 samples.
  • Figure 2: Top-left panel: error in the standard deviation estimate of each coordinate, $\mathcal{E}^{(h,d)}$ (Eq. 'error on variances'), as a function of the step-size $h$ ($x$-axis in log scale). Bottom-left panel: $\mathcal{E}^{(h,d)}$ as a function of $d$. Right panels: trace plots of the first coordinate.
  • Figure 3: Error between the standard deviation estimate of each coordinate and that of the Laplace approximation, as in Eq. 'error on variances', for the logistic regression model, as a function of the step-size $h$ ($x$-axis in log scale). Dashed lines correspond to SGLD algorithms, solid lines to SG-PDMPs.
  • Figure 4: Trace of the loss function for each sampler at different step-sizes $h$, for the first permutation of the training and test sets of the 'boston' dataset.
  • Figure 5: From top-left to bottom-right: traces and auto-correlation functions of the first coordinate for SBPS, SG-BPS and SG-ZZ (with step-sizes $10^{-3}$ and $10^{-4}$) for the Bayesian logistic regression model. All algorithms were initialised near the mode of the posterior (second experiment).
  • ...and 8 more figures

Theorems & Definitions (1)

  • Proposition 3.1