
Parallel computations for Metropolis Markov chains with Picard maps

Sebastiano Grazzi, Giacomo Zanella

TL;DR

The proposed parallel algorithms for simulating zeroth-order (aka gradient-free) Metropolis Markov chains based on the Picard map are straightforward to implement and may constitute a useful tool for practitioners seeking to sample from a prescribed distribution using only point-wise evaluations of $\log\pi$ and parallel computing.

Abstract

We develop parallel algorithms for simulating zeroth-order (aka gradient-free) Metropolis Markov chains based on the Picard map. For Random Walk Metropolis Markov chains targeting log-concave distributions $\pi$ on $\mathbb{R}^d$, our algorithm generates samples close to $\pi$ in $\mathcal{O}(\sqrt{d})$ parallel iterations with $\mathcal{O}(\sqrt{d})$ processors, thereby speeding up the convergence of the corresponding sequential implementation by a factor of $\sqrt{d}$. Furthermore, a modification of our algorithm generates samples from an approximate measure $\pi_r$ in $\mathcal{O}(1)$ parallel iterations with $\mathcal{O}(d)$ processors. We empirically assess the performance of the proposed algorithms on high-dimensional regression problems, an epidemic model where the gradient is unavailable, and a real-world application in precision medicine. Our algorithms are straightforward to implement and may constitute a useful tool for practitioners seeking to sample from a prescribed distribution $\pi$ using only point-wise evaluations of $\log\pi$ and parallel computing.
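To make the Picard-map idea concrete, here is a minimal NumPy sketch for Random Walk Metropolis. It is our own illustration, not the authors' code, and it rests on stated assumptions: the chain is driven by pre-drawn noise $W_i = (Z_i, U_i)$ with proposal increments $\xi Z_i$ and uniforms $U_i$, and the Picard map rebuilds a length-$K$ path as $x_0$ plus the cumulative sum of accepted increments, with each accept decision evaluated at the previous guess. All $K$ decisions in a round need only point-wise evaluations of $\log\pi$ and could run on separate processors; after $j$ rounds the first $j+1$ states are exact, so the recursion reaches the sequential chain in at most $K$ rounds. The function names are hypothetical.

```python
import numpy as np


def rwm_accept(x, z, u, log_pi, xi):
    """Accept indicators (0.0 or 1.0) for RWM proposals y = x + xi * z, one per row."""
    return (np.log(u) < log_pi(x + xi * z) - log_pi(x)).astype(float)


def rwm_sequential(x0, Z, U, log_pi, xi):
    """Ordinary sequential RWM driven by the fixed noise W_i = (Z_i, U_i)."""
    X = [np.asarray(x0, dtype=float)]
    for i in range(Z.shape[0]):
        b = rwm_accept(X[-1][None, :], Z[i][None, :], U[i:i + 1], log_pi, xi)[0]
        X.append(X[-1] + b * xi * Z[i])
    return np.array(X)


def rwm_picard(x0, Z, U, log_pi, xi):
    """Picard iteration over one block of K steps; returns (path, number of rounds)."""
    K = Z.shape[0]
    # Initial guess X^(0): the constant path sitting at x0.
    X = np.tile(np.asarray(x0, dtype=float), (K + 1, 1))
    for j in range(1, K + 1):
        # All K accept/reject decisions use the previous guess, so they are independent.
        B = rwm_accept(X[:-1], Z, U, log_pi, xi)
        # New path: x0 plus the running sum of accepted increments (a prefix sum).
        X_new = np.cumsum(np.vstack([X[0], B[:, None] * xi * Z]), axis=0)
        if np.array_equal(X_new, X):  # fixed point: the guess no longer changes
            return X_new, j
        X = X_new
    return X, K  # after K rounds every state is exact
```

Calling `rwm_sequential` and `rwm_picard` with the same `Z` and `U` should return the same path up to floating-point rounding, while the second output of `rwm_picard` counts the parallel rounds used. Taking $K$ of order $\sqrt{d}$ matches the processor count quoted in the abstract.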

Paper Structure

This paper contains 45 sections, 27 theorems, 106 equations, 9 figures, 1 table, 5 algorithms.

Key Result

theorem 1

Under Assumptions 1–1.2, for all $x_0 \in \mathcal{X}$ and $w_0 \in \mathcal{W}$, the probability of an incorrect guess satisfies a bound, stated in terms of the constants $c_0 = 15 h^4 \left(\sqrt{\frac{2}{\pi}} + \frac{h \gamma}{2}\right)^2$ and $\delta(d) = \frac{5}{3}\exp(-3d/2)$, for all $0 \le i \le \min(d,K)$ and $j \in \{0,1,\dots\}$. This probability is taken with respect to the randomness of $(W_1, W_2, \dots)$.
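The two constants in Theorem 1 are easy to evaluate numerically. The small sketch below takes `h` and `gamma` as plain numbers whose meaning is fixed by the paper's assumptions, and it reads the $\pi$ inside $\sqrt{2/\pi}$ as the numerical constant (note $\sqrt{2/\pi} = \mathbb{E}|Z|$ for a standard Gaussian $Z$) rather than the target distribution; both readings are our assumptions, and `theorem1_constants` is a hypothetical name.

```python
import math


def theorem1_constants(h, gamma, d):
    """Return (c_0, delta(d)) as written in Theorem 1 (h, gamma from the paper's assumptions)."""
    # Assumption: math.pi here is the number 3.14159..., not the target distribution.
    c0 = 15 * h ** 4 * (math.sqrt(2 / math.pi) + h * gamma / 2) ** 2
    # delta(d) = (5/3) * exp(-3d/2) decays exponentially in the dimension d.
    delta = 5 / 3 * math.exp(-3 * d / 2)
    return c0, delta
```

For instance, with `h = 1` and `gamma = 0` the first constant reduces to $15 \cdot 2/\pi = 30/\pi$, and $\delta(d)$ is already negligible for moderate $d$.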

Figures (9)

  • Figure 1: Traces on the $(x_1, x_2)$-plane of the Picard recursion $X^{(j)}$ for $K = 1000$ and $j = 1, 2, 10, 13$ (top left to bottom right). The time index of each sequence is encoded by a yellow-to-red color gradient. The underlying Markov chain is a $d = 100$ dimensional Random Walk Metropolis with stepsize $\xi = 2/\sqrt{d}$ targeting a standard Gaussian distribution. The gray line is the fixed point (i.e., the output of the sequential algorithm). The dashed line marks the part of the trajectory that has already converged to the fixed point.
  • Figure 2: Illustration of the classical Picard algorithm (Algorithm PA) applied sequentially to every block of length $K$ (left grid) vs. the Online Picard algorithm (Algorithm OPA, right grid). The color of the $(j,i)$ entry of each grid represents the state of the $i$th step of the Markov chain at the $j$th Picard recursion: red for $f(X^{(j)}_i, W_i) = f(X^{(j-1)}_i, W_i)$ (correct guess), blue for $f(X^{(j)}_i, W_i) \ne f(X^{(j-1)}_i, W_i)$ (incorrect guess), and black for increments for which no processor has been allocated to compute $f$. Here, $K = 4$ and $N = 3K$. The yellow boundary line marks $L^{(j)} = \sup\{i \le U^{(j)}\colon X_\ell^{(j)} = X_\ell^{(j-1)} \text{ for all } \ell \le i\}$.
  • Figure 3: Performance of the Online Picard algorithm ($\bar{X}$) and its approximate versions ($\bar{X}_r$, $r = 5\%, \dots, 20\%$) applied to RWM targeting the linear regression model E1. Left panel: average speedup $\hat{G}$ ($y$-axis, log scale) with $K \in \{\sqrt{d}, d\}$, $N = 10^4$ and $d = 10^2, \dots, 10^3$ ($x$-axis, log scale); dashed lines $d \mapsto \sqrt{d}$ (blue) and $d \mapsto 3d/20$ (red) for reference. Right panel: average speedup $\hat{G}$ ($y$-axis, log scale) for $d = 500$, $N = 10^4$ and $K = 2, \dots, 1500$ ($x$-axis, log scale); vertical dashed lines at $K = \sqrt{d}$ and $K = d$ for reference.
  • Figure 4: Same as Figure 3 for the logistic regression model E2 (top panels) and the Poisson regression model E3 (bottom panels), with $N = 10^4$ and $K \in \{\sqrt{d}, d\}$ (left panels), and $N = 10^4$, $d = 200$ (right panels).
  • Figure 5: The error measures $\mathcal{M}$ and $\mathcal{E}$ (defined in the paper's equation for the error in estimation) for the Approximate Picard algorithms, with tolerance ($x$-axis) ranging from $0\%$ (exact Picard algorithm) to $20\%$, $d = 10^2$, $K = d$ and $N = 10^5$ ($N = 4 \times 10^5$ for the right panels), for the linear regression E1 (left panel), logistic regression E2 (middle panel) and Poisson regression E3 (right panel). Dashed lines show Metropolis-within-Gibbs (MwG).
  • ...and 4 more figures
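Figure 2 contrasts block-wise Picard with the Online Picard algorithm, which keeps its processors busy by sliding the active window forward as soon as a prefix of the trajectory has converged. The sketch below is our own hypothetical illustration of that idea for RWM, not the authors' Algorithm OPA. It assumes a window of at most $K$ accept decisions per parallel round, and that the converged index advances to the end of the longest prefix left unchanged by a round (mirroring $L^{(j)}$ in the Figure 2 caption), and always by at least one step, because the state following an exact state is computed exactly. The names `accept` and `online_picard_rwm` are hypothetical.

```python
import numpy as np


def accept(x, z, u, log_pi, xi):
    """RWM accept indicators (0.0 or 1.0) for proposals x + xi * z, one per row."""
    return (np.log(u) < log_pi(x + xi * z) - log_pi(x)).astype(float)


def online_picard_rwm(x0, Z, U, log_pi, xi, K):
    """Sliding-window Picard sketch: at most K accept decisions per parallel round."""
    N = Z.shape[0]
    X = np.tile(np.asarray(x0, dtype=float), (N + 1, 1))  # guess for the whole path
    L = 0       # invariant: X[0], ..., X[L] equal the sequential chain
    rounds = 0
    while L < N:
        hi = min(L + K, N)  # this round handles steps L, ..., hi - 1
        # These decisions are mutually independent: one per processor.
        B = accept(X[L:hi], Z[L:hi], U[L:hi], log_pi, xi)
        # Rebuild X[L..hi] from the exact state X[L] plus the accepted increments.
        new = np.cumsum(np.vstack([X[L], B[:, None] * xi * Z[L:hi]]), axis=0)
        same = np.all(new == X[L:hi + 1], axis=1)
        # Length of the leading run of rows left unchanged (same[0] is always True).
        agree = hi - L + 1 if same.all() else int(np.argmin(same))
        X[L:hi + 1] = new
        # The unchanged prefix is exact; new[1] is exact anyway because X[L] was.
        L = max(L + 1, L + agree - 1)
        rounds += 1
    return X, rounds
```

Since the converged index grows by at least one per round, the loop stops after at most $N$ rounds; the output path should agree with a sequential RWM run on the same `Z` and `U` up to floating-point rounding.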

Theorems & Definitions (52)

  • theorem 1: Probability of an incorrect guess
  • remark 1
  • remark 2
  • corollary 1
  • theorem 2
  • corollary 2: Parallel round complexity of Online Picard algorithm
  • proposition 1
  • theorem 3
  • proposition 2: MwG with isotropic Gaussian targets
  • proposition 3
  • ...and 42 more