Metropolis--Hastings with Scalable Subsampling

Estevão Prado; Christopher Nemeth; Chris Sherlock

TL;DR

This paper introduces a new subsample MH algorithm that satisfies detailed balance with respect to the target posterior and utilises control variates to enable exact, efficient Bayesian inference on datasets with large numbers of observations.

Abstract

The Metropolis--Hastings (MH) algorithm is one of the most widely used Markov chain Monte Carlo schemes for generating samples from Bayesian posterior distributions. The algorithm is asymptotically exact, flexible and easy to implement. However, in the context of Bayesian inference for large datasets, evaluating the likelihood on the full data for thousands of iterations until convergence can be prohibitively expensive. This paper introduces a new subsample MH algorithm that satisfies detailed balance with respect to the target posterior and utilises control variates to enable exact, efficient Bayesian inference on datasets with large numbers of observations. Through theoretical results, simulation experiments and real-world applications on certain generalised linear models, we demonstrate that our method requires substantially smaller subsamples and is computationally more efficient than the standard MH algorithm and other exact subsample MH algorithms.
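To make the cost issue in the abstract concrete, here is a minimal sketch of a standard full-data random-walk Metropolis (RWM) sampler for logistic regression. It is the baseline the paper compares against, not the proposed subsampling algorithm. The function names (`log_posterior`, `rwm`, `make_data`), the Gaussian prior, the step size and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def log_posterior(theta, X, y, prior_var=10.0):
    """Full-data log posterior, up to a constant, for logistic regression.

    Assumes an independent N(0, prior_var) prior on each coefficient
    (an illustrative choice, not taken from the paper).
    """
    logits = X @ theta
    # sum_i [ y_i * logit_i - log(1 + exp(logit_i)) ], computed stably
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -0.5 * np.sum(theta ** 2) / prior_var
    return log_lik + log_prior


def rwm(X, y, n_iter=2000, step=0.05, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    theta = np.zeros(d)
    current = log_posterior(theta, X, y)
    samples = np.empty((n_iter, d))
    n_accept = 0
    for t in range(n_iter):
        proposal = theta + step * rng.standard_normal(d)
        # Every iteration touches all n observations: O(n) cost per step.
        proposed = log_posterior(proposal, X, y)
        # Symmetric proposal, so the MH ratio reduces to the posterior ratio.
        if np.log(rng.uniform()) < proposed - current:
            theta, current = proposal, proposed
            n_accept += 1
        samples[t] = theta
    return samples, n_accept / n_iter


def make_data(n=5000, seed=1):
    """Synthetic logistic-regression data with three covariates (hypothetical)."""
    rng = np.random.default_rng(seed)
    theta_true = np.array([1.0, -0.5, 0.25])
    X = rng.standard_normal((n, theta_true.size))
    prob = 1.0 / (1.0 + np.exp(-(X @ theta_true)))
    y = (rng.uniform(size=n) < prob).astype(float)
    return X, y, theta_true


if __name__ == "__main__":
    X, y, theta_true = make_data()
    samples, acc_rate = rwm(X, y)
    print("acceptance rate:", acc_rate)
    print("posterior mean (second half):", samples[1000:].mean(axis=0))
```

The single call to `log_posterior` inside the loop is where the full-data cost arises; subsampling methods aim to replace that evaluation with one based on a small batch while keeping the chain exact.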

Paper Structure (51 sections, 13 theorems, 115 equations, 14 figures, 9 tables, 2 algorithms)

Introduction
Metropolis--Hastings with Scalable Subsampling
The MHSS algorithm
Existing exact subsampling MH algorithms
Scalable Metropolis--Hastings
TunaMH
Numerical comparison
Theoretical results
Optimality with respect to phi
Bounds on the remainder terms
General bounds on the remainders
Regression models: Further improvement on bounds
Computational cost and optimal tuning
Simulation experiments
Logistic regression
...and 36 more sections

Key Result

Proposition 1

An MH-based Markov chain that proposes via $q(\theta'|\theta)$ and Eq. (sample_s_i), and accepts $\theta'$ with the probability given in Eq. (tuna_ratioNODA), satisfies detailed balance with respect to $\pi(\theta|y)\propto p(\theta) \exp[\sum_{i=1}^n \ell_i(\theta)]$.
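For reference, detailed balance with respect to the posterior is the standard reversibility condition written below. The symbol $P(\theta, \mathrm{d}\theta')$ for the chain's transition kernel on $\theta$ is notation assumed here, not taken from the paper. Integrating the first line over $\theta$ gives the second, which is why detailed balance implies that the posterior is a stationary distribution of the chain.

```latex
\begin{align*}
\pi(\theta \mid y)\,\mathrm{d}\theta \; P(\theta, \mathrm{d}\theta')
  &= \pi(\theta' \mid y)\,\mathrm{d}\theta' \; P(\theta', \mathrm{d}\theta), \\
\int \pi(\theta \mid y)\, P(\theta, A)\,\mathrm{d}\theta
  &= \int_A \pi(\theta' \mid y)\,\mathrm{d}\theta'
  \quad \text{for every measurable set } A .
\end{align*}
```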

Figures (14)

Figure 1: Acceptance rates and ESS per second for SMH-1, Tuna and RWM. The results are based on synthetic datasets generated from a logistic regression model with $n = 31,622$ observations. The y-axes of panels (a) and (b) are on a base-10 logarithmic scale.
Figure 2: Acceptance rates of MH-SS with first-order control variates.
Figure 3: Average batch size for MH-SS, SMH and RWM for the logistic regression model. For RWM, the average batch size is $n$. Both axes are on a base-10 logarithmic scale.
Figure 4: ESS per second of MH-SS, SMH and RWM for the logistic regression model. Both axes are on a base-10 logarithmic scale. Some ESS values are omitted because $\mathbb{E}(B) \ge n$, in which case the RWM algorithm would be used instead and its efficiency applies.
Figure 5: Average batch size and ESS per second for MH-SS, vanilla SMH and SMH with new bounds for the logistic regression model ($d = 30$).
...and 9 more figures

Theorems & Definitions (19)

Proposition 1
proof
Theorem 1
Corollary 1
Theorem 2
Remark 1
Remark 2
Lemma 1
Remark 3
Corollary 2
...and 9 more
