Variance Reduction for the Independent Metropolis Sampler

Siran Liu; Petros Dellaportas; Michalis K. Titsias

Abstract

Assume that we would like to estimate the expected value of a function $F$ with respect to an intractable density $\pi$, which is specified up to an unknown normalising constant. We prove that if $\pi$ is close enough under KL divergence to another density $q$, an independent Metropolis sampler estimator that obtains samples from $\pi$ with proposal density $q$, enriched with a variance reduction strategy based on control variates, achieves smaller asymptotic variance than i.i.d. sampling from $\pi$. The control variates construction requires no extra computational effort but assumes that the expected value of $F$ under $q$ is analytically available. We illustrate this result by calculating the marginal likelihood in a linear regression model with prior-likelihood conflict and a non-conjugate prior. Furthermore, we propose an adaptive independent Metropolis algorithm that adapts the proposal density so that its KL divergence from the target is reduced. We demonstrate its applicability in Bayesian logistic regression and Gaussian process regression problems, and we rigorously justify our asymptotic arguments under easily verifiable and essentially minimal conditions.
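
As a rough illustration of the setting described in the abstract, the following is a minimal sketch of a plain independent Metropolis sampler, assuming a $\mathcal{N}(0,1)$ target known only up to its normalising constant and a $\mathcal{N}(0,\sigma^2)$ proposal. It computes only the standard ergodic average; it does not implement the control variate estimator of the paper, which additionally uses the analytically available expectation of $F$ under $q$. The function names, the choice $F(x) = x^2$, and the value $\sigma = 1.5$ are assumptions made for this example.

```python
import math
import random


def log_target(x):
    """Unnormalised log density of the N(0, 1) target (constant dropped)."""
    return -0.5 * x * x


def log_proposal(x, sigma):
    """Log density of the N(0, sigma^2) proposal, up to an additive constant."""
    return -0.5 * (x / sigma) ** 2


def independent_metropolis(n, sigma, seed=0):
    """Run n steps of an independent Metropolis chain and return the states."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigma)
    # Log importance weight w(x) = pi(x) / q(x), up to a constant that cancels.
    log_w_x = log_target(x) - log_proposal(x, sigma)
    chain = []
    for _ in range(n):
        y = rng.gauss(0.0, sigma)  # the proposal ignores the current state
        log_w_y = log_target(y) - log_proposal(y, sigma)
        # Accept with probability min(1, w(y) / w(x)).
        if math.log(1.0 - rng.random()) < log_w_y - log_w_x:
            x, log_w_x = y, log_w_y
        chain.append(x)
    return chain


def ergodic_average(chain, f):
    """Plain Monte Carlo average of f over the chain (no control variates)."""
    return sum(f(x) for x in chain) / len(chain)


chain = independent_metropolis(n=50_000, sigma=1.5, seed=1)
estimate = ergodic_average(chain, lambda x: x * x)  # E[x^2] = 1 under N(0, 1)
```

Because the proposal does not depend on the current state, the acceptance ratio reduces to a ratio of importance weights, so the unknown normalising constant of the target cancels.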

Paper Structure (38 sections, 10 theorems, 99 equations, 2 figures, 7 tables, 4 algorithms)

Introduction
The general problem
Outline of the method
Related work
Outline of the paper
Control variates for Independent Metropolis algorithms
A new estimator based on control variates
Connection with previous estimators
Connection to Rao–Blackwellisation by integrating out decision step
Connection with a coupling estimator
Connection with an importance sampling estimator
Theoretical justifications
Numerical illustrations
Synthetic data examples
Bayesian model selection in non-conjugate linear regression
...and 23 more sections

Key Result

Theorem 1

(Proof in the supplementary material.) Assume that for a target density $\pi(x)$ there exists a sequence of proposal distributions $\left\{q_i(x)\right\}_{i=1}^\infty$ such that $\lim_{i \rightarrow \infty} q_i(x) = \pi(x)$ and for each proposal distribution $q_i$ the corre [...] for some constant $c$.

Figures (2)

Figure 1: Comparison of the $\mu_{n,IMCV}$ and $\mu_{n,MC^*}$ estimators. Top row: $\mathcal{N}(x|0,1)$ target and $\mathcal{N}(x|0,\sigma^2)$ proposal. Bottom row: $\mathcal{N}(x|0,1)$ target and $t_{\nu}(y)$ proposal. (a), (c): Boxplots of $\mu_{n,IMCV}$ based on 20 repetitions for different values of $\sigma^2$ and $\nu$. (b), (d): The logarithm of the VRFs and the corresponding theoretical bounds for different values of $\sigma^2$ and $\nu$.
Figure 2: Boxplots of the VRFs for the coordinate-wise estimates, for Gaussian targets and Gaussian proposals of different dimensions.

Theorems & Definitions (21)

Theorem 1
Proposition 1
Theorem 2
Corollary 1
Theorem 3
proof
proof
Lemma 1
Lemma 2
proof
...and 11 more
