Scalability of Metropolis-within-Gibbs schemes for high-dimensional Bayesian models

Filippo Ascolani; Gareth O. Roberts; Giacomo Zanella

TL;DR

This work develops a general theory connecting coordinate-wise MCMC convergence to the Gibbs sampler via conditional conductance, enabling dimension-free mixing insights for high-dimensional Bayesian models. By bounding the approximate conductance of MwG updates in terms of the exact Gibbs conductance and analyzing auxiliary perturbation results, the authors establish dimension-free mixing times for MwG in two-level hierarchical structures and related applications. They provide concrete results for MwG with independent MH updates, conditionally log-concave targets, discrete data, binary regression with unknown prior variance, and diffusion data augmentation, showing that fast convergence can be achieved with manageable computational costs. The findings offer practical guidance on when MwG approaches rival gradient-based methods, and they extend the theoretical toolbox for combining MCMC convergence with Bayesian asymptotics in complex, large-scale models.

Abstract

We study general coordinate-wise MCMC schemes (such as Metropolis-within-Gibbs samplers), which are commonly used to fit Bayesian non-conjugate hierarchical models. We relate their convergence properties to those of the corresponding (potentially not implementable) Gibbs sampler through the notion of conditional conductance. This allows us to study the performance of popular Metropolis-within-Gibbs schemes for non-conjugate hierarchical models in high-dimensional regimes where both the number of datapoints and the number of parameters increase. Under random data-generating assumptions, we establish dimension-free convergence results, which are in close accordance with numerical evidence. Applications to Bayesian models for binary regression with unknown hyperparameters and discretely observed diffusions are also discussed. Motivated by such statistical applications, auxiliary results of independent interest on approximate conductances and perturbation of Markov operators are provided.
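To make the coordinate-wise scheme concrete, here is a minimal sketch of a Metropolis-within-Gibbs sweep. It uses a toy two-level model chosen for illustration, not the paper's exact model: $\theta_j \mid \mu \sim N(\mu, 1)$ and $y_j \mid \theta_j \sim \mathrm{Binomial}(m, \mathrm{logit}^{-1}(\theta_j))$, with prior $\mu \sim N(0, \texttt{prior\_sd}^2)$. The names `mwg_sweep`, `run_mwg`, `step` and `prior_sd` are hypothetical, and the group-level variance is fixed at 1 to keep the example short. Each $\theta_j$ has a non-conjugate conditional and is updated with a random-walk Metropolis step, while $\mu$ is drawn exactly from its Gaussian conditional.

```python
import math
import random


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def log_cond_theta(theta_j, y_j, m, mu):
    """Log conditional density of theta_j given mu and y_j, up to a constant."""
    loglik = y_j * log_sigmoid(theta_j) + (m - y_j) * log_sigmoid(-theta_j)
    return loglik - 0.5 * (theta_j - mu) ** 2


def mwg_sweep(theta, mu, y, m, step, prior_sd, rng):
    """One sweep: Metropolis update of each theta_j, then exact update of mu."""
    accepted = 0
    for j in range(len(theta)):
        proposal = theta[j] + step * rng.gauss(0.0, 1.0)
        log_ratio = (log_cond_theta(proposal, y[j], m, mu)
                     - log_cond_theta(theta[j], y[j], m, mu))
        # -Exp(1) has the same law as log(U) for U ~ Unif(0, 1) and avoids log(0).
        if -rng.expovariate(1.0) < log_ratio:
            theta[j] = proposal
            accepted += 1
    # mu | theta is Gaussian because theta_j | mu ~ N(mu, 1) and mu ~ N(0, prior_sd^2).
    precision = len(theta) + 1.0 / prior_sd ** 2
    mu = rng.gauss(sum(theta) / precision, 1.0 / math.sqrt(precision))
    return mu, accepted


def run_mwg(J=20, m=10, n_sweeps=500, step=1.0, prior_sd=10.0, seed=0):
    """Simulate data from the toy model (true mu = 1) and run the sampler."""
    rng = random.Random(seed)
    true_theta = [rng.gauss(1.0, 1.0) for _ in range(J)]
    y = [sum(rng.random() < 1.0 / (1.0 + math.exp(-t)) for _ in range(m))
         for t in true_theta]
    theta, mu = [0.0] * J, 0.0
    mu_trace, n_accepted = [], 0
    for _ in range(n_sweeps):
        mu, acc = mwg_sweep(theta, mu, y, m, step, prior_sd, rng)
        mu_trace.append(mu)
        n_accepted += acc
    return mu_trace, n_accepted / (n_sweeps * J)
```

Calling `run_mwg(J=50)` returns the trace of $\mu$ and the average acceptance rate of the $\theta_j$ updates. Each sweep costs one pass over the groups, which is the cost model behind comparing samplers as $J$ grows.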

Paper Structure (49 sections, 28 theorems, 201 equations, 5 figures)

Introduction
Motivating example: non-conjugate hierarchical models
Objective and structure of the paper
Related literature
Mixing times and conductance
Coordinate-wise MCMC
Applications to Metropolis-within-Gibbs schemes
Conditional updates with independent Metropolis-Hastings
Conditionally log-concave distributions
Auxiliary results for statistical applications
Approximate conductance and perturbations of Markov operators
Conductance and independent products
High-dimensional hierarchical models
Gibbs and MwG kernels
Statistical assumptions
...and 34 more sections

Key Result

Lemma 1

Let $P$ be a $\pi$-reversible and positive semi-definite Markov transition kernel. For every $s \in [0, 1/2)$, $t \geq 0$, $M \geq 1$ and $\mu \in \mathcal{N}\left(\pi, M \right)$, it holds that [equation missing in source]. In particular, if $s = \frac{\epsilon}{2M}$, we have [equation missing in source].
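For orientation, the sketch below computes a conductance-type quantity by brute force on a small finite state space. It assumes that $s$ in Lemma 1 is the threshold of an $s$-conductance in the usual Lovász-Simonovits form, $\Phi_s(P) = \inf\{\sum_{x \in A} \pi(x) P(x, A^c) / (\pi(A) - s) : s < \pi(A) \leq 1/2\}$. This page does not state the paper's definition, so treat the formula and the function name `s_conductance` as assumptions. Setting `s = 0` gives the ordinary conductance.

```python
from itertools import combinations


def s_conductance(P, pi, s=0.0):
    """Brute-force s-conductance of a finite chain (assumed definition above).

    P  : transition matrix as a list of rows
    pi : stationary distribution as a list of probabilities
    s  : threshold in [0, 1/2); s = 0 gives the usual conductance
    """
    n = len(pi)
    best = float("inf")
    for size in range(1, n):
        for subset in combinations(range(n), size):
            mass = sum(pi[x] for x in subset)
            # Only sets with s < pi(A) <= 1/2 enter the infimum.
            if not (s < mass <= 0.5 + 1e-12):
                continue
            inside = set(subset)
            # Stationary probability flow from A to its complement.
            flow = sum(pi[x] * P[x][y]
                       for x in subset for y in range(n) if y not in inside)
            best = min(best, flow / (mass - s))
    return best


# Symmetric two-state chain that switches state with probability 0.3.
P_two_state = [[0.7, 0.3], [0.3, 0.7]]
pi_two_state = [0.5, 0.5]
phi_0 = s_conductance(P_two_state, pi_two_state)
phi_s = s_conductance(P_two_state, pi_two_state, s=0.1)
```

The enumeration is exponential in the number of states, so this is only an aid for building intuition on toy chains, not a tool for the high-dimensional targets the paper studies.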

Figures (5)

Figure 1: Median of the integrated autocorrelation times (IATs) multiplied by the average number of likelihood evaluations per iteration (on the log scale) for four MCMC schemes targeting the posterior distribution of model \ref{eq:one_level_nested_intro}, as a function of $J$ (number of groups). The median refers to repetitions over datasets randomly generated according to the model with true parameters $\mu^* = \tau^* = 1$. See Section \ref{sec:discrete_data} for more details.
Figure 2: Median IATs (on the log scale) of four MCMC schemes targeting the posterior distribution of model \ref{logistic_model_with_covariates} with $\ell = 5$ and $m=30$, as a function of the number of groups. The median refers to repetitions over datasets randomly generated according to the model with $(\tau^*_1,\dots,\tau^*_5)=(2,1,1,3,2)$ and $\mu_k^*\stackrel{iid}{\sim} \hbox{Unif}([-1,1])$ for every $k = 1,\dots, 5$.
Figure 3: Median IATs of five MCMC schemes targeting the posterior distribution of model \ref{eq:one_level_nested_intro}, as a function of the number of covariates (intercept included). The median refers to repetitions over datasets randomly generated according to model \ref{logistic_model_with_covariates} with $\tau_k^* = 0.5$ and $\mu_k^*\stackrel{iid}{\sim} \hbox{Unif}([-1,1])$ for every $k$. Some points for IMH and RWM are omitted because high IAT values are difficult to estimate reliably.
Figure 4: Median IATs (on the log scale) of three MCMC schemes targeting the posterior distribution of model \ref{logistic_model} with $d = 5$, as a function of the number of observations $n$. The median refers to repetitions over datasets randomly generated according to model \ref{logistic_model} with $\alpha^* = 1$ and $\Sigma = \frac{1}{5}\mathbb{I}$.
Figure 5: Median IATs of four MCMC schemes targeting the posterior distribution of model \ref{logistic_model}, as a function of the number of covariates $d$, with $n = d/2$. The median refers to repetitions over datasets randomly generated according to model \ref{logistic_model} with $\alpha^* = 1$ and $\Sigma = \frac{1}{d}\mathbb{I}$. Full lines: centered parametrization. Dotted lines: non-centered parametrization.
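
The figures report integrated autocorrelation times (IATs). As a rough illustration of how such a quantity can be estimated from a scalar chain, the sketch below uses $\mathrm{IAT} = 1 + 2\sum_{k \geq 1} \rho_k$, truncating the sum at the first non-positive empirical autocorrelation. This truncation rule is one simple, common choice and is an assumption here; the captions do not say which estimator the authors use. The helper names (`integrated_autocorr_time`, `ar1_chain`) are hypothetical.

```python
import random


def autocorr(x, lag, mean, var):
    """Empirical autocorrelation of the sequence x at the given lag."""
    n = len(x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def integrated_autocorr_time(x, max_lag=None):
    """IAT estimate: 1 + 2 * sum of autocorrelations up to the first non-positive one."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0:
        raise ValueError("constant sequence: autocorrelation is undefined")
    if max_lag is None:
        max_lag = n // 10
    iat = 1.0
    for lag in range(1, max_lag + 1):
        rho = autocorr(x, lag, mean, var)
        if rho <= 0.0:
            break
        iat += 2.0 * rho
    return iat


def ar1_chain(n, phi, seed):
    """AR(1) sequence x_t = phi * x_{t-1} + N(0, 1) noise, used as a test chain."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


iat_iid = integrated_autocorr_time(ar1_chain(5000, 0.0, seed=0))
iat_ar = integrated_autocorr_time(ar1_chain(5000, 0.9, seed=0))
```

For an AR(1) chain with coefficient $\phi$ the exact IAT is $(1+\phi)/(1-\phi)$, that is 19 for $\phi = 0.9$, so `iat_ar` should be much larger than `iat_iid`. A cost-adjusted comparison in the spirit of Figure 1 would multiply the estimate by the average number of likelihood evaluations per iteration.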

Theorems & Definitions (70)

Lemma 1
Remark 1
Remark 2
Remark 3
Remark 4
Theorem 1
Remark 5
Corollary 1
Remark 6
Remark 7
...and 60 more
