Scalability of Metropolis-within-Gibbs schemes for high-dimensional Bayesian models

Filippo Ascolani, Gareth O. Roberts, Giacomo Zanella

TL;DR

This work develops a general theory that relates the convergence of coordinate-wise MCMC schemes to that of the corresponding Gibbs sampler via conditional conductance, yielding dimension-free mixing results for high-dimensional Bayesian models. By bounding the approximate conductance of Metropolis-within-Gibbs (MwG) updates in terms of the exact Gibbs conductance, and by using auxiliary perturbation results, the authors establish dimension-free mixing times for MwG in two-level hierarchical structures and related applications. They provide concrete results for MwG with independent Metropolis-Hastings (MH) updates, conditionally log-concave targets, discrete data, binary regression with unknown prior variance, and diffusion data augmentation, showing that fast convergence can be achieved at manageable computational cost. The findings offer practical guidance on when MwG approaches rival gradient-based methods, and they extend the theoretical toolbox for combining MCMC convergence with Bayesian asymptotics in complex, large-scale models.

Abstract

We study general coordinate-wise MCMC schemes (such as Metropolis-within-Gibbs samplers), which are commonly used to fit Bayesian non-conjugate hierarchical models. We relate their convergence properties to those of the corresponding (potentially not implementable) Gibbs sampler through the notion of conditional conductance. This allows us to study the performance of popular Metropolis-within-Gibbs schemes for non-conjugate hierarchical models, in high-dimensional regimes where both the number of datapoints and the number of parameters increase. Given random data-generating assumptions, we establish dimension-free convergence results, which are in close accordance with numerical evidence. Applications to Bayesian models for binary regression with unknown hyperparameters and discretely observed diffusions are also discussed. Motivated by such statistical applications, auxiliary results of independent interest on approximate conductances and perturbation of Markov operators are provided.
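To make the kind of scheme studied here concrete, the following is a minimal Python sketch of a Metropolis-within-Gibbs sampler for a toy two-level hierarchical model. Everything specific in it is an assumption chosen for illustration and is not taken from the paper: the binomial-logit likelihood $y_j \mid \theta_j \sim \text{Binomial}(m, \text{logistic}(\theta_j))$, the group-level law $\theta_j \mid \mu, \tau \sim N(\mu, 1/\tau)$, the flat prior on $\mu$, the $\text{Gamma}(a, b)$ prior on $\tau$, the random-walk proposal with step size `step`, and the function names `log_cond_theta` and `mwg_sampler`. The local parameters $\theta_j$ have no conjugate conditional, so each is updated with a Metropolis step, while $\mu$ and $\tau$ are drawn exactly from their conditionals.

```python
import numpy as np


def log_cond_theta(theta, y, m, mu, tau):
    """Log conditional density of each theta_j given (y_j, mu, tau), up to a constant.

    Assumed model (illustrative only): y_j ~ Binomial(m, logistic(theta_j)),
    theta_j ~ N(mu, 1/tau).
    """
    log_lik = y * theta - m * np.logaddexp(0.0, theta)
    log_prior = -0.5 * tau * (theta - mu) ** 2
    return log_lik + log_prior


def mwg_sampler(y, m, n_iter=2000, step=0.5, a=1.0, b=1.0, seed=0):
    """Metropolis-within-Gibbs sketch; returns traces of mu, tau and the acceptance rate."""
    rng = np.random.default_rng(seed)
    J = y.shape[0]
    theta, mu, tau = np.zeros(J), 0.0, 1.0
    mus, taus = np.empty(n_iter), np.empty(n_iter)
    n_accepted = 0

    for it in range(n_iter):
        # Metropolis step for every theta_j. Given (mu, tau) the theta_j are
        # conditionally independent, so one vectorised sweep is equivalent to
        # updating the coordinates one at a time.
        proposal = theta + step * rng.standard_normal(J)
        log_ratio = (log_cond_theta(proposal, y, m, mu, tau)
                     - log_cond_theta(theta, y, m, mu, tau))
        accept = np.log(rng.uniform(size=J)) < log_ratio
        theta = np.where(accept, proposal, theta)
        n_accepted += int(accept.sum())

        # Exact Gibbs draw of mu | theta, tau under the assumed flat prior.
        mu = rng.normal(theta.mean(), 1.0 / np.sqrt(J * tau))

        # Exact Gibbs draw of tau | theta, mu under the assumed Gamma(a, b) prior
        # (shape/rate); NumPy's gamma takes shape and scale = 1 / rate.
        shape = a + 0.5 * J
        rate = b + 0.5 * np.sum((theta - mu) ** 2)
        tau = rng.gamma(shape, 1.0 / rate)

        mus[it], taus[it] = mu, tau

    return mus, taus, n_accepted / (n_iter * J)
```

A typical use would simulate group counts `y` from the assumed model, call `mwg_sampler(y, m)`, and discard an initial portion of the returned traces as burn-in. The cost per sweep grows linearly in the number of groups, which is why the number of sweeps needed to mix is the quantity of interest as the number of groups increases.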

Paper Structure (49 sections, 28 theorems, 201 equations, 5 figures)

This paper contains 49 sections, 28 theorems, 201 equations, and 5 figures.

Key Result

Lemma 1

Let $P$ be a $\pi$-reversible and positive semi-definite Markov transition kernel. For every $s \in [0, 1/2)$, $t \geq 0$, $M \geq 1$ and $\mu \in \mathcal{N}(\pi, M)$, it holds that … In particular, if $s = \frac{\epsilon}{2M}$, we have …

Figures (5)

  • Figure 1: Median of the integrated autocorrelation times (IATs) multiplied by the average number of likelihood evaluations per iteration (on the log scale) for four MCMC schemes targeting the posterior distribution of model \ref{eq:one_level_nested_intro}, as a function of $J$ (number of groups). The median refers to repetitions over datasets randomly generated according to the model with true parameters $\mu^* = \tau^* = 1$. See Section \ref{sec:discrete_data} for more details.
  • Figure 2: Median IATs (on the log scale) of four MCMC schemes targeting the posterior distribution of model \ref{logistic_model_with_covariates} with $\ell = 5$ and $m = 30$, as a function of the number of groups. The median refers to repetitions over datasets randomly generated according to the model with $(\tau^*_1,\dots,\tau^*_5) = (2,1,1,3,2)$ and $\mu_k^* \stackrel{\text{iid}}{\sim} \text{Unif}([-1,1])$ for every $k = 1,\dots,5$.
  • Figure 3: Median IATs of five MCMC schemes targeting the posterior distribution of model \ref{eq:one_level_nested_intro}, as a function of the number of covariates (intercept included). The median refers to repetitions over datasets randomly generated according to model \ref{logistic_model_with_covariates} with $\tau_k^* = 0.5$ and $\mu_k^* \stackrel{\text{iid}}{\sim} \text{Unif}([-1,1])$ for every $k$. Some points for IMH and RWM are omitted because high IAT values are difficult to estimate reliably.
  • Figure 4: Median IATs (on the log scale) of three MCMC schemes targeting the posterior distribution of model \ref{logistic_model} with $d = 5$, as a function of the number of observations $n$. The median refers to repetitions over datasets randomly generated according to model \ref{logistic_model} with $\alpha^* = 1$ and $\Sigma = \frac{1}{5}\mathbb{I}$.
  • Figure 5: Median IATs of four MCMC schemes targeting the posterior distribution of model \ref{logistic_model}, as a function of the number of covariates $d$ and $n = d/2$. The median refers to repetitions over datasets randomly generated according to model \ref{logistic_model} with $\alpha^* = 1$ and $\Sigma = \frac{1}{d}\mathbb{I}$. Full lines: centered parametrization. Dotted lines: non-centered parametrization.
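
The figures above all report integrated autocorrelation times. The captions do not say which estimator was used, so the following Python sketch shows one common choice only as an assumption: the estimate $1 + 2\sum_{k=1}^{W} \hat\rho_k$ with a self-consistent truncation window $W$ (the smallest $W$ with $W \geq c \cdot \hat\tau(W)$). The function name `integrated_autocorr_time` and the default `c = 5` are illustrative and not taken from the paper.

```python
import numpy as np


def integrated_autocorr_time(x, c=5.0):
    """Estimate the IAT of a scalar chain as 1 + 2 * sum of autocorrelations.

    The sum is truncated at the smallest window W with W >= c * tau(W);
    this windowing rule is an assumption of the sketch.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()

    # Autocovariance via FFT; zero-padding to length 2n avoids circular wrap-around.
    f = np.fft.rfft(x, n=2 * n)
    acov = np.fft.irfft(f * np.conjugate(f))[:n] / n
    rho = acov / acov[0]

    # Running estimates tau(W) = 1 + 2 * sum_{k=1}^{W} rho_k for W = 1, ..., n - 1.
    taus = 1.0 + 2.0 * np.cumsum(rho[1:])
    windows = np.arange(1, n)

    # Pick the first window satisfying the rule; fall back to the largest window.
    ok = windows >= c * taus
    idx = int(np.argmax(ok)) if ok.any() else n - 2
    return float(taus[idx])
```

Applied to a trace such as the hyperparameter draws of a sampler, the returned value approximates how many correlated draws are worth one independent draw. Multiplying it by the number of likelihood evaluations per iteration gives a cost-adjusted measure of the kind plotted in Figure 1. For very slowly mixing chains the estimate becomes unreliable unless the chain is much longer than the IAT itself.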

Theorems & Definitions (70)

  • Lemma 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Theorem 1
  • Remark 5
  • Corollary 1
  • Remark 6
  • Remark 7
  • ...and 60 more