On micromodes in Bayesian posterior distributions and their implications for MCMC

Sanket Agrawal, Sebastiano Grazzi, Gareth O. Roberts

TL;DR

This paper shows that heavy-tailed data induce micromodes (local posterior peaks near isolated extreme observations) in Bayesian posteriors for $d$-dimensional location models with Student-$t$ errors. It connects this posterior geometry to the performance of Zig-Zag process (ZZP) samplers, deriving an Arrhenius-type law for exit times and a phase transition driven by tail mismatch: when the model's tails are too light relative to the data, exit times rise sharply and sampling degrades. The analysis rests on characterizations of micromode location and width via extreme-value theory, together with an approximation of the empirical score that makes exit times tractable. The practical message for robust Bayesian modeling is that allowing sufficiently heavy tails can mitigate micromode effects, whereas underdispersed tails can trap samplers and substantially slow convergence. The work thus links posterior landscape geometry with computational dynamics, with potential extensions to other MCMC algorithms and more complex regression settings.

Abstract

We investigate the existence and severity of local modes in posterior distributions from Bayesian analyses. These are known to occur in posterior tails resulting from heavy-tailed error models such as those used in robust regression. To understand this phenomenon clearly, we consider in detail location models with Student-$t$ errors in dimension $d$ with sample size $n$. For sufficiently heavy-tailed data-generating distributions, extreme observations become increasingly isolated as $n \to \infty$. We show that each such observation induces a unique local posterior mode with probability tending to $1$. We refer to such a local mode as a micromode. These micromodes are typically small in height but their domains of attraction are large and grow polynomially with $n$. We then connect this posterior geometry to computation. We establish an Arrhenius law for the time taken by one-dimensional piecewise deterministic Monte Carlo algorithms to exit these micromodes. Our analysis identifies a phase transition where a misspecified and overly underdispersed model causes exit times to increase sharply, leading to a pronounced deterioration in sampling performance.
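The geometry described in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the paper's construction: the dataset is a synthetic, deterministic toy (200 bulk points on $[-1, 1]$ plus one isolated observation at $10^4$), the prior is assumed flat, the scale is fixed at one, and the helper names `log_post`, `score` and `bisect_root` are hypothetical. The code locates the local maximum of the Student-$t$ log-posterior next to the extreme observation, finds the adjacent local minimum that separates it from the bulk, and reports the log-posterior barrier between the two.

```python
import math

def log_post(x, ys, nu):
    """Log-posterior of the location x (flat prior, unit scale), up to a constant."""
    return -0.5 * (nu + 1.0) * sum(math.log1p((y - x) ** 2 / nu) for y in ys)

def score(x, ys, nu):
    """Derivative of log_post with respect to x."""
    return (nu + 1.0) * sum((y - x) / (nu + (y - x) ** 2) for y in ys)

def bisect_root(f, lo, hi, tol=1e-9):
    """Locate a sign change of f on [lo, hi] by bisection."""
    f_lo = f(lo)
    if f_lo * f(hi) >= 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid > 0:
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return 0.5 * (lo + hi)

# Synthetic toy data (an assumption, not taken from the paper).
nu = 1.0
n_bulk = 200
bulk = [-1.0 + 2.0 * i / (n_bulk - 1) for i in range(n_bulk)]
y_max = 1.0e4
ys = bulk + [y_max]

def s(x):
    return score(x, ys, nu)

# Score changes from + to - next to y_max: a local maximum (the micromode).
micromode = bisect_root(s, y_max - 1.0, y_max + 1.0)
# Score changes from - to + further left: the local minimum bounding the basin.
valley = bisect_root(s, y_max - 100.0, y_max - 1.0)

# Height of the micromode above the valley, on the log scale.
barrier = log_post(micromode, ys, nu) - log_post(valley, ys, nu)
# exp(-barrier): see the note below for its Zig-Zag interpretation.
p_cross = math.exp(-barrier)

print(f"micromode offset from y_max: {micromode - y_max:.4f}")
print(f"valley offset from y_max:    {valley - y_max:.2f}")
print(f"log-posterior barrier:       {barrier:.2f}")
print(f"exp(-barrier):               {p_cross:.4f}")
```

In this toy setting the micromode sits just below the extreme observation, pulled slightly toward the bulk, while the valley lies much further away, so the basin is wide even though the micromode is negligible in height compared with the main mode. For a one-dimensional Zig-Zag process with the canonical switching rate $\max(0, -v \cdot \text{score})$, the probability of travelling from the micromode to the valley without a velocity flip is the exponential of minus the integrated score, which equals `exp(-barrier)`. This is only a heuristic for why exit times scale exponentially in the barrier, not a statement of the paper's theorem.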

Paper Structure (19 sections, 22 theorems, 201 equations, 4 figures)

Key Result

Proposition 2.1

Suppose $P$ satisfies Assumption \ref{assum:density} for some $\beta > 0$. Then, for each fixed $k \ge 0$, there exists a $G_k$ which follows a Gamma distribution with shape parameter $k+1$ and unit scale, such that as $n \to \infty$, where $A = \frac{2\pi^{d/2}K(\beta, d)}{\beta\,\Gamma(d/2)}$ and $\Gamma(\cdot)$ denotes the gamma function. Moreover, for all $0 < \epsilon < 1/\beta$, as $n \to \infty$

Figures (4)

  • Figure 1: Isodensity contours of 2-dimensional posterior distributions with $n=10^5$ datapoints. Top panels: the model in \ref{eq: linear model} with $d=2$; blue and red dots show the datapoints. Bottom panels: the linear model $y_j = A_jx + \varepsilon_j$ with $d = 2$ covariates and $n = 10^5$ datapoints; blue and green dots show the micromodes found along the lines $y_{i} = A_{i}x$, where $y_i$ is the largest or second-largest observation. For both models, $\varepsilon_j \sim t_1$ and $\nu = 1$. Right panels: zoom-in near micromodes.
  • Figure 2: The widths of local maxima at $x_1$ and $x_2$ for some arbitrary function $f$.
  • Figure 3: Time to leave the micromode near $Y_{(n)}$ normalized by $|Y_{(n)}|$ for canonical ZZP targeting $\pi_\nu$ and started in the micromode near $Y_{(n)}$, for different values of $\nu$ ($x$-axis). Here, $n = 3000$ and $\beta = 1/2$. Left: each boxplot was computed with 40 independent trajectories. Right: Average time to leave the micromode computed over 40 independent trajectories targeting $\pi_\nu$, for different $\nu$ ($x$-axis). Dashed line at the phase transition $\beta/(1 - \beta) = 1$.
  • Figure 4: A diagram showing $S_n(x)$ and the construction of $d_n^{+}$ and $d_n^{-}$ in one dimension. The lengths $d_n^{\pm}$ are chosen such that $vS_n(Y_{(n)} + vt) > 0$ for all $t \in [d_n^{+}, d_n^{-}]$ and $v = \pm 1$. The right panel zooms in on the region where $S_n$ crosses the $x$-axis.

Theorems & Definitions (48)

  • Proposition 2.1
  • Theorem 2.2
  • Theorem 2.3
  • Theorem 2.4
  • Theorem 2.5
  • Lemma 3.1
  • Remark 3.1
  • Remark 3.2
  • Theorem 3.2
  • Remark 3.3
  • ...and 38 more