A theoretical framework for M-posteriors: frequentist guarantees and robustness properties

Juraj Marusic, Marco Avella Medina, Cynthia Rush

TL;DR

The paper develops M-posteriors, a broad class of generalized posteriors tied to M-estimators, and proves a Bernstein–von Mises type asymptotic normality under weighted empirical measures. It introduces two model-agnostic robustness tools—the posterior influence function and the posterior breakdown point—relating their behavior to the score function and loss choices, and proposes a bias-corrected loss to restore Fisher consistency when necessary. The framework is instantiated through motivating examples (Huber location, Bayesian quantile regression, Bayesian data reweighting) and validated with numerical experiments in normal, mixture, and Poisson-factorization settings, highlighting practical robustness gains. The results provide a principled, prior-aware approach to robust Bayesian inference with broad applicability and explicit guidance on when and how robustness may come at the cost of bias, along with actionable connections to existing robust statistics concepts.

Abstract

We provide a theoretical framework for a wide class of generalized posteriors that can be viewed as the natural Bayesian posterior counterpart of the class of M-estimators in the frequentist world. We call the members of this class M-posteriors and show that they are asymptotically normally distributed under mild conditions on the M-estimation loss and the prior. In particular, an M-posterior contracts in probability around a normal distribution centered at an M-estimator, showing frequentist consistency and suggesting some degree of robustness depending on the reference M-estimator. We formalize the robustness properties of the M-posteriors by a new characterization of the posterior influence function and a novel definition of breakdown point adapted for posterior distributions. We illustrate the wide applicability of our theory in various popular models and illustrate their empirical relevance in some numerical examples.
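
To make the idea concrete, here is a minimal sketch of a Huber location posterior evaluated on a grid. It assumes the common generalized-posterior form, with density proportional to $\pi(\theta)\exp\{-\sum_i \rho(x_i - \theta)\}$, a flat prior, and unit weights; the paper's exact definition may differ. The function names, the grid, the simulated data, and the 10% contamination level are all illustrative choices, not taken from the paper.

```python
import numpy as np

def huber_loss(r, c=1.0):
    """Huber loss: quadratic for |r| <= c, linear beyond (c is the threshold)."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def gaussian_loss(r):
    """Negative log-likelihood of a unit-variance Gaussian, up to a constant."""
    return 0.5 * r**2

def grid_posterior_mean(x, loss, grid):
    """Mean of the density proportional to exp(-sum_i loss(x_i - theta)).

    Assumption: flat prior and unit weights, evaluated on a uniform grid.
    """
    total_loss = loss(x[None, :] - grid[:, None]).sum(axis=1)
    log_post = -total_loss
    log_post -= log_post.max()          # stabilize before exponentiating
    weights = np.exp(log_post)
    weights /= weights.sum()            # normalize on the grid
    return float((grid * weights).sum())

# Hypothetical data: 100 standard normal draws, 10 of them replaced by outliers.
rng = np.random.default_rng(0)
x_clean = rng.normal(0.0, 1.0, size=100)
x_contaminated = x_clean.copy()
x_contaminated[:10] = 20.0

grid = np.linspace(-5.0, 25.0, 6001)
mean_gauss = grid_posterior_mean(x_contaminated, gaussian_loss, grid)
mean_huber = grid_posterior_mean(x_contaminated, huber_loss, grid)
print(f"Gaussian posterior mean: {mean_gauss:.3f}")
print(f"Huber posterior mean:    {mean_huber:.3f}")
```

Under the flat prior, the Gaussian posterior mean coincides with the sample mean and is pulled toward the outliers, while the Huber version stays close to the bulk of the data. This matches the abstract's point that the posterior inherits its robustness from the reference M-estimator.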

Paper Structure

This paper contains 44 sections, 21 theorems, 205 equations, 4 figures, 1 table.

Key Result

Theorem 1

Let $\boldsymbol{\alpha} \equiv (\alpha_i)_{i = 1}^{\infty}$ be a sequence of positive (constant) weights with finite second moment. Suppose that the prior density $\pi$ is continuous and positive on a neighborhood around the true parameter $\theta^*$. Letting $d_{\mathrm{TV}}(\cdot, \cdot)$ denote the total variation distance, the weighted M-posterior converges in $d_{\mathrm{TV}}$, in $P_0$-probability, to a normal distribution centered at an M-estimator, with a covariance that involves $V_{\theta^*}$, the positive definite matrix from the paper's assumptions.

Figures (4)

  • Figure 1: Comparison of original vs. bias-corrected M-estimators and M-posteriors. The left panel traces the M-estimator $\hat{\theta}_\rho$ as the sample size $n$ increases, showing that under the uncorrected loss the estimator converges to a value well above the true rate $\theta^*=1$, whereas the bias-corrected estimator rapidly stabilizes at the correct value. The right panel displays Metropolis–Hastings draws from the corresponding M-posteriors at $n=1000$: the original M-posterior is concentrated around the same incorrect mode, while the bias-corrected M-posterior centers on $\theta=1$. Taken together, these plots demonstrate that removing the asymptotic bias from the estimating equations restores posterior consistency in the Bayesian framework.
  • Figure 2: Density plots of the M-posterior for the location parameter $\theta$ using a Laplace likelihood, under three representative priors (rows: improper, exponential and Gaussian) and three contamination levels (columns: 50%, 70%, 100%). The blue-shaded curves show the M-posteriors fitted on the original non-corrupted sample, while red-shaded curves correspond to the M-posteriors after shifting a fraction of observations by 50% or more. This figure illustrates the implications of the paper's breakdown point theorem for convex losses: the posterior breakdown point for uninformative priors is $1/2$, for the exponential prior it can exceed $1/2$ and for the Gaussian prior it does not exist.
  • Figure 3: Comparison of PIF between standard Gaussian likelihood (red) and Huber loss (blue), computed on a sample of size $n=100$ with Huber threshold $c=1$. (Left) PIF as a function of contamination point $x_0$, holding $\theta=0.1$ fixed; (Right) PIF as a function of parameter $\theta$, holding $x_0=2.0$ fixed.
  • Figure 4: Side‐by‐side comparison of Dirichlet‐process Gaussian mixture fits on the same skewed, three‐cluster dataset. (Left) Data colored by true cluster labels. (Center) Posterior under a robust Huber loss ($c=1.0$), with active components outlined by shaded $2\sigma$ ellipsoids and their centers marked by “$\times$.” (Right) Posterior under the standard Gaussian likelihood, using the same visual conventions.
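
The contrast in Figure 3 can be previewed by looking at the two score functions directly. The sketch below is not the paper's posterior influence function, whose definition is not reproduced on this page. It only compares the Gaussian score (the raw residual) with the Huber score (the residual clipped at $\pm c$) as the contamination point $x_0$ varies, reusing $\theta = 0.1$ and $c = 1$ from the caption. The function names and the range of $x_0$ are illustrative assumptions.

```python
import numpy as np

def gaussian_score(x0, theta):
    """Score of the unit-variance Gaussian loss 0.5 * (x0 - theta)**2, up to sign."""
    return x0 - theta

def huber_score(x0, theta, c=1.0):
    """Score of the Huber loss: the residual clipped to the interval [-c, c]."""
    return np.clip(x0 - theta, -c, c)

theta = 0.1                          # fixed parameter value, as in the left panel
x0 = np.linspace(-10.0, 10.0, 201)   # hypothetical range of contamination points

g = gaussian_score(x0, theta)
h = huber_score(x0, theta, c=1.0)

max_abs_gauss = float(np.abs(g).max())
max_abs_huber = float(np.abs(h).max())
print(f"largest |Gaussian score| on the grid: {max_abs_gauss:.2f}")
print(f"largest |Huber score| on the grid:    {max_abs_huber:.2f}")
```

The Gaussian score grows without bound as $x_0$ moves away from $\theta$, whereas the Huber score never exceeds $c$ in absolute value. Since the summary ties the behavior of the robustness tools to the score function, this boundedness is the property to look for when choosing a loss.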

Theorems & Definitions (63)

  • Definition 1
  • Definition 2
  • Definition 3: Weighted empirical distribution function
  • Definition 4: Weighted M-posterior
  • Theorem 1
  • Remark 1
  • Example 1: Huber location posterior
  • Example 2: Reweighted posterior: Exponential model
  • Example 3: (continued) Reweighted posterior: Exponential model
  • Definition 5: Posterior influence function
  • ...and 53 more