Weak convergence of Bayes estimators under general loss functions

Robin Requadt, Housen Li, Axel Munk

TL;DR

This work extends Bernstein--von Mises theory to parametric Bayes estimators under broad, non-translation-invariant losses that exhibit locally polynomial behavior, including the squared Wasserstein distance $W_2^2$, Sinkhorn divergences, and Stein discrepancies. It provides a unified framework to establish consistency and asymptotic normality (and, for locally quadratic losses, asymptotic efficiency) of Bayes estimators under mild regularity and differentiability conditions; it also derives differentiability results for Wasserstein-induced losses and offers concrete numerical demonstrations. The results cover a range of intrinsic and discrepancy-based losses, with applications to exponential–gamma and multinomial–Dirichlet models, highlighting the practical viability of geometry- and divergence-based losses in Bayesian inference. The theory clarifies when Bayesian procedures achieve asymptotic optimality under non-standard losses and guides implementation via Wasserstein barycenters and related metrics.
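In one dimension, the $W_2$ barycenter of laws with quantile functions $F_k^{-1}$ and weights $w_k$ has quantile function $\sum_k w_k F_k^{-1}$. A minimal sketch of why this helps implementation, assuming an exponential model in the rate parametrization, $F_\vartheta^{-1}(u) = -\log(1-u)/\vartheta$ (this parametrization and the stand-in "posterior" draws below are illustrative assumptions, not taken from the paper): the weighted barycenter of the laws $\mathrm{Exp}(\vartheta_k)$ is again exponential, with rate $1/\sum_k w_k\vartheta_k^{-1}$. Because this barycenter lies in the model, it also minimizes the posterior expected squared $W_2$ loss over the model, so the corresponding Bayes estimator is $1/\mathbb{E}[\vartheta^{-1}\mid \boldsymbol X]$ rather than the posterior mean. The helper names are hypothetical.

```python
import math
import random


def exp_quantile(u, rate):
    # Quantile function of Exp(rate): F^{-1}(u) = -log(1 - u) / rate.
    return -math.log1p(-u) / rate


def barycenter_quantile(u, rates, weights):
    # In 1-D, the W_2 barycenter's quantile function is the weighted average of the quantile functions.
    return sum(w * exp_quantile(u, r) for r, w in zip(rates, weights))


rng = random.Random(1)
# Hypothetical stand-in for posterior draws of the rate (Gamma with shape 30, scale 1/15, mean 2).
rates = [rng.gammavariate(30.0, 1.0 / 15.0) for _ in range(1000)]
weights = [1.0 / len(rates)] * len(rates)

# Candidate barycenter inside the model: Exp with rate 1 / sum_k w_k / rate_k.
rate_bar = 1.0 / sum(w / r for r, w in zip(rates, weights))

for u in (0.1, 0.5, 0.9, 0.99):
    # The two columns agree up to floating-point rounding.
    print(u, barycenter_quantile(u, rates, weights), exp_quantile(u, rate_bar))
```

Averaging quantiles instead of densities or parameters is what keeps the barycenter inside the exponential family here; for families that are not closed under quantile averaging, the Bayes estimator is instead a projection of the barycenter onto the model.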

Abstract

We investigate the asymptotic behavior of parametric Bayes estimators under a broad class of loss functions that extend beyond the classical translation-invariant setting. To this end, we develop a unified theoretical framework for loss functions exhibiting locally polynomial structure. This general theory encompasses important examples such as the squared Wasserstein distance, the Sinkhorn divergence and Stein discrepancies, which have gained prominence in modern statistical inference and machine learning. Building on the classical Bernstein--von Mises theorem, we establish sufficient conditions under which Bayes estimators inherit the posterior's asymptotic normality. As a by-product, we also derive conditions for the differentiability of Wasserstein-induced loss functions and provide new consistency results for Bayes estimators. Several examples and numerical experiments demonstrate the relevance and accuracy of the proposed methodology.
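A Bayes estimator under a loss $\ell$ minimizes the posterior risk $t\mapsto \mathbb{E}[\ell(t,\vartheta)\mid \boldsymbol X]$, and nothing in that definition requires translation invariance. The sketch below illustrates this under stated assumptions: an exponential model with rate $\vartheta$ and a conjugate Gamma prior with shape $a$ and rate $b$, so the posterior is Gamma with shape $a+n$ and rate $b+\sum_i X_i$. The losses are written out by hand: squared error on $\vartheta$, the squared 2-Wasserstein distance $W_2^2(\mathrm{Exp}(t),\mathrm{Exp}(\vartheta)) = 2(t^{-1}-\vartheta^{-1})^2$ (from the quantile representation), and the squared Hellinger distance with the convention $1-\int\sqrt{p_t p_\vartheta}$. The normalizations of the losses $\ell_H$, $\ell_{W_2}$ used in the paper's experiments may differ; constant factors do not change the minimizer. The function names and the golden-section search on $\log t$ are hypothetical, generic choices, not the paper's implementation.

```python
import math
import random


def posterior_samples(x, a, b, k, rng):
    """Draw k samples from the Gamma(a + n, rate b + sum(x)) posterior of the exponential rate."""
    shape, rate = a + len(x), b + sum(x)
    # random.gammavariate is parametrized by (shape, scale), so pass scale = 1 / rate.
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(k)]


def loss_sq(t, theta):
    # Classical squared-error loss on the rate parameter.
    return (t - theta) ** 2


def loss_w2(t, theta):
    # Squared 2-Wasserstein distance between Exp(t) and Exp(theta): both quantile
    # functions are -log(1 - u) / rate, and E[E^2] = 2 for E ~ Exp(1).
    return 2.0 * (1.0 / t - 1.0 / theta) ** 2


def loss_hellinger(t, theta):
    # Squared Hellinger distance 1 - int sqrt(p_t * p_theta) between Exp(t) and Exp(theta).
    return 1.0 - 2.0 * math.sqrt(t * theta) / (t + theta)


def bayes_estimate(loss, samples, tol=1e-8):
    """Minimize the Monte Carlo posterior risk by golden-section search over log t.

    Golden-section search assumes the risk is unimodal in log t; this holds for the
    squared-error and squared-W2 losses, and is an assumption for squared Hellinger.
    """
    def risk(log_t):
        t = math.exp(log_t)
        return sum(loss(t, th) for th in samples) / len(samples)

    # For these losses the minimizer lies between the smallest and largest draw.
    lo, hi = math.log(min(samples)), math.log(max(samples))
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = risk(c), risk(d)
    while hi - lo > tol:
        if fc < fd:  # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = risk(c)
        else:  # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = risk(d)
    return math.exp((lo + hi) / 2.0)


rng = random.Random(0)
theta0, n, a, b = 2.0, 50, 1.0, 1.0
x = [rng.expovariate(theta0) for _ in range(n)]  # expovariate takes the rate
samples = posterior_samples(x, a, b, 5000, rng)
estimates = {name: bayes_estimate(loss, samples)
             for name, loss in [("squared error", loss_sq),
                                ("squared W2", loss_w2),
                                ("squared Hellinger", loss_hellinger)]}
print(estimates)
# Closed form under squared W2 with the exact posterior: 1 / E[1/theta | x] = (a + n - 1) / (b + sum(x)).
print((a + n - 1) / (b + sum(x)))
```

The squared-error estimate reproduces the posterior mean of the draws, while the squared-$W_2$ estimate reproduces their harmonic mean, which approximates the closed form $(a+n-1)/(b+\sum_i X_i)$; the three estimators differ at finite $n$ even though they share the same posterior.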

Paper Structure

This paper contains 25 sections, 15 theorems, 227 equations and 4 figures.

Key Result

Theorem 1

(Bernstein--von Mises) Let $\{P_\vartheta:\vartheta\in \Theta\subseteq \mathbb{R}^d\}$ be a parametric family of probability measures, each absolutely continuous with respect to a common $\sigma$-finite measure $\mu$ on $\mathcal{X}$, and assume that this family is differentiable in quadratic mean at $\vartheta_0$ … Define $\Delta_{n,\vartheta_0}(\boldsymbol{X}) := I_{\vartheta_0}^{-1} n^{-1/2}\sum_{i=1}^{n}\ell'_{\vartheta_0}(X_i)$ …
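In the standard version of this theorem for models differentiable in quadratic mean (under the usual prior and testing conditions), the posterior of $\sqrt{n}(\vartheta-\vartheta_0)$ is close in total variation to $\mathcal{N}(\Delta_{n,\vartheta_0}, I_{\vartheta_0}^{-1})$, and $\Delta_{n,\vartheta_0}$ is itself asymptotically $\mathcal{N}(0, I_{\vartheta_0}^{-1})$. A minimal numerical sketch of the consequence studied in the paper, under stated assumptions: an exponential model with rate $\vartheta$ (score $\ell'_\vartheta(x) = 1/\vartheta - x$, Fisher information $I_\vartheta = \vartheta^{-2}$), a Gamma prior with shape 1 and rate 1, and the squared-$W_2$ Bayes estimator $\hat\theta_n = (a+n-1)/(b+\sum_i X_i)$. It checks that $\sqrt{n}(\hat\theta_n-\vartheta_0)$ stays close to $\Delta_{n,\vartheta_0}$ and has variance near $I_{\vartheta_0}^{-1}=\vartheta_0^2$. The helper names are hypothetical, and the prior and loss are illustrative choices, not necessarily those of the paper's experiments.

```python
import math
import random
import statistics


def delta_n(s, n, theta0):
    # Delta_{n,theta0} = I^{-1} n^{-1/2} sum_i l'_{theta0}(X_i) for the Exp(theta) model,
    # with l'_theta(x) = 1/theta - x and I_theta = 1/theta^2; it depends on the data only via s = sum_i X_i.
    return theta0 ** 2 * (n / theta0 - s) / math.sqrt(n)


def bayes_w2(s, n, a=1.0, b=1.0):
    # Bayes estimator under squared W_2 loss with a Gamma(shape a, rate b) prior: 1 / E[1/theta | X].
    return (a + n - 1) / (b + s)


rng = random.Random(2)
theta0, n, reps = 2.0, 10_000, 2000
z, gaps = [], []
for _ in range(reps):
    # The sufficient statistic S = sum_i X_i is Gamma(n, rate theta0); gammavariate takes a scale.
    s = rng.gammavariate(n, 1.0 / theta0)
    zi = math.sqrt(n) * (bayes_w2(s, n) - theta0)
    z.append(zi)
    gaps.append(abs(zi - delta_n(s, n, theta0)))

print("variance of sqrt(n)(theta_hat - theta0):", statistics.variance(z))  # compare with theta0**2 = 4
print("max |sqrt(n)(theta_hat - theta0) - Delta_n|:", max(gaps))
```

Sampling the sufficient statistic directly avoids generating $n$ observations per repetition; the gap to $\Delta_{n,\vartheta_0}$ shrinks at rate $n^{-1/2}$, while $\Delta_{n,\vartheta_0}$ itself has standard deviation $\vartheta_0$.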

Figures (4)

  • Figure 1: Contour plots of the mappings $(a,b)\mapsto W_1(P_a,P_b)$ (left) and $(a,b)\mapsto \left\lvert a-b \right\rvert$ (right).
  • Figure 2: QQ-plots of $n^{1/2}(\hat{\theta}_n-\vartheta_{0})$ for $n\in\{10,10^2,10^4,10^6\}$ (left to right) based on $M=500$ Monte Carlo repetitions, where $\vartheta_{0}=2$ and $\hat{\theta}_n$ is the Bayes estimator under the losses $\ell_{H}$, $\ell_{W_2}$ and $\ell_{\mathrm{KL}}$ (top to bottom). The reference distribution is the standard normal.
  • Figure 3: 2-Wasserstein distance between the empirical distribution of $n^{1/2}(\hat{\theta}_n-\vartheta_0)$ and $\mathcal{N}_{d-1}(0,I_{\vartheta_0}^{-1})$ for $d=3$, $\alpha=(1,1,1)$ and $\vartheta_0=(1/3,1/3)$. Each point is based on $R=100$ Monte Carlo repetitions, each of which computes the Wasserstein distance from $M=2000$ samples.
  • Figure 4: Plot of $(t_1,t_2)\mapsto Z(t_1,t_2)$ for $Y=(1,1)$. The green arrow visualizes the vector $(1,1)$.
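Figure 3 measures the distance to $\mathcal{N}_{d-1}(0,I_{\vartheta_0}^{-1})$ in the multinomial–Dirichlet model. A minimal sketch of that limiting covariance, assuming $\vartheta_0$ collects the first $d-1$ cell probabilities (consistent with the $(d-1)$-dimensional normal in the caption): for one multinomial draw, $I_{\vartheta}^{-1} = \operatorname{diag}(\vartheta) - \vartheta\vartheta^\top$, which equals $\begin{pmatrix} 2/9 & -1/9\\ -1/9 & 2/9\end{pmatrix}$ at $\vartheta_0=(1/3,1/3)$. The simulation uses the Dirichlet posterior mean (the Bayes estimator under squared Euclidean loss) as a stand-in, since the loss behind the figure is not stated here; helper names and the sample sizes are hypothetical.

```python
import math
import random


def inverse_fisher(theta):
    # Inverse Fisher information of one multinomial draw in the first d-1 coordinates:
    # I^{-1} = diag(theta) - theta theta^T.
    return [[(ti if i == j else 0.0) - ti * tj for j, tj in enumerate(theta)]
            for i, ti in enumerate(theta)]


def posterior_mean(counts, alpha):
    # Mean of the Dirichlet(alpha + counts) posterior, keeping the first d-1 coordinates.
    total = sum(alpha) + sum(counts)
    return [(a + c) / total for a, c in zip(alpha[:-1], counts[:-1])]


rng = random.Random(3)
alpha = [1.0, 1.0, 1.0]
p0 = [1 / 3, 1 / 3, 1 / 3]
theta0 = p0[:-1]
n, reps = 1000, 1000
zs = []
for _ in range(reps):
    # Draw n categorical observations and tabulate them into multinomial counts.
    draws = rng.choices(range(3), weights=p0, k=n)
    counts = [draws.count(j) for j in range(3)]
    est = posterior_mean(counts, alpha)
    zs.append([math.sqrt(n) * (e - t) for e, t in zip(est, theta0)])

# Empirical covariance of sqrt(n)(theta_hat - theta0) across repetitions.
means = [sum(z[i] for z in zs) / reps for i in range(2)]
cov = [[sum((z[i] - means[i]) * (z[j] - means[j]) for z in zs) / (reps - 1)
        for j in range(2)] for i in range(2)]
print(cov)
print(inverse_fisher(theta0))  # ≈ [[2/9, -1/9], [-1/9, 2/9]]
```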

Theorems & Definitions (35)

  • Theorem 1: BvM theorem; v…
  • Proposition 1
  • Definition 1: Exponential growth
  • Proposition 2: Consistency
  • Theorem 2: Weak convergence
  • Example 1
  • Lemma 1
  • Proposition 3
  • Theorem 3
  • Example 2
  • ...and 25 more