Normal approximation for the posterior in exponential families

Adrian Fischer, Robert E. Gaunt, Gesine Reinert, Yvik Swan

TL;DR

This work provides non-asymptotic Bernstein–von Mises type bounds for the normal approximation of posteriors in exponential-family models using Stein's method, with explicit total variation and Wasserstein bounds that apply to both univariate and multivariate posteriors and accommodate non-conjugate priors. By allowing arbitrary centring and scaling and using a third-order (and optionally fourth-order) Taylor expansion of the log-likelihood, the authors derive bounds that depend explicitly on the prior and on sufficient statistics of the data, and they obtain rates faster than $O(n^{-1/2})$ in certain settings. The framework is illustrated on a range of exponential-family models (Bernoulli, Poisson, normal with known mean, Weibull, multinomial, univariate normal with unknown mean/variance, and linear regression with unknown variance), showing how dimension, standardisation, and the prior influence Gaussian proximity and convergence rates. The results give finite-sample guarantees for Laplace-type approximations in Bayesian inference and offer insight into high-dimensional posterior behaviour.

Abstract

In this paper, we obtain quantitative, non-asymptotic, and data-dependent \textit{Bernstein-von Mises type} bounds on the normal approximation of the posterior distribution in exponential family models with arbitrary centring and scaling. Our bounds, stated in the total variation and Wasserstein distances, are valid for univariate and multivariate posteriors alike, and do not require a conjugate prior setting. They are obtained through a refined version of Stein's method of comparison of operators that allows for improved dimensional dependence in high-dimensional settings and may also be of interest in other problems. Our approach is rather flexible and, in certain settings, allows for the derivation of bounds with rates of convergence faster than the usual \( O(n^{-1/2}) \) rate (when \( n \) is the sample size). We illustrate our findings on a variety of exponential family distributions, including the Weibull, multinomial, and linear regression with unknown variance. The resulting bounds have an explicit dependence on the prior distribution and on sufficient statistics of the data from the sample, and thus provide insight into how these factors affect the quality of the normal approximation. Insights from our examples include identification of conditions under which faster \( O(n^{-1}) \) convergence rates occur for Bernoulli data, illustrations of how the quality of the normal approximation is influenced by the choice of standardisation, and dimensional dependence in high-dimensional settings.

Paper Structure

This paper contains 21 sections, 18 theorems, 212 equations, and 3 figures.

Key Result

Lemma 1

Let $N \sim \gamma_d$ and $X \sim p^X$ be as above. Let $\phi : \mathbb{R}^d \to \mathbb{R}$ be such that there exists a twice differentiable solution $f_{\phi}$ to the Stein equation \eqref{mvnsteineqn} which, moreover, satisfies $f_{\phi} \in \mathcal{F}(X)$. Then

Figures (3)

  • Figure 1: Results from Example \ref{example1} (Bernoulli data) using MAP centring and scaling with a conjugate prior. Orange curves: True values of the Wasserstein distance, obtained numerically. Blue curves: Lower bound of the Wasserstein distance, as given by equation \ref{eq:lowww}. Red curves: Upper bound of the Wasserstein distance, as given by equation \ref{eq:upuD3v111}. Green curves: Additional bound, computed using equation \ref{eq:boundwd3}. In panels (a) to (e), the values are multiplied by the square root of the sample size for the specified values of $p^\star$. In panel (f), the same data as in panel (e) are shown, but multiplied by the sample size instead. Each curve represents the average of 10 simulations performed for sample sizes $n \in \{100, 120, \dots, 500\}$, all derived from the same Bernoulli dataset. The prior parameters are arbitrarily fixed at $\tau_1 = 0.84$ and $\tau_2 = \sqrt{2}$. In many panels, the blue curves overlap with the orange curves, and the green curves overlap with the red curves, making the blue and green curves largely invisible.
  • Figure 2: Exactly the same configuration as in Figure \ref{fig:binomial}, but with MLE centring and scaling using a conjugate prior. All curves are multiplied by the sample size. Blue curves: True values of the Wasserstein distance. Orange curves: Bound as defined in equation \ref{eq:boundmlegen}. Green curves: Bound as given in equation \ref{eq:newbound4}. In panel (a) (left), the parameters are set to $\tau_1 = 0.84$ and $\tau_2 = \sqrt{2}$. In panel (b) (right), the parameters are set to $\tau_1 = 1$ and $\tau_2 = 2$. In both panels, $p^\star = 0.5$.
  • Figure 3: The blue curves represent the true values of the Wasserstein distances (left column) and total variation distances (right column) multiplied by the square root of the sample size; the total variation distance curve is further multiplied by a factor of 7.5. The orange curves give the upper bounds, computed via equation \ref{eq:wasspoi} (left column) and equation \ref{eq:TVBOUNDPOISS} (right column), also multiplied by the square root of the sample size. The orange and blue curves are indistinguishable in the left panel. All curves are computed for the same sample of Poisson data with parameter $\lambda = 10$, for sample sizes $n \in \{100, 120, \dots, 500\}$. The prior parameters are set to $\tau_1 = 1$ and $\tau_2 = 3$.
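
The captions above refer to "true values of the Wasserstein distance, obtained numerically". The sketch below shows one way such a value could be computed for Bernoulli data. It is not the authors' code and it does not evaluate any of the paper's bounds. It rests on several assumptions: it works in the mean parametrisation $p$ (the paper may use a different parametrisation), it uses a Beta($a$, $b$) prior whose values `a` and `b` are hypothetical and unrelated to $\tau_1$, $\tau_2$, and it standardises around the posterior mode with the scale taken from the curvature of the log-posterior there. The function name `wasserstein_to_normal` is made up for illustration. The distance is obtained from the one-dimensional identity $W_1 = \int |F(z) - \Phi(z)| \, dz$.

```python
import math


def std_normal_cdf(z):
    # Phi(z), written with erfc so that far tails stay accurate.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def beta_log_pdf(p, alpha, beta):
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return log_norm + (alpha - 1.0) * math.log(p) + (beta - 1.0) * math.log1p(-p)


def wasserstein_to_normal(successes, n, a=2.0, b=2.0, grid=20001):
    """W1 distance between the mode-standardised Beta posterior and N(0, 1).

    Assumption: Beta(a, b) prior on the success probability p (illustrative only).
    """
    alpha, beta = a + successes, b + (n - successes)
    if alpha <= 1.0 or beta <= 1.0:
        raise ValueError("posterior mode must lie strictly inside (0, 1)")

    # MAP centring, with scaling from minus the second derivative of the
    # log-posterior evaluated at the mode.
    mode = (alpha - 1.0) / (alpha + beta - 2.0)
    curvature = (alpha - 1.0) / mode**2 + (beta - 1.0) / (1.0 - mode) ** 2
    sd = curvature ** -0.5

    step = 1.0 / (grid - 1)
    # The density vanishes at both endpoints because alpha, beta > 1.
    dens = [0.0]
    dens += [math.exp(beta_log_pdf(i * step, alpha, beta)) for i in range(1, grid - 1)]
    dens += [0.0]

    # Trapezoidal rule for the posterior CDF and for the integral of
    # |F(z) - Phi(z)| over the support, using z = (p - mode) / sd.
    cdf = 0.0
    prev_gap = std_normal_cdf(-mode / sd)
    total = 0.0
    for i in range(1, grid):
        cdf += 0.5 * (dens[i - 1] + dens[i]) * step
        gap = abs(min(cdf, 1.0) - std_normal_cdf((i * step - mode) / sd))
        total += 0.5 * (prev_gap + gap) * step / sd
        prev_gap = gap

    # Outside the support the posterior CDF is exactly 0 or 1, so the
    # remaining integrals are normal tail integrals in closed form.
    z_lo, z_hi = -mode / sd, (1.0 - mode) / sd
    left_tail = z_lo * std_normal_cdf(z_lo) + std_normal_pdf(z_lo)
    right_tail = std_normal_pdf(z_hi) - z_hi * std_normal_cdf(-z_hi)
    return total + left_tail + right_tail


for n in (100, 200, 400):
    w = wasserstein_to_normal(successes=n // 5, n=n)
    print(n, w, math.sqrt(n) * w)
```

Printing the distance next to its $\sqrt{n}$-scaled value mirrors the scaling used in panels (a) to (e) of Figure 1; replacing `math.sqrt(n)` with `n` would correspond to the scaling described for panel (f).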

Theorems & Definitions (51)

  • Example 1: Weibull data
  • Example 2: Categorical data
  • Remark 1
  • Example 3: Standardisation around the MAP
  • Example 4: Standardisation around the MLE
  • Lemma 1: Bound on differences of expectations
  • proof
  • Lemma 2: Bounds in dimension $d=1$
  • Lemma 3: Lipschitz test functions
  • proof
  • ...and 41 more