Skew-symmetric approximations of posterior distributions

Francesco Pozza, Daniele Durante, Botond Szabo

TL;DR

The paper addresses the bias introduced by symmetric posterior approximations in Bayesian inference by introducing a general, computationally light skew-symmetric perturbation that can be applied to any existing symmetric approximation, such as those from the Laplace method, variational Bayes (VB) or expectation propagation (EP). It derives a skewing factor that is optimal within the skew-symmetric class and provably improves accuracy under the total variation (TV), Kullback-Leibler (KL), reverse-KL and alpha-divergences; under suitable assumptions, the convergence rate to the exact posterior improves on that of the symmetric counterpart by at least a $\sqrt{n}$ factor. The approach preserves tractability through a closed-form skewing factor and a simple sampling scheme. Empirical results on simulated and real data show substantial finite-sample and high-dimensional gains, including large effective sample size (ESS) improvements in importance sampling. The method thus offers a broadly applicable, theory-grounded way to capture posterior skewness in deterministic approximations without heavy computational cost. It also opens avenues for integrating skewness into existing VB/EP optimization schemes and for extending the ideas to higher-order or alternative symmetric families.

Abstract

Popular deterministic approximations of posterior distributions from, e.g., the Laplace method, variational Bayes and expectation propagation, generally rely on symmetric approximating families, often taken to be Gaussian. This choice facilitates optimization and inference, but typically affects the quality of the overall approximation. In fact, even in basic parametric models, the posterior distribution often displays asymmetries that yield bias and reduced accuracy when considering symmetric approximations. Recent research has moved towards more flexible approximating families which incorporate skewness. However, current solutions are often model specific, lack a general supporting theory, increase the computational complexity of the optimization problem, and do not provide a broadly applicable solution to incorporate skewness in any symmetric approximation. This article addresses such a gap by introducing a general and provably optimal strategy to perturb any off-the-shelf symmetric approximation of a generic posterior distribution. This novel perturbation scheme is derived without additional optimization steps, and yields a similarly tractable approximation within the class of skew-symmetric densities that provably enhances the finite-sample accuracy of the original symmetric counterpart. Furthermore, under suitable assumptions, it improves the convergence rate to the exact posterior by at least a $\sqrt{n}$ factor in asymptotic regimes. These advancements are illustrated in numerical studies focusing on skewed perturbations of state-of-the-art Gaussian approximations.
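To make the perturbation idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard skew-symmetric form $q(\theta) = 2\,\bar{q}(\theta)\,w(\theta)$, with $\bar{q}$ symmetric about a point $\theta^*$, and it assumes a skewing factor of the form $w(\theta) = \pi(\theta)/\{\pi(\theta) + \pi(2\theta^* - \theta)\}$, which this summary does not spell out. The toy target (an unnormalized Gamma kernel), the Laplace-type Gaussian and all function names are hypothetical choices made for illustration only.

```python
import numpy as np

def unnorm_post(theta):
    """Toy unnormalized posterior (assumption): Gamma(shape=3, rate=1) kernel,
    theta^2 * exp(-theta) for theta > 0 and zero elsewhere."""
    theta = np.asarray(theta, dtype=float)
    safe = np.where(theta > 0, theta, 1.0)  # avoid evaluating the kernel off-support
    return np.where(theta > 0, safe**2 * np.exp(-safe), 0.0)

def skewing_factor(theta, center):
    """Assumed skewing factor: posterior at theta relative to the sum of the
    posterior at theta and at its reflection about the symmetry point."""
    p = unnorm_post(theta)
    p_ref = unnorm_post(2.0 * center - theta)
    denom = p + p_ref
    # If both values vanish, fall back to 0.5 (no skewing at that point).
    return np.where(denom > 0, p / np.where(denom > 0, denom, 1.0), 0.5)

def sample_skew_symmetric(n, center, scale, rng):
    """Draw from 2 * N(center, scale^2) * w by sampling the Gaussian and
    reflecting each draw about the center with probability 1 - w."""
    z = rng.normal(center, scale, size=n)
    u = rng.uniform(size=n)
    keep = u <= skewing_factor(z, center)
    return np.where(keep, z, 2.0 * center - z)

rng = np.random.default_rng(0)
# Laplace-type Gaussian for the toy target: mode 2, variance 2.
center, scale = 2.0, np.sqrt(2.0)
gauss_draws = rng.normal(center, scale, size=200_000)
skew_draws = sample_skew_symmetric(200_000, center, scale, rng)

print("Gaussian mean estimate:      ", gauss_draws.mean())
print("Skew-symmetric mean estimate:", skew_draws.mean())
```

The exact mean of the toy target is 3, while the Gaussian is centred at the mode 2, so comparing the two printed estimates shows how the reflection step shifts mass towards the heavier tail. The sampler needs only draws from the symmetric approximation and evaluations of the unnormalized posterior, which is why no additional optimization is involved in this sketch.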

Paper Structure (25 sections, 2 theorems, 84 equations, 1 figure, 3 tables, 2 algorithms)

This paper contains 25 sections, 2 theorems, 84 equations, 1 figure, 3 tables, 2 algorithms.

Key Result

Lemma B.1

Let $K_n = \{ {\boldsymbol \theta} \in \Theta \, : \, \| {\boldsymbol \theta} - {\boldsymbol \theta} _0\| < M_n \sqrt{d/n}\}$. Then, under Assumptions cond:4, cond:m1 and cond:m3, we have, for $c_0>0$ sufficiently large (not depending on $n$ and $d$) and $M_n = \sqrt{c_0 \log n}$, that where $K_n^c$ denotes the complement of $K_n$.

Figures (1)

  • Figure 1: Empirical comparison of the accuracy achieved by three state-of-the-art Gaussian approximations from the Laplace method, black-box VB and EP, versus the corresponding skew-symmetric perturbations. For three routinely employed divergences $\mathcal{D}$ (TV, KL, reverse-KL), the first three panels display the boxplots of $\mathcal{D}(\pi_{j,n} \,\|\, q_{j,n})$, $j=1, \ldots, 62$, where $q_{j,n}$ is the $j$th marginal of either $\bar{q}_{n,\boldsymbol{\theta}^*}$ (Gaussian) or $q_{n,\boldsymbol{\theta}^*}$ (skew-symmetric). The fourth panel shows instead the boxplot of the absolute differences between the approximated and actual posterior means of the $d=62$ standardized parameters (standardization proceeds as discussed in the caption of Table \ref{tab_high_marg_0}).

Theorems & Definitions (8)

  • proof
  • proof
  • proof
  • proof
  • Lemma B.1: Posterior contraction
  • proof
  • Lemma B.2
  • proof