Generalized Guarantees for Variational Inference in the Presence of Even and Elliptical Symmetry

Charles C. Margossian, Lawrence K. Saul

TL;DR

The paper extends symmetry-based guarantees for variational inference to a broad class of f-divergences and to targets exhibiting even, elliptical, or partial symmetries, including hierarchical models. It provides theoretical results showing exact recovery of the mean and (partial) covariance under these symmetries when using location-scale variational families, with the strongest guarantees obtained for the reverse Kullback-Leibler divergence. The authors support the theory with experiments on synthetic targets and Bayesian hierarchical models, illustrating how posterior symmetry influences VI accuracy and offering practical workflow guidance. Overall, the work clarifies when simple variational families yield provably accurate summaries and how to diagnose and mitigate asymmetry in VI practice.

Abstract

We extend several recent results providing symmetry-based guarantees for variational inference (VI) with location-scale families. VI approximates a target density $p$ by the best match $q^*$ in a family $Q$ of tractable distributions that in general does not contain $p$. It is known that VI can recover key properties of $p$, such as its mean and correlation matrix, when $p$ and $Q$ exhibit certain symmetries and $q^*$ is found by minimizing the reverse Kullback-Leibler divergence. We extend these guarantees in two important directions. First, we provide symmetry-based guarantees for $f$-divergences, a broad class that includes the reverse and forward Kullback-Leibler divergences and the $α$-divergences. We highlight properties specific to the reverse Kullback-Leibler divergence under which we obtain our strongest guarantees. Second, we obtain further guarantees for VI when the target density $p$ exhibits even and elliptical symmetries in some but not all of its coordinates. These partial symmetries arise naturally in Bayesian hierarchical models, where the prior induces a challenging geometry but still possesses axes of symmetry. We illustrate these theoretical results in a number of experimental settings.
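To make the divergence family concrete, the sketch below evaluates $D_f(p\|q) = \int q(x)\, f\big(p(x)/q(x)\big)\,dx$ on a 1-D grid for three generators. The generators are assumptions chosen for illustration: $f(t) = -\log t$ gives the reverse KL divergence $\mathrm{KL}(q\|p)$, $f(t) = t\log t$ gives the forward KL divergence $\mathrm{KL}(p\|q)$, and $f(t) = (t^\alpha - 1)/(\alpha(\alpha-1))$ gives one common form of the $\alpha$-divergence. The paper's own table may parameterize them differently. The Student-t target, the Gaussian approximation, and all helper names are hypothetical.

```python
import math
import numpy as np

# Uniform integration grid (assumed wide enough that the densities vanish at the ends).
X = np.linspace(-40.0, 40.0, 40001)
DX = X[1] - X[0]

def log_student_t(x, loc=0.0, df=3.0):
    """Log density of a 1-D Student-t with unit scale."""
    c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return c - (df + 1) / 2 * np.log1p((x - loc) ** 2 / df)

def log_gauss(x, loc=0.0, scale=1.0):
    """Log density of a 1-D Gaussian N(loc, scale^2)."""
    return -0.5 * ((x - loc) / scale) ** 2 - math.log(scale) - 0.5 * math.log(2 * math.pi)

# Integrands q(x) f(p(x)/q(x)), written in terms of log p and log q.
def reverse_kl_term(logp, logq):        # f(t) = -log t   ->  KL(q||p)
    return np.exp(logq) * (logq - logp)

def forward_kl_term(logp, logq):        # f(t) = t log t  ->  KL(p||q)
    return np.exp(logp) * (logp - logq)

def alpha_term(logp, logq, alpha=0.5):  # f(t) = (t^alpha - 1) / (alpha (alpha - 1))
    mixed = np.exp(alpha * logp + (1 - alpha) * logq)
    return (mixed - np.exp(logq)) / (alpha * (alpha - 1))

DIVERGENCES = {"reverse KL": reverse_kl_term,
               "forward KL": forward_kl_term,
               "alpha=0.5": alpha_term}

def f_divergence(term, logp, logq):
    """D_f(p||q) = integral of q f(p/q), approximated by a Riemann sum on X."""
    return float(np.sum(term(logp, logq)) * DX)

logp = log_student_t(X, loc=1.0)
for name, term in DIVERGENCES.items():
    same = f_divergence(term, logp, logp)               # p = q gives 0
    diff = f_divergence(term, logp, log_gauss(X, 1.0))  # p != q gives a positive value
    print(f"{name:>10}: D(p||p) = {same:.2e}, D(p||q) = {diff:.4f}")
```

Writing each integrand in terms of $\log p - \log q$ rather than the ratio $p/q$ itself keeps the sums finite where the Gaussian tail underflows.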

Paper Structure

This paper contains 26 sections, 9 theorems, 65 equations, 5 figures, 2 tables.

Key Result

Theorem 10

Let $\mathcal{Q}$ be a location family and let $D_f$ be an $f$-divergence. If $p$ is even symmetric about $\mu$, then a stationary point of $D_f(p\,\|\,q_\nu)$ occurs at $\nu = \mu$. Furthermore, if $\varphi(v) = (f \circ \exp)(v)$ is convex and strictly decreasing, and $p$ somewhere-strictly log con…
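The first claim of Theorem 10 can be checked numerically in one dimension. The sketch below assumes $D_f(p\|q) = \int q\, f(p/q)\,dx$ with the generators $-\log t$ (reverse KL), $t\log t$ (forward KL), and $(t^{1/2}-1)/(-1/4)$ ($\alpha = 0.5$). Under this convention the reverse KL generator gives $\varphi(v) = -v$, which is convex and strictly decreasing. The target is a hypothetical Student-t with 3 degrees of freedom, even symmetric about $\mu = 1$. The location family is $q_\nu = \mathcal{N}(\nu, 1)$, and $\nu$ is grid-searched. The theorem itself only guarantees a stationary point at $\nu = \mu$; for this particular target and family that point is also the minimizer, which need not hold in general.

```python
import math
import numpy as np

X = np.linspace(-40.0, 40.0, 20001)  # integration grid (assumed wide enough)
DX = X[1] - X[0]
MU = 1.0                             # centre of even symmetry of p

def log_student_t(x, loc, df=3.0):
    c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return c - (df + 1) / 2 * np.log1p((x - loc) ** 2 / df)

def log_gauss(x, loc, scale=1.0):
    return -0.5 * ((x - loc) / scale) ** 2 - math.log(scale) - 0.5 * math.log(2 * math.pi)

# q(x) f(p(x)/q(x)) in terms of lp = log p and lq = log q (assumed generators).
TERMS = {
    "reverse KL": lambda lp, lq: np.exp(lq) * (lq - lp),
    "forward KL": lambda lp, lq: np.exp(lp) * (lp - lq),
    "alpha=0.5": lambda lp, lq: (np.exp(0.5 * lp + 0.5 * lq) - np.exp(lq)) / (0.5 * -0.5),
}

logp = log_student_t(X, MU)        # even symmetric about MU
nus = np.linspace(-1.0, 3.0, 401)  # candidate locations for q_nu (step 0.01)

best = {}
for name, term in TERMS.items():
    losses = [np.sum(term(logp, log_gauss(X, nu))) * DX for nu in nus]
    best[name] = float(nus[int(np.argmin(losses))])
    print(f"{name:>10}: argmin over nu = {best[name]:.2f}")
```

Each argmin should land at, or within one grid step of, `MU`. A grid search is used only because it is transparent in one dimension; in practice $\nu$ would be fit by (stochastic) gradient methods.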

Figures (5)

  • Figure 1: Gaussian VI approximation of an elliptical funnel, obtained by minimizing $\mathrm{KL}(q\,\|\,p)$. The funnel is asymmetric along $\tau$ but symmetric along $\theta$, so VI provably recovers the mean and correlations of $\theta$.
  • Figure 2: Variational approximation of a multivariate Student-t by a Gaussian. Empirically, for each divergence in the paper's divergence table, the mean of the Student-t is recovered by a factorized Gaussian approximation (left), while its correlation matrix is recovered by a non-factorized Gaussian approximation (right). However, each divergence returns a different estimate of the variance. For the $\alpha$-divergence, we use $\alpha=0.5$.
  • Figure 3: VI approximations of a skew normal target $p$ by a Laplace distribution. (Left) When $p$ has no skew $(\kappa=0)$, its mean is recovered by VI with all the divergences in the paper's divergence table; when $p$ is strongly skewed $(\kappa=5)$, the results disagree. (Right) The error in the mean estimate, averaged over 10 stochastic optimizations.
  • Figure 4: Absolute error in VI's mean estimate, scaled by the target's standard deviation. The targets are ordered, bottom to top, from most to least symmetric. The dotted line is the standard error obtained with 100 independent draws. In general, VI returns better estimates of the mean for more symmetric targets. The mean is also better estimated in the funnel, crescent, and disease models along the coordinates $\sigma$ whose priors exhibit a partial symmetry.
  • Figure 5: Error in VI estimates of the correlation. We split the targets into four groups: synthetic targets and implementations of schools, disease, and SKIM. Within each panel, the models are ordered bottom to top from most to least symmetric according to the paper's symmetry measure. For the synthetic targets, we obtain better estimates of the correlation for more symmetric targets. There is no clear pattern for the other targets.
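The skew-normal experiment of Figure 3 can be imitated in miniature. The sketch below fits a Laplace location-scale family $q_{\nu,b}$ to a standard skew normal $p(x) = 2\,\phi(x)\,\Phi(\kappa x)$, with $\phi$ and $\Phi$ the standard normal density and CDF. It uses a deterministic grid search over $(\nu, b)$ in place of the stochastic optimization used for the figure. The generators ($-\log t$, $t\log t$, and $(t^{1/2}-1)/(-1/4)$ under $D_f(p\|q) = \int q\, f(p/q)\,dx$), the grids, and the helper names `log_norm_cdf` and `fit_laplace` are our own assumptions, not the paper's code.

```python
import math
import numpy as np

X = np.linspace(-20.0, 20.0, 8001)  # integration grid (assumption)
DX = X[1] - X[0]
NUS = np.linspace(-1.0, 2.0, 121)   # candidate Laplace locations (step 0.025)
BS = np.linspace(0.2, 2.0, 19)      # candidate Laplace scales (step 0.1)

_erfc = np.vectorize(math.erfc, otypes=[float])

def log_norm_cdf(z):
    """log Phi(z); an asymptotic expansion in the far left tail avoids log(0)."""
    out = np.empty_like(z)
    tail = z < -20.0
    zt = z[tail]
    out[tail] = (-0.5 * zt**2 - np.log(-zt) - 0.5 * math.log(2 * math.pi)
                 + np.log1p(-1.0 / zt**2 + 3.0 / zt**4))
    out[~tail] = np.log(0.5 * _erfc(-z[~tail] / math.sqrt(2.0)))
    return out

def log_skew_normal(x, kappa):
    """Standard skew normal log density; kappa = 0 gives N(0, 1)."""
    return math.log(2.0) - 0.5 * x**2 - 0.5 * math.log(2 * math.pi) + log_norm_cdf(kappa * x)

# q f(p/q) in terms of lp = log p and lq = log q (assumed generators).
TERMS = {
    "reverse KL": lambda lp, lq: np.exp(lq) * (lq - lp),
    "forward KL": lambda lp, lq: np.exp(lp) * (lp - lq),
    "alpha=0.5": lambda lp, lq: (np.exp(0.5 * lp + 0.5 * lq) - np.exp(lq)) / -0.25,
}

def fit_laplace(logp, term):
    """Grid-search the Laplace(nu, b) family; returns the best (nu, b)."""
    best = (np.inf, None, None)
    for b in BS:
        # Rows index nu, columns index x: log q_{nu,b}(x) for every candidate nu.
        lq = -np.abs(X[None, :] - NUS[:, None]) / b - math.log(2 * b)
        losses = np.sum(term(logp[None, :], lq), axis=1) * DX
        i = int(np.argmin(losses))
        if losses[i] < best[0]:
            best = (losses[i], float(NUS[i]), float(b))
    return best[1], best[2]

results = {}
for kappa in (0.0, 5.0):
    logp = log_skew_normal(X, kappa)
    results[kappa] = {name: fit_laplace(logp, term)[0] for name, term in TERMS.items()}
    mean = float(np.sum(X * np.exp(logp)) * DX)
    print(f"kappa={kappa}: true mean {mean:.3f}, VI locations {results[kappa]}")
```

With $\kappa = 0$ the target is even symmetric about 0, and all three divergences place the Laplace location at 0. With $\kappa = 5$ the estimates need not agree: under the forward KL divergence the optimal Laplace location is the median of $p$, whereas the reverse KL and $\alpha$ fits follow other criteria, and none of them is guaranteed to match the mean.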

Theorems & Definitions (28)

  • Definition 1
  • Definition 2
  • Remark 3
  • Definition 4
  • Remark 5
  • Definition 6
  • Definition 7
  • Definition 8
  • Definition 9
  • Theorem 10: Exact Recovery of the Mean
  • ...and 18 more