Variational Inference in Location-Scale Families: Exact Recovery of the Mean and Correlation Matrix

Charles C. Margossian, Lawrence K. Saul

TL;DR

This work establishes symmetry-based guarantees for variational inference within location-scale families. It proves that VI can exactly recover the mean of a target density when the target exhibits even symmetry and the variational base is even, and can exactly recover the correlation matrix when the target has elliptical symmetry and the base is spherically symmetric, even under misspecification. The results cover nontrivial misspecifications (e.g., factorized $q$ with non-factorized $p$ and mismatched tails) and are supported by illustrative examples and numerical experiments across synthetic and real Bayesian posteriors. Together, the theory and experiments illuminate when VI is reliable for recovering key population moments and offer guidance for diagnostics and preconditioning in practice.

Abstract

Given an intractable target density $p$, variational inference (VI) attempts to find the best approximation $q$ from a tractable family $Q$. This is typically done by minimizing the exclusive Kullback-Leibler divergence, $\text{KL}(q||p)$. In practice, $Q$ is not rich enough to contain $p$, and the approximation is misspecified even when it is a unique global minimizer of $\text{KL}(q||p)$. In this paper, we analyze the robustness of VI to these misspecifications when $p$ exhibits certain symmetries and $Q$ is a location-scale family that shares these symmetries. We prove strong guarantees for VI not only under mild regularity conditions but also in the face of severe misspecifications. Namely, we show that (i) VI recovers the mean of $p$ when $p$ exhibits an even symmetry, and (ii) it recovers the correlation matrix of $p$ when in addition $p$ exhibits an elliptical symmetry. These guarantees hold for the mean even when $q$ is factorized and $p$ is not, and for the correlation matrix even when $q$ and $p$ behave differently in their tails. We analyze various regimes of Bayesian inference where these symmetries are useful idealizations, and we also investigate experimentally how VI behaves in their absence.
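Claims (i) and (ii) can be checked numerically on a small case. The following is a minimal sketch, not the authors' code: it fits a full-covariance Gaussian $q$ to a bivariate Student-t target by minimizing $\text{KL}(q||p)$ up to an additive constant. The values of `mu`, `M`, and `k`, the tensor-product Gauss-Hermite quadrature, the Cholesky parameterization, and the BFGS optimizer are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical target: bivariate Student-t with location mu, scale matrix M,
# and k degrees of freedom (all values chosen for illustration only).
mu = np.array([1.0, -2.0])
M = np.array([[4.0, 1.2], [1.2, 1.0]])  # implied correlation: 1.2 / (2 * 1) = 0.6
k = 5.0
M_inv = np.linalg.inv(M)

def log_p(x):
    """Unnormalized log density of the Student-t target; x has shape (n, 2)."""
    d = x - mu
    quad = np.einsum("ni,ij,nj->n", d, M_inv, d)
    return -0.5 * (k + 2.0) * np.log1p(quad / k)

# Tensor-product Gauss-Hermite rule for expectations under N(0, I_2).
z1, w1 = np.polynomial.hermite_e.hermegauss(40)
w1 = w1 / w1.sum()  # normalize so the weights sum to one
Z = np.stack(np.meshgrid(z1, z1, indexing="ij"), axis=-1).reshape(-1, 2)
W = np.outer(w1, w1).ravel()

def unpack(theta):
    """Map unconstrained parameters to a location nu and Cholesky factor L."""
    nu = theta[:2]
    L = np.array([[np.exp(theta[2]), 0.0],
                  [theta[3], np.exp(theta[4])]])
    return nu, L

def kl_up_to_constant(theta):
    """KL(q||p) + const for q = N(nu, L L^T), using x = nu + L z."""
    nu, L = unpack(theta)
    x = nu + Z @ L.T
    log_det_L = theta[2] + theta[4]  # entropy of q is log det L + const
    return -log_det_L - W @ log_p(x)

result = minimize(kl_up_to_constant, x0=np.zeros(5), method="BFGS")
nu_hat, L_hat = unpack(result.x)
S_hat = L_hat @ L_hat.T

def correlation(C):
    """Convert a covariance-like matrix to a correlation matrix."""
    s = np.sqrt(np.diag(C))
    return C / np.outer(s, s)

print("VI location:", nu_hat, "target location:", mu)
print("VI correlation:", correlation(S_hat)[0, 1],
      "target correlation:", correlation(M)[0, 1])
```

If the optimizer converges, `nu_hat` should agree with `mu` and the two correlations should agree up to numerical tolerance, even though the Gaussian tails of $q$ differ from the Student-t tails of $p$. The fitted `S_hat` itself is expected to match `M` only up to a multiplicative constant.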

Paper Structure

This paper contains 29 sections, 5 theorems, 62 equations, 10 figures, 1 table.

Key Result

Theorem 8

Let $\mathcal{Q}$ be a location family whose base distribution $q_0$ is even-symmetric about the origin. If $p$ is even-symmetric about $\mu$, then $\text{KL}(q_\nu||p)$ has a stationary point at $\nu\!=\!\mu$; furthermore, if $\log p$ is concave on $\mathbb{R}^d$ and strictly concave on some open s

Figures (10)

  • Figure 1: Posterior distribution of $\beta_1$ for a Bayesian logistic regression with $N$ examples. Vertical lines indicate the means estimated by MCMC and VI. These estimates match when the posterior is symmetric $(N\!=\!0,128)$ and differ when it is not $(N\!=\!4)$.
  • Figure 2: Robustness of VI to misspecifications. Left: VI with a factorized Gaussian, $q$, exactly recovers the mean of a multivariate Student-t, $p$. Right: VI with a univariate Gaussian recovers the point of symmetry in target densities with different tails. This point equals the mean when the target is Laplace or Student-t, but not when it is Cauchy (whose mean does not exist).
  • Figure 3: Approximating the mixture $p$ of two normals in eq. (\ref{eq:mixture}) when the modes are close ($m\!=\!1$) or well separated ($m\!=\!10$). Left: probability densities. Right: KL divergence between $p$ and a single normal $q_\nu$ with mean $\nu$. Per Theorem 8, $\text{KL}(q_\nu||p)$ has a stationary point at $\nu\!=\!0$. This stationary point is a minimizer when $m\!=\!1$ but a local maximizer when $m\!=\!10$.
  • Figure 4: VI approximation of a skewed normal distribution, $p$, by a Laplace distribution, $q$. The vertical lines indicate the mean of each distribution. Per Theorem 8, VI correctly estimates the mean of $p$ when $\alpha\!=\!0$ and $p$ is symmetric. However, VI's estimate worsens as $\alpha$ increases and $p$ becomes less symmetric.
  • Figure 5: Gaussian approximation of VI to a multivariate Student-t with varying degrees of freedom, $k$. Left: VI's scale matrix $S$ equals the target's scale matrix $M$, up to a multiplicative constant, which varies with $k$. Right: for all $k$, VI exactly recovers the elements $\rho_{ij}$ of the correlation matrix.
  • ...and 5 more figures
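
The stationary-point behavior described for Figure 3 can be reproduced with a short numerical check. This is a sketch under stated assumptions: the mixture equation is not reproduced on this page, so the code assumes an equal-weight mixture $p = \tfrac{1}{2}\mathcal{N}(-m, 1) + \tfrac{1}{2}\mathcal{N}(m, 1)$ and a unit-variance normal $q_\nu$. The grid quadrature and finite-difference step are likewise illustrative choices.

```python
import numpy as np
from scipy.special import logsumexp

# Uniform quadrature grid for expectations under the standard normal base of q_nu.
z = np.linspace(-10.0, 10.0, 20001)
w = np.exp(-0.5 * z**2)
w /= w.sum()  # normalized weights; the grid spacing cancels

def log_p(x, m):
    """Log density of the ASSUMED mixture 0.5*N(-m, 1) + 0.5*N(m, 1)."""
    comps = np.stack([-0.5 * (x - m) ** 2, -0.5 * (x + m) ** 2])
    return logsumexp(comps, axis=0) - np.log(2.0) - 0.5 * np.log(2.0 * np.pi)

def kl(nu, m):
    """KL(q_nu || p) for q_nu = N(nu, 1), computed by quadrature."""
    entropy = 0.5 * np.log(2.0 * np.pi * np.e)
    return -entropy - np.sum(w * log_p(nu + z, m))

def slope_and_curvature_at_zero(m, h=1e-2):
    """Central finite differences of nu -> KL(q_nu || p) at nu = 0."""
    slope = (kl(h, m) - kl(-h, m)) / (2.0 * h)
    curvature = (kl(h, m) - 2.0 * kl(0.0, m) + kl(-h, m)) / h**2
    return slope, curvature

for m in (1.0, 10.0):
    slope, curvature = slope_and_curvature_at_zero(m)
    kind = "local minimum" if curvature > 0 else "local maximum"
    print(f"m={m}: slope={slope:.1e}, curvature={curvature:.3f} -> {kind}")
```

Under these assumptions the slope at $\nu\!=\!0$ vanishes for both values of $m$, while the sign of the curvature flips: positive for $m\!=\!1$ and negative for $m\!=\!10$, consistent with the caption.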

Theorems & Definitions (18)

  • Definition 1
  • Remark 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Remark 7
  • Theorem 8: Exact Recovery of Mean
  • proof
  • Remark 9
  • ...and 8 more