Variational Inference in Location-Scale Families: Exact Recovery of the Mean and Correlation Matrix
Charles C. Margossian, Lawrence K. Saul
TL;DR
This work establishes symmetry-based guarantees for variational inference within location-scale families. It proves that VI can exactly recover the mean of a target density when the target exhibits even symmetry and the variational base is even, and can exactly recover the correlation matrix when the target has elliptical symmetry and the base is spherically symmetric, even under misspecification. The results cover nontrivial misspecifications (e.g., factorized $q$ with non-factorized $p$ and mismatched tails) and are supported by illustrative examples and numerical experiments across synthetic and real Bayesian posteriors. Together, the theory and experiments illuminate when VI is reliable for recovering key population moments and offer guidance for diagnostics and preconditioning in practice.
Abstract
Given an intractable target density $p$, variational inference (VI) attempts to find the best approximation $q$ from a tractable family $Q$. This is typically done by minimizing the exclusive Kullback-Leibler divergence, $\text{KL}(q||p)$. In practice, $Q$ is not rich enough to contain $p$, and the approximation is misspecified even when it is a unique global minimizer of $\text{KL}(q||p)$. In this paper, we analyze the robustness of VI to these misspecifications when $p$ exhibits certain symmetries and $Q$ is a location-scale family that shares these symmetries. We prove strong guarantees for VI not only under mild regularity conditions but also in the face of severe misspecifications. Namely, we show that (i) VI recovers the mean of $p$ when $p$ exhibits an \textit{even} symmetry, and (ii) it recovers the correlation matrix of $p$ when in addition~$p$ exhibits an \textit{elliptical} symmetry. These guarantees hold for the mean even when $q$ is factorized and $p$ is not, and for the correlation matrix even when~$q$ and~$p$ behave differently in their tails. We analyze various regimes of Bayesian inference where these symmetries are useful idealizations, and we also investigate experimentally how VI behaves in their absence.
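As a minimal illustration of guarantee (i), and not one of the paper's experiments, the sketch below fits a factorized Gaussian $q$ to a correlated two-dimensional Gaussian target $p$ by gradient descent on the closed-form $\text{KL}(q||p)$. A Gaussian is symmetric about its mean, so it serves as a simple instance of an even-symmetric target. The target parameters (`mu`, `a`, `b`, `c`), the step size, the iteration count, and the function names `kl` and `fit` are all assumptions chosen for illustration.

```python
import math

# Hypothetical 2-D Gaussian target p = N(mu, Sigma) with correlated coordinates.
mu = (1.0, -2.0)
a, b, c = 2.0, 1.2, 1.0            # Sigma = [[a, b], [b, c]]
det = a * c - b * b                # determinant of Sigma
# Entries of the precision matrix Lambda = Sigma^{-1}.
L11, L12, L22 = c / det, -b / det, a / det


def kl(m1, m2, ls1, ls2):
    """Closed-form KL(q||p) for q = N((m1, m2), diag(exp(2*ls1), exp(2*ls2)))."""
    v1, v2 = math.exp(2.0 * ls1), math.exp(2.0 * ls2)
    d1, d2 = m1 - mu[0], m2 - mu[1]
    quad = L11 * d1 * d1 + 2.0 * L12 * d1 * d2 + L22 * d2 * d2
    trace = L11 * v1 + L22 * v2
    return 0.5 * (trace + quad - 2.0 + math.log(det) - math.log(v1) - math.log(v2))


def fit(steps=3000, lr=0.1):
    """Plain gradient descent on KL(q||p) over the mean and log-scales of q."""
    m1 = m2 = ls1 = ls2 = 0.0
    for _ in range(steps):
        d1, d2 = m1 - mu[0], m2 - mu[1]
        # Gradient with respect to the mean is Lambda @ (m - mu).
        g_m1 = L11 * d1 + L12 * d2
        g_m2 = L12 * d1 + L22 * d2
        # Gradient with respect to log s_i is Lambda_ii * s_i^2 - 1.
        g_ls1 = L11 * math.exp(2.0 * ls1) - 1.0
        g_ls2 = L22 * math.exp(2.0 * ls2) - 1.0
        m1 -= lr * g_m1
        m2 -= lr * g_m2
        ls1 -= lr * g_ls1
        ls2 -= lr * g_ls2
    return m1, m2, math.exp(2.0 * ls1), math.exp(2.0 * ls2)


m1, m2, v1, v2 = fit()
print("fitted mean of q:     ", (m1, m2), " target mean:", mu)
print("fitted variances of q:", (v1, v2), " target variances:", (a, c))
```

In this toy setting the fitted mean of $q$ should agree with the mean of $p$ even though $q$ cannot represent the correlation, while the fitted variances fall below the target's marginal variances. That contrast is the kind of misspecification the abstract refers to: some moments are recovered exactly and others are not.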
