Variational Inference for Uncertainty Quantification: an Analysis of Trade-offs
Charles C. Margossian, Loucas Pillaud-Vivien, Lawrence K. Saul
TL;DR
This work analyzes variational inference (VI) with factorized approximations when the target p does not factorize and has a non-diagonal covariance matrix. It proves an impossibility theorem: a factorized approximation q can match at most one of the marginal variances, the marginal precisions, or the generalized variance of p, so every choice trades one measure of uncertainty against the others. Comparing several divergences, namely KL(q||p), KL(p||q), α-divergences, and a score-based divergence, the paper establishes an ordering among the resulting uncertainty estimates when p is Gaussian and q is a factorized Gaussian, which guides the choice of divergence for a desired measure of uncertainty. It also shows that some divergences yield variational collapse or entropy-matching phenomena, and it evaluates the ordering empirically on Gaussian and non-Gaussian targets, including real models from the Inference Gym. The findings highlight the need to align the choice of divergence with downstream inference goals and suggest richer variational families as a way to mitigate the inherent trade-offs.
Abstract
Given an intractable distribution $p$, the problem of variational inference (VI) is to find the best approximation from some more tractable family $Q$. Commonly, one chooses $Q$ to be a family of factorized distributions (i.e., the mean-field assumption), even though $p$ itself does not factorize. We show that this mismatch can lead to an impossibility theorem: if $p$ does not factorize and furthermore has a non-diagonal covariance matrix, then any factorized approximation $q\!\in\!Q$ can correctly estimate at most one of the following three measures of uncertainty: (i) the marginal variances, (ii) the marginal precisions, or (iii) the generalized variance (which for elliptical distributions is closely related to the entropy). In practice, the best variational approximation in $Q$ is found by minimizing some divergence $D(q,p)$ between distributions, and so we ask: how does the choice of divergence determine which measure of uncertainty, if any, is correctly estimated by VI? We consider the classic Kullback-Leibler divergences, the more general $\alpha$-divergences, and a score-based divergence which compares $\nabla \log p$ and $\nabla \log q$. We thoroughly analyze the case where $p$ is a Gaussian and $q$ is a (factorized) Gaussian. In this setting, we show that all the considered divergences can be ordered based on the estimates of uncertainty they yield as objective functions for VI. Finally, we empirically evaluate the validity of this ordering when the target distribution $p$ is not Gaussian.
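To make the trade-off concrete, the following minimal sketch works through a two-dimensional Gaussian target. It relies on two standard closed-form results that are not spelled out in the abstract and are taken here as assumptions: for a Gaussian $p$ with covariance $\Sigma$, the factorized Gaussian minimizing KL(q||p) has variances $1/(\Sigma^{-1})_{ii}$ (it matches the marginal precisions), while the one minimizing KL(p||q) has variances $\Sigma_{ii}$ (it matches the marginal variances). The function name and the example covariance are hypothetical, chosen only for illustration.

```python
def factorized_gaussian_variances(a, b, c):
    """Optimal factorized-Gaussian variances for a 2D Gaussian target.

    The target covariance is [[a, b], [b, c]]. Returns the variances of q
    under KL(q||p), the variances of q under KL(p||q), and the generalized
    variance (determinant of the covariance) of the target.
    """
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    # Diagonal of the precision matrix (inverse covariance), in closed form
    # for the 2x2 case.
    precision_diag = (c / det, a / det)
    # KL(q||p): q matches the marginal precisions of p (assumed standard result).
    reverse_kl = tuple(1.0 / s for s in precision_diag)
    # KL(p||q): q matches the marginal variances of p (assumed standard result).
    forward_kl = (a, c)
    return reverse_kl, forward_kl, det


# Hypothetical target: unit marginal variances, correlation 0.8.
reverse_kl, forward_kl, gen_var_p = factorized_gaussian_variances(1.0, 0.8, 1.0)

# For a factorized q, the generalized variance is the product of its variances.
gen_var_reverse = reverse_kl[0] * reverse_kl[1]
gen_var_forward = forward_kl[0] * forward_kl[1]

print("KL(q||p) variances:", reverse_kl, "generalized variance:", gen_var_reverse)
print("KL(p||q) variances:", forward_kl, "generalized variance:", gen_var_forward)
print("target generalized variance:", gen_var_p)
```

In this example the KL(q||p) solution underestimates both the marginal variances and the generalized variance, the KL(p||q) solution recovers the marginal variances but overestimates the generalized variance, and neither matches more than one of the three measures, consistent with the impossibility theorem stated above.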
