Variational Inference for Uncertainty Quantification: an Analysis of Trade-offs

Charles C. Margossian; Loucas Pillaud-Vivien; Lawrence K. Saul

TL;DR

This work analyzes variational inference with factorized approximations when the target $p$ does not factorize and has a non-diagonal covariance matrix. It proves an impossibility theorem: a factorized approximation can match at most one of the marginal variances, the marginal precisions, or the generalized variance of $p$, so uncertainty quantification involves unavoidable trade-offs. For Gaussian $p$ and factorized Gaussian $q$ (FG-VI), the paper compares several divergences, namely KL(q||p), KL(p||q), α-divergences, and score-based divergences, and orders them by the uncertainty estimates they yield, which guides the choice of divergence for a desired measure of uncertainty. It also shows that some divergences lead to variational collapse and that others admit entropy-matching solutions, and it checks the theory in simulations on Gaussian and non-Gaussian targets, including models from the Inference Gym. The findings highlight the need to align the choice of divergence with downstream inference goals and suggest richer variational families as a way to mitigate these trade-offs.

Abstract

Given an intractable distribution $p$, the problem of variational inference (VI) is to find the best approximation from some more tractable family $Q$. Commonly, one chooses $Q$ to be a family of factorized distributions (i.e., the mean-field assumption), even though $p$ itself does not factorize. We show that this mismatch can lead to an impossibility theorem: if $p$ does not factorize and furthermore has a non-diagonal covariance matrix, then any factorized approximation $q\!\in\!Q$ can correctly estimate at most one of the following three measures of uncertainty: (i) the marginal variances, (ii) the marginal precisions, or (iii) the generalized variance (which for elliptical distributions is closely related to the entropy). In practice, the best variational approximation in $Q$ is found by minimizing some divergence $D(q,p)$ between distributions, and so we ask: how does the choice of divergence determine which measure of uncertainty, if any, is correctly estimated by VI? We consider the classic Kullback-Leibler divergences, the more general $\alpha$-divergences, and a score-based divergence which compares $\nabla \log p$ and $\nabla \log q$. We thoroughly analyze the case where $p$ is a Gaussian and $q$ is a (factorized) Gaussian. In this setting, we show that all the considered divergences can be ordered based on the estimates of uncertainty they yield as objective functions for VI. Finally, we empirically evaluate the validity of this ordering when the target distribution $p$ is not Gaussian.
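
To make the three measures of uncertainty concrete, the following is a minimal sketch for a two-dimensional Gaussian target with unit variances and correlation `eps`. The helper `uncertainty_measures`, the value of `eps`, and the three diagonal candidates are illustrative assumptions, not taken from the paper. In particular, the isotropic candidate is only one of many diagonal covariances with the right determinant.

```python
import numpy as np

def uncertainty_measures(cov):
    """Return (marginal variances, marginal precisions, generalized variance)."""
    cov = np.asarray(cov, dtype=float)
    variances = np.diag(cov)                  # diagonal of the covariance
    precisions = np.diag(np.linalg.inv(cov))  # diagonal of the inverse covariance
    generalized_variance = np.linalg.det(cov) # determinant of the covariance
    return variances, precisions, generalized_variance

eps = 0.8  # illustrative correlation (assumption, not a value from the paper)
sigma = np.array([[1.0, eps], [eps, 1.0]])  # non-diagonal target covariance
var_p, prec_p, gv_p = uncertainty_measures(sigma)

# Three diagonal candidates, each built to match exactly one measure of p.
candidates = {
    "match variances": np.diag(var_p),
    "match precisions": np.diag(1.0 / prec_p),
    "match generalized variance": np.sqrt(gv_p) * np.eye(2),
}

matches = {}
for name, psi in candidates.items():
    var_q, prec_q, gv_q = uncertainty_measures(psi)
    # Record which of the three measures this diagonal covariance reproduces.
    matches[name] = (
        bool(np.allclose(var_q, var_p)),
        bool(np.allclose(prec_q, prec_p)),
        bool(np.isclose(gv_q, gv_p)),
    )
    print(name, matches[name])
```

Each candidate reproduces the measure it was built for and misses the other two, which is the trade-off the abstract describes, seen in one small example.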

Paper Structure (22 sections, 10 theorems, 72 equations, 10 figures, 5 tables)


Introduction
Summary of Contributions
Main Results
Related Work
Proof of Impossibility Theorem
Divergences for FG-VI
KL Divergences
$\alpha$-divergence
Score-based Divergence
Ordering of Divergences for FG-VI
Ordering of Score-based Divergences
Ordering of KL and $\alpha$-divergences
Ordering of $\alpha$-divergences for $\alpha \in (0, 1)$
Entropy-matching Solution for FG-VI
Non-ordering of the Score-based and $\alpha$-divergences for $\alpha\! >\! 1$
...and 7 more sections

Key Result

Theorem 1

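Let $p$ and $q$ be distributions with covariances $\boldsymbol \Sigma$ and $\boldsymbol \Psi$, respectively, where $\boldsymbol \Psi$ is diagonal but $\boldsymbol \Sigma$ is not. Then $q$ can match at most one of the following measures of uncertainty of $p$: (i) the marginal variances, (ii) the marginal precisions, or (iii) the generalized variance.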
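
One way to see why these measures conflict is sketched below. It is not necessarily the paper's own proof. It assumes that $\boldsymbol \Sigma$ is positive definite and writes (i) for matching the marginal variances, (ii) for matching the marginal precisions, and (iii) for matching the generalized variance, $|\boldsymbol \Psi| = |\boldsymbol \Sigma|$. The key tool is Hadamard's inequality, which is strict for a positive-definite matrix that is not diagonal.

$$
\begin{aligned}
\text{(i): } \Psi_{ii} = \Sigma_{ii} \ \ \forall i
  &\;\Longrightarrow\; |\boldsymbol \Psi| = \prod_i \Sigma_{ii} > |\boldsymbol \Sigma|, \\
\text{(ii): } \Psi_{ii}^{-1} = (\boldsymbol \Sigma^{-1})_{ii} \ \ \forall i
  &\;\Longrightarrow\; |\boldsymbol \Psi|^{-1} = \prod_i (\boldsymbol \Sigma^{-1})_{ii} > |\boldsymbol \Sigma^{-1}|
  \;\Longrightarrow\; |\boldsymbol \Psi| < |\boldsymbol \Sigma|, \\
\text{(i) and (ii)}
  &\;\Longrightarrow\; \Sigma_{ii}\, (\boldsymbol \Sigma^{-1})_{ii} = 1 \ \ \forall i .
\end{aligned}
$$

The first two lines show that matching either the variances or the precisions rules out (iii). The third line cannot hold either: by the Schur complement, $1/(\boldsymbol \Sigma^{-1})_{ii} = \Sigma_{ii} - \boldsymbol \sigma_i^\top \boldsymbol \Sigma_{-i}^{-1} \boldsymbol \sigma_i$, where $\boldsymbol \sigma_i$ is the $i$-th column of $\boldsymbol \Sigma$ without its $i$-th entry and $\boldsymbol \Sigma_{-i}$ is $\boldsymbol \Sigma$ with row and column $i$ removed. Hence $\Sigma_{ii} (\boldsymbol \Sigma^{-1})_{ii} \geq 1$, with equality only if $\boldsymbol \sigma_i = \mathbf{0}$, and equality for every $i$ would force $\boldsymbol \Sigma$ to be diagonal.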

Figures (10)

Figure 1: When FG-VI is based on minimizing the score-based divergences in Table \ref{tab:divergences}, it may estimate zero or infinite values for the marginal variances. The red areas indicate these occurrences of variational collapse when FG-VI with a score-based divergence is used to approximate a three-dimensional Gaussian with a non-diagonal correlation matrix $\mathbf{C}$.
Figure 2: Variances, precisions, and entropy estimated by FG-VI with different divergences. In the left and center panels, the variance is normalized by the variance of $p$, and the precision by the precision of $p$, both along the first coordinate. In the right panel, we plot the difference between the estimated entropy and the entropy of $p$. Here FG-VI was used to approximate a 2-dimensional Gaussian with correlation $\varepsilon$. The $\alpha$-divergence was computed for $\alpha\!=\!0.5$. The ordering of the curves matches the predictions of Theorem \ref{thm:ordering}.
Figure 3: (Left) Either $\Psi_{jj}(\alpha)$ is strictly increasing over $\alpha\!\in\!(0,1)$, or it is not, with some minimal point $\tau$ of vanishing derivative. We prove the former by showing that no such point $\tau$ exists. (Right) The proof is based on properties of the function $f(\alpha)$ in eq. (\ref{eq:f-convex}). The function is convex; it also satisfies $f(0)\!=\!f(\tau)\! =\! \Psi_{jj}(\tau)$ and $f'(0)f'(\tau)<0$.
Figure 4: Marginal variances of $q$ in eq. (\ref{eq:q_normal}) that minimize $\text{D}_\alpha(q||p)$ as a function of $\alpha$. The target $p$ is a three-dimensional Gaussian. The plot shows the variances of $q$ normalized by the variances of $p$. The variances of $q$ are a strictly increasing function of $\alpha$, indicating that the $\alpha$-divergences are ordered. While we prove this to be the case for any Gaussian target when $\alpha\! \in\! (0, 1)$, it remains an open problem to prove this when $\alpha\! >\! 1$.
Figure 5: When $p$ is Gaussian over $\mathbb{R}^n$, there always exists a unique $\alpha \in (0, 1)$ such that the factorized approximation $q$ minimizing $\text{D}_\alpha(p||q)$ matches the entropy of $p$. The plots show, however, that the entropy-matching value of $\alpha$ depends on the dimension and covariance structure of $p$. Above we vary the dimension $n$ and constant correlation $\varepsilon$.
...and 5 more figures
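
The quantities plotted in Figure 2 can be sketched numerically for the two KL divergences, whose factorized Gaussian minimizers have well-known closed forms: minimizing KL(q||p) matches the marginal precisions of a Gaussian target, and minimizing KL(p||q) matches its marginal variances. The sketch below is not the paper's code. It covers only the KL cases, not the $\alpha$- or score-based divergences. It assumes zero means, and the function names, the value of `eps`, and the grid are illustrative choices.

```python
import numpy as np

def kl_gaussian(cov_a, cov_b):
    """KL(N(0, cov_a) || N(0, cov_b)) between zero-mean Gaussians."""
    n = cov_a.shape[0]
    _, logdet_a = np.linalg.slogdet(cov_a)
    _, logdet_b = np.linalg.slogdet(cov_b)
    trace_term = np.trace(np.linalg.solve(cov_b, cov_a))  # tr(cov_b^{-1} cov_a)
    return 0.5 * (trace_term - n + logdet_b - logdet_a)

def gaussian_entropy(cov):
    """Differential entropy of a Gaussian with covariance cov."""
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet)

eps = 0.7  # illustrative correlation (assumption, not a value from the paper)
sigma = np.array([[1.0, eps], [eps, 1.0]])
precision_p = np.linalg.inv(sigma)

# Closed-form factorized Gaussian minimizers (means assumed equal to that of p).
solutions = {
    "KL(q||p)": np.diag(1.0 / np.diag(precision_p)),  # matches marginal precisions
    "KL(p||q)": np.diag(np.diag(sigma)),              # matches marginal variances
}

summary = {}
for name, psi in solutions.items():
    summary[name] = {
        # Normalized as in Figure 2: ratios along the first coordinate.
        "variance_ratio": psi[0, 0] / sigma[0, 0],
        "precision_ratio": (1.0 / psi[0, 0]) / precision_p[0, 0],
        "entropy_gap": gaussian_entropy(psi) - gaussian_entropy(sigma),
    }
    print(name, summary[name])

# Brute-force sanity check: no diagonal covariance on a coarse grid does better.
grid = np.linspace(0.05, 2.0, 40)
best_reverse = min(kl_gaussian(np.diag([a, b]), sigma) for a in grid for b in grid)
best_forward = min(kl_gaussian(sigma, np.diag([a, b])) for a in grid for b in grid)
```

In this example the KL(q||p) solution shrinks the marginal variance by a factor of $1 - \varepsilon^2$ and underestimates the entropy, while the KL(p||q) solution recovers the marginal variance but overestimates the entropy.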

Theorems & Definitions (13)

Theorem 1: Impossibility theorem for F-VI
Definition 2: Ordering of divergences
Theorem 3: Ordering theorem for FG-VI
Theorem 4: Restatement of impossibility theorem
Lemma 5
Proposition 6: Mean matching
Proposition 7: Variance bounds
Proposition 8: Fixed-point equations
Proposition 9: NQP for minimizing $S(q||p)$
Proposition 10: NQP for minimizing $S(q||p)$
...and 3 more
