Amortized Variational Inference: When and Why?
Charles C. Margossian, David M. Blei
TL;DR
This paper analyzes amortized variational inference (A-VI) as a general-purpose alternative to factorized (mean-field) variational inference (F-VI). It derives necessary, sufficient, and verifiable conditions under which A-VI can achieve the same optimal solution as F-VI, and shows that an ideal inference function exists for simple hierarchical models. For more complex structures such as time series, the inference function can be extended by expanding its input domain. Some models (e.g., simple hierarchical models and some time-series models) allow A-VI to close the amortization gap with a relatively compact inference function, while others (e.g., hidden Markov models) resist closure even with an expanded domain. Empirical results on linear and nonlinear models, Bayesian neural networks, and time-series models illustrate when A-VI matches F-VI and when it converges faster, providing practical guidance on when to use A-VI and how to design the inference function. The findings support the viability of A-VI for full Bayesian inference in a broad class of models and highlight edge cases and diagnostics that inform model and algorithm selection.
Abstract
In a probabilistic latent variable model, factorized (or mean-field) variational inference (F-VI) fits a separate parametric distribution for each latent variable. Amortized variational inference (A-VI) instead learns a common inference function, which maps each observation to its corresponding latent variable's approximate posterior. Typically, A-VI is used as a step in the training of variational autoencoders; however, it stands to reason that A-VI could also be used as a general alternative to F-VI. In this paper we study when and why A-VI can be used for approximate Bayesian inference. We derive conditions on a latent variable model which are necessary, sufficient, and verifiable under which A-VI can attain F-VI's optimal solution, thereby closing the amortization gap. We prove these conditions are uniquely verified by simple hierarchical models, a broad class that encompasses many models in machine learning. We then show, on a broader class of models, how to expand the domain of A-VI's inference function to improve its solution, and we provide examples, e.g., hidden Markov models, where the amortization gap cannot be closed.
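To make the contrast between F-VI and A-VI concrete, here is a minimal sketch on a toy model that is not taken from the paper. It assumes z_i ~ N(0, 1) and x_i | z_i ~ N(z_i, sigma2) with sigma2 known and no global parameter. In this conjugate case the optimal Gaussian factor for each z_i is the exact posterior, so the F-VI optimum is available in closed form and depends on the data only through x_i. The names `fvi_optimum` and `fit_linear_inference_function` are hypothetical, and the inference function is fit by least squares to the known F-VI optimum rather than by maximizing the ELBO.

```python
import numpy as np

def fvi_optimum(x, sigma2):
    """Closed-form optimal Gaussian factor for each z_i (assumed toy model).

    Model: z_i ~ N(0, 1), x_i | z_i ~ N(z_i, sigma2). The posterior factorizes
    over i and is Gaussian, so the F-VI optimum equals the exact posterior.
    """
    mean = x / (1.0 + sigma2)
    var = np.full_like(x, sigma2 / (1.0 + sigma2))
    return mean, var

def fit_linear_inference_function(x, target_mean):
    """Least-squares fit of a shared map x_i -> a * x_i + b for the factor means."""
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, target_mean, rcond=None)
    return coef[0], coef[1]

# Simulate data from the assumed model.
rng = np.random.default_rng(0)
sigma2 = 0.5
n = 200
z = rng.normal(size=n)
x = z + np.sqrt(sigma2) * rng.normal(size=n)

# F-VI: one (mean, variance) pair per latent variable, 2 * n parameters in total.
fvi_mean, fvi_var = fvi_optimum(x, sigma2)

# A-VI: a single shared function of x_i; here two coefficients for the mean
# (the variance is the same constant for every i in this model).
a, b = fit_linear_inference_function(x, fvi_mean)
avi_mean = a * x + b

# Largest discrepancy between the amortized means and the F-VI optimum.
gap = np.max(np.abs(avi_mean - fvi_mean))
```

In this toy setting a linear inference function with a handful of parameters reproduces all 2n parameters of the F-VI solution, which is the sense in which the amortization gap can be closed for a simple hierarchical model. The sketch says nothing about models where the optimal factor for z_i depends on more than x_i.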
