Predictive variational inference: Learn the predictively optimal posterior distribution
Jinlin Lai, Yuling Yao
TL;DR
Predictive Variational Inference (PVI) reframes posterior inference: rather than approximating the exact Bayesian posterior, it optimizes the posterior so that the resulting posterior predictive distribution is close to the data-generating process under a chosen scoring rule. Using flexible variational families (e.g., normalizing flows) and regularization that can interpolate toward the Bayesian posterior, PVI yields predictively optimal posteriors that may differ from the traditional Bayesian posterior, especially under model misspecification, and whose non-vanishing uncertainty can reveal population-level heterogeneity. The framework supports both likelihood-exact and likelihood-free settings and provides gradient estimators for several scoring rules (log, quadratic, CRPS), enabling practical SGD-based optimization. Empirically, PVI improves held-out predictive performance on a real-data task (election analysis) and on a likelihood-free cryoEM task, and it also serves as a diagnostic for model expansion by exposing parameter heterogeneity. Overall, PVI directly targets predictive accuracy and uncertainty calibration in the presence of misspecification.
Abstract
Vanilla variational inference finds an optimal approximation to the Bayesian posterior distribution, but even the exact Bayesian posterior is often not meaningful under model misspecification. We propose predictive variational inference (PVI): a general inference framework that seeks and samples from an optimal posterior density such that the resulting posterior predictive distribution is as close to the true data-generating process as possible, where closeness is measured by multiple scoring rules. Because it optimizes this objective, the posterior learned by predictive variational inference is generally not the same as the Bayesian posterior, and does not even attempt to approximate it, even asymptotically. Rather, we interpret it as implicit hierarchical expansion. Further, the learned posterior uncertainty detects heterogeneity of parameters among the population, enabling automatic model diagnosis. This framework applies to both likelihood-exact and likelihood-free models. We demonstrate its application in real data examples.
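To make the contrast with the Bayesian posterior concrete, here is a minimal sketch in Python. It is not the paper's implementation: the toy model (y | theta ~ Normal(theta, 1)), the Gaussian variational family, the simulated data, and the function name `predictive_log_score` are all assumptions chosen for illustration. It only evaluates the log score of the posterior predictive at two candidate posteriors, rather than optimizing it by SGD.

```python
import numpy as np

def predictive_log_score(y, mu, sigma, n_draws=2000, seed=0):
    """Monte Carlo estimate of the mean log posterior predictive density.

    Hypothetical toy setup: y | theta ~ Normal(theta, 1), with a candidate
    posterior q(theta) = Normal(mu, sigma^2).
    """
    rng = np.random.default_rng(seed)
    theta = mu + sigma * rng.standard_normal(n_draws)  # draws from q
    # log p(y_i | theta_s) for every observation/draw pair, shape (n, S)
    log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[:, None] - theta[None, :]) ** 2
    # log of the average likelihood over draws, computed stably (log-mean-exp)
    m = log_lik.max(axis=1, keepdims=True)
    log_pred = m[:, 0] + np.log(np.mean(np.exp(log_lik - m), axis=1))
    return log_pred.mean()

# Misspecified setting: the data have standard deviation 2, the model assumes 1.
rng = np.random.default_rng(1)
y = 2.0 * rng.standard_normal(500)

# Candidate 1: the flat-prior Bayesian posterior for this toy model,
# Normal(mean(y), 1/n), which concentrates as n grows.
score_concentrated = predictive_log_score(y, mu=y.mean(), sigma=1 / np.sqrt(len(y)))

# Candidate 2: a posterior that keeps non-vanishing uncertainty, so the
# predictive variance 1 + sigma^2 roughly matches the data variance of 4.
score_dispersed = predictive_log_score(y, mu=y.mean(), sigma=np.sqrt(3.0))

print(f"log score, concentrated posterior: {score_concentrated:.3f}")
print(f"log score, dispersed posterior:    {score_dispersed:.3f}")
```

In this toy setting the dispersed posterior attains the higher predictive log score, which illustrates how a predictively optimal posterior can retain uncertainty that the Bayesian posterior loses under misspecification.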
