Extending Mean-Field Variational Inference via Entropic Regularization: Theory and Computation
Bohan Wu, David Blei
TL;DR
$\Xi$-variational inference ($\Xi$-VI) extends naive mean-field VI (MFVI) through entropic regularization: a tunable parameter $\lambda$ downweights the dependence retained in the variational posterior and trades statistical fidelity against computational cost. The coupling between variables is computed by solving an entropic optimal transport (OT) problem with a multi-marginal Sinkhorn algorithm, yielding a posterior that interpolates between MFVI and the exact Bayes posterior. The authors establish frequentist guarantees, Bernstein–von Mises-type results, and high-dimensional asymptotics, and identify regimes in which $\Xi$-VI behaves like MFVI, like Bayes-optimal inference, or like an intermediate tempered posterior. They demonstrate practical gains on a multivariate Gaussian, Bayesian linear regression with Laplace priors, and the hierarchical eight-schools model, and discuss computational complexity, stability, and scalable strategies. Overall, $\Xi$-VI offers a principled framework that bridges variational accuracy and tractable computation via entropic OT.
Abstract
Variational inference (VI) has emerged as a popular method for approximate inference for high-dimensional Bayesian models. In this paper, we propose a novel VI method that extends the naive mean field via entropic regularization, referred to as $\Xi$-variational inference ($\Xi$-VI). $\Xi$-VI has a close connection to the entropic optimal transport problem and benefits from the computationally efficient Sinkhorn algorithm. We show that $\Xi$-variational posteriors effectively recover the true posterior dependency, where the dependence is downweighted by the regularization parameter. We analyze the role of dimensionality of the parameter space on the accuracy of $\Xi$-variational approximation and how it affects computational considerations, providing a rough characterization of the statistical-computational trade-off in $\Xi$-VI. We also investigate the frequentist properties of $\Xi$-VI and establish results on consistency, asymptotic normality, high-dimensional asymptotics, and algorithmic stability. We provide sufficient criteria for achieving polynomial-time approximate inference using the method. Finally, we demonstrate the practical advantage of $\Xi$-VI over mean-field variational inference on simulated and real data.
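
To give a feel for the Sinkhorn algorithm mentioned above, the following is a minimal sketch of the classical two-marginal Sinkhorn iteration for entropic optimal transport between two discrete histograms. It is not the authors' implementation and does not reproduce the $\Xi$-VI objective; the function name `sinkhorn`, the grid `x`, the squared-distance cost, and the regularization strength `eps` are all assumptions chosen for illustration.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=1.0, n_iters=500):
    """Illustrative two-marginal Sinkhorn for entropic OT (hypothetical helper).

    cost : (n, m) cost matrix
    a, b : target row and column marginals (each sums to 1)
    eps  : entropic regularization strength (assumed value)
    """
    K = np.exp(-cost / eps)        # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)            # rescale rows toward marginal a
        v = b / (K.T @ u)          # rescale columns toward marginal b
    # Coupling of the form diag(u) K diag(v)
    return u[:, None] * K * v[None, :]

# Toy example: two histograms on a shared 5-point grid.
x = np.linspace(-1.0, 1.0, 5)
cost = (x[:, None] - x[None, :]) ** 2
a = np.full(5, 0.2)
b = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

plan = sinkhorn(cost, a, b)
print(plan.sum(axis=1))  # approximately a
print(plan.sum(axis=0))  # approximately b
```

In this generic setting, a larger `eps` pushes the coupling toward the independent product of `a` and `b`, while a smaller `eps` lets the cost shape the dependence more strongly. This loosely mirrors the way the abstract describes dependence being downweighted by the regularization parameter.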
