On Naive Mean-Field Approximation for high-dimensional canonical GLMs
Sumit Mukherjee, Jiaze Qiu, Subhabrata Sen
TL;DR
The paper studies when the Naive Mean Field (NMF) approximation is valid for high-dimensional canonical GLMs with product priors. Using non-linear large deviations, it bounds the log-partition function $\log \mathcal{Z}_p$ and gives sufficient conditions for the NMF approximation to be tight. It shows that, under mild design assumptions, any NMF optimizer is a product distribution whose components are quadratic tilts of the prior, and it identifies exponential-tilt sub-families that yield tractable variational approximations for GLMs. A key contribution is a posterior-structure result: if the NMF optimization problem has a well-separated maximizer, the posterior is well approximated (in Wasserstein distance) by a single product measure, which yields credible intervals with average coverage guarantees and a characterization of out-of-sample prediction in terms of the dominant optimizer. The paper also reports numerical experiments that validate the log-partition estimates and demonstrates NMF-based variational inference (VI) for logistic regression with discrete priors, offering practical VI strategies beyond the linear-model setting.
Abstract
We study the validity of the Naive Mean Field (NMF) approximation for canonical GLMs with product priors. This setting is challenging due to the non-conjugacy of the likelihood and the prior. Using the theory of non-linear large deviations (Austin 2019; Chatterjee and Dembo 2016; Eldan 2018), we derive sufficient conditions for the tightness of the NMF approximation to the log-normalizing constant of the posterior distribution. As a second contribution, we establish that under minor conditions on the design, any NMF optimizer is a product distribution where each component is a quadratic tilt of the prior. In turn, this suggests novel iterative algorithms for fitting the NMF optimizer to the target posterior. Finally, we establish that if the NMF optimization problem has a "well-separated maximizer", then this optimizer governs the probabilistic properties of the posterior. Specifically, we derive credible intervals with average coverage guarantees, and characterize the prediction performance on an out-of-sample datapoint in terms of this dominant optimizer.
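To make the quantities in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It takes a toy logistic regression (a canonical GLM with log-likelihood $\sum_i [y_i x_i^\top \beta - \log(1 + e^{x_i^\top \beta})]$) with an assumed uniform product prior on $\{-1,+1\}^p$, computes the log-normalizing constant exactly by enumeration, and compares it with the NMF lower bound $\mathbb{E}_Q[\text{log-likelihood}] - \mathrm{KL}(Q \,\|\, \text{prior})$ maximized over product measures $Q$ by generic coordinate ascent. The dimensions, the design scaling, the two-point prior, and all function names are illustrative assumptions, and the brute-force expectations are only feasible for very small $p$. On a two-point prior a quadratic tilt reduces to a linear tilt because $\beta_j^2 = 1$, so each coordinate of $Q$ is described by a single number `q[j]`.

```python
import itertools
import numpy as np


def log_lik(X, y, betas):
    """Logistic log-likelihood for each row of betas (shape (K, p))."""
    theta = betas @ X.T  # (K, n) linear predictors
    # y * theta - log(1 + exp(theta)), summed over observations
    return (theta * y - np.logaddexp(0.0, theta)).sum(axis=1)


def product_log_prob(q, betas):
    """log Q(beta) for a product measure with q[j] = Q(beta_j = +1)."""
    return np.where(betas > 0, np.log(q), np.log1p(-q)).sum(axis=1)


def elbo(q, betas, ll, log_prior):
    """NMF objective E_Q[log-lik] - KL(Q || prior), by exact enumeration."""
    log_q = product_log_prob(q, betas)
    return float(np.sum(np.exp(log_q) * (ll + log_prior - log_q)))


def cavi_sweep(q, betas, ll):
    """One pass of exact coordinate ascent over the p coordinates."""
    q = q.copy()
    for j in range(len(q)):
        plus = betas[:, j] > 0
        # Weights of the remaining coordinates: Q(beta) / Q_j(beta_j)
        log_rest = product_log_prob(q, betas) - np.where(
            plus, np.log(q[j]), np.log1p(-q[j])
        )
        w = np.exp(log_rest)
        e_plus = np.sum(w[plus] * ll[plus])     # E_{Q_-j}[log-lik | beta_j = +1]
        e_minus = np.sum(w[~plus] * ll[~plus])  # E_{Q_-j}[log-lik | beta_j = -1]
        # The uniform prior cancels, leaving a sigmoid of the difference.
        # Clipping guards against log(0) on later evaluations.
        q[j] = np.clip(1.0 / (1.0 + np.exp(e_minus - e_plus)), 1e-12, 1 - 1e-12)
    return q


# Toy problem (all sizes are arbitrary illustrative choices).
rng = np.random.default_rng(0)
n, p = 20, 6
X = rng.normal(size=(n, p)) / np.sqrt(p)
beta_true = rng.choice([-1.0, 1.0], size=p)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

# Enumerate all 2^p configurations of beta.
betas = np.array(list(itertools.product([-1.0, 1.0], repeat=p)))
log_prior = -p * np.log(2.0)
ll = log_lik(X, y, betas)

# Exact log-normalizing constant of the posterior.
log_Z = float(np.logaddexp.reduce(ll + log_prior))

# Coordinate ascent on the NMF objective, starting from the prior.
q = np.full(p, 0.5)
elbo_trace = [elbo(q, betas, ll, log_prior)]
for _ in range(50):
    q = cavi_sweep(q, betas, ll)
    elbo_trace.append(elbo(q, betas, ll, log_prior))

print("exact log Z      :", log_Z)
print("NMF lower bound  :", elbo_trace[-1])
print("gap              :", log_Z - elbo_trace[-1])
```

The gap printed at the end is the quantity whose smallness the tightness results concern. In this sketch it is only observed on one small random instance, which says nothing about the high-dimensional regime the paper analyzes.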
