On Naive Mean-Field Approximation for high-dimensional canonical GLMs

Sumit Mukherjee, Jiaze Qiu, Subhabrata Sen

TL;DR

The paper studies the validity of Naive Mean Field (NMF) approximations for high-dimensional canonical GLMs with product priors by leveraging non-linear large deviations to bound the log-partition function ${\log \mathcal{Z}_p}$ and to characterize when NMF is tight. It shows that, under mild design assumptions, any NMF optimizer is a product distribution with components that are quadratic tilts of the prior, and it identifies exponential-tilt sub-families that yield tractable variational approximations for GLMs. A key contribution is a posterior-structure result: if the NMF optimization problem has a well-separated maximizer, the posterior is well-approximated (in Wasserstein distance) by a single product measure, enabling credible intervals with coverage guarantees and out-of-sample predictive characterizations anchored to the dominant optimizer. The paper also provides numerical experiments validating log-partition estimates and demonstrates NMF-based VI for logistic regression with discrete priors, offering practical VI strategies beyond the linear-model setting.

Abstract

We study the validity of the Naive Mean Field (NMF) approximation for canonical GLMs with product priors. This setting is challenging due to the non-conjugacy of the likelihood and the prior. Using the theory of non-linear large deviations (Austin 2019, Chatterjee, Dembo 2016, Eldan 2018), we derive sufficient conditions for the tightness of the NMF approximation to the log-normalizing constant of the posterior distribution. As a second contribution, we establish that under minor conditions on the design, any NMF optimizer is a product distribution where each component is a quadratic tilt of the prior. In turn, this suggests novel iterative algorithms for fitting the NMF optimizer to the target posterior. Finally, we establish that if the NMF optimization problem has a "well-separated maximizer", then this optimizer governs the probabilistic properties of the posterior. Specifically, we derive credible intervals with average coverage guarantees, and characterize the prediction performance on an out-of-sample datapoint in terms of this dominant optimizer.
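
For orientation, the "NMF approximation to the log-normalizing constant" can be written out using the Gibbs variational principle. The display below is a sketch under assumed notation, not the paper's own statement: it borrows $H$, $\pi_0$ and $\boldsymbol{\sigma}$ from the Figure 1 caption and assumes the posterior has density proportional to $e^{-H(\boldsymbol{\sigma})}$ with respect to the prior $\pi_0$ on the full coefficient vector.

$$
\log \mathcal{Z}_p
= \log \int e^{-H(\boldsymbol{\sigma})} \, \mathrm{d}\pi_0(\boldsymbol{\sigma})
= \sup_{Q \ll \pi_0} \Big\{ \mathbb{E}_{\boldsymbol{\sigma} \sim Q}\big[ -H(\boldsymbol{\sigma}) \big] - \operatorname{D}_{\text{KL}}(Q \,\|\, \pi_0) \Big\}
\;\ge\; \sup_{Q = \prod_{i=1}^{p} Q_i} \Big\{ \mathbb{E}_{\boldsymbol{\sigma} \sim Q}\big[ -H(\boldsymbol{\sigma}) \big] - \operatorname{D}_{\text{KL}}(Q \,\|\, \pi_0) \Big\}.
$$

The first supremum ranges over all distributions absolutely continuous with respect to $\pi_0$ and is attained at the posterior. The second restricts to product distributions and is the NMF value. Under this reading, tightness of NMF means that the gap between the two sides is small relative to the scale of $\log \mathcal{Z}_p$.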

Paper Structure

This paper contains 12 sections, 11 theorems, 47 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Suppose the assumptions labelled assump:prior, assump:cont and assump:design hold. Finally, assume that for any $\varepsilon>0$, the set [...] has a $p\varepsilon$-net in the $\ell_1$ metric of size $N(p,\varepsilon)$, such that [...]. Then the NMF approximation is correct to leading order.

Figures (2)

  • Figure 1: The 3-D plot on the left visualizes the almost perfect alignment among $\mathbf{u}_{\text{NMF}}$, $\mathbf{u}_{\text{STAN}}$, and $\mathbf{u}_{\text{JJ}}$, when $n = 4000$ and $p = 100$. The right panel shows the estimated log-partition functions (y-axis) given by different methods, for 20 repeated experiments (x-axis). In particular, 'STAN' stands for a sampling-based method called bridge sampling (Gronau et al., 2017), and 'JJ' refers to the widely celebrated tangent transform algorithm proposed by Jaakkola and Jordan (2000). Note that for 'JJ', the plotted value is not the evidence lower bound (ELBO). Instead, it is a Monte Carlo evaluation of $\mathbb{E}_{\boldsymbol{\sigma} \sim Q^{\text{JJ}}} \left [ - H (\boldsymbol{\sigma}) \right ] -\operatorname{D}_{\text{KL}}(Q^{\text{JJ}} \| \pi_0 )$, where $Q^{\text{JJ}}$ is the multivariate Gaussian distribution returned by the Jaakkola and Jordan algorithm upon convergence. Here $n = 2000$ and $p = 50$.
  • Figure 2: The left panel shows a (typical) comparison between the approximations of the posterior mean vector given by our iterative scheme ($\mathbf{u}_{\text{NMF}}$ on the x-axis) and a naive Gibbs sampler ($\mathbf{u}_{\text{Gibbs}}$ on the y-axis) when $n = 2000$ and $p = 100$. These two estimators also have comparable mean squared errors (MSEs) with respect to the true signal $\bm{\beta}^{\star}$, as outlined in the table on the right (for $n = 1000$ and $p = 50$). The average and standard deviation were computed from 10 repeated experiments. Entries of the design matrix $\mathbf{X}$ were sampled i.i.d. from $\mathcal{N}(0, 0.01 / n)$. Given the design, we sample $\mathbf{y}$ from a logistic regression model; the coefficients of the regression model are sampled i.i.d. from $\pi$, i.e., both the figure and the table were generated in a well-specified setting.
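
The data-generating setup described in the Figure 2 caption can be sketched as follows. This is a minimal illustration and not the authors' code: the caption does not specify the prior $\pi$, so the three-point prior below (`prior_support`, `prior_weights`) is a hypothetical stand-in, and the `mse` helper assumes the error is averaged over coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 50  # sizes quoted for the table in the Figure 2 caption

# Design matrix with i.i.d. N(0, 0.01 / n) entries, as stated in the caption.
X = rng.normal(loc=0.0, scale=np.sqrt(0.01 / n), size=(n, p))

# Hypothetical discrete prior pi (an assumption; the caption does not give it).
prior_support = np.array([-1.0, 0.0, 1.0])
prior_weights = np.array([0.25, 0.5, 0.25])

# Well-specified setting: true coefficients are drawn i.i.d. from pi.
beta_star = rng.choice(prior_support, size=p, p=prior_weights)

# Logistic regression responses: y_i ~ Bernoulli(sigmoid(x_i^T beta_star)).
logits = X @ beta_star
probs = 1.0 / (1.0 + np.exp(-logits))
y = rng.binomial(1, probs)


def mse(estimate, truth):
    """Mean squared error, averaged over coordinates (assumed normalisation)."""
    return float(np.mean((np.asarray(estimate) - np.asarray(truth)) ** 2))
```

Any estimate of the posterior mean, for instance one produced by an NMF iteration or a Gibbs sampler, could then be scored with `mse(estimate, beta_star)`.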

Theorems & Definitions (19)

  • Example 1: Linear Regression
  • Example 2: Binary Logistic Regression and Binomial Logistic Regression
  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Definition 4: Exponential tilting
  • Theorem 3: Algorithmic implications
  • ...and 9 more
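
The abstract states that each component of an NMF optimizer is a quadratic tilt of the prior, and the list above includes a definition of exponential tilting. For a prior with finite support, such a tilt reduces to a reweighting of the atoms. The sketch below assumes the parametrization "new weight proportional to $w_k \exp(\gamma_1 s_k - \gamma_2 s_k^2 / 2)$"; the paper's exact convention may differ. The function names are illustrative only. Setting `gamma2 = 0` gives a plain exponential tilt.

```python
import numpy as np


def quadratic_tilt(support, weights, gamma1, gamma2):
    """Reweight a discrete prior with strictly positive weights.

    Assumed parametrization (not confirmed by the source):
    new weight proportional to w_k * exp(gamma1 * s_k - gamma2 * s_k**2 / 2).
    """
    support = np.asarray(support, dtype=float)
    weights = np.asarray(weights, dtype=float)
    log_w = np.log(weights) + gamma1 * support - 0.5 * gamma2 * support**2
    log_w -= log_w.max()  # subtract the max for numerical stability
    tilted = np.exp(log_w)
    return tilted / tilted.sum()


def tilted_mean(support, weights, gamma1, gamma2):
    """Mean of the tilted distribution."""
    probs = quadratic_tilt(support, weights, gamma1, gamma2)
    return float(np.dot(np.asarray(support, dtype=float), probs))


# Hypothetical three-point prior, used only for illustration.
support = [-1.0, 0.0, 1.0]
weights = [0.25, 0.5, 0.25]

untilted = quadratic_tilt(support, weights, 0.0, 0.0)  # recovers the prior
tilted = quadratic_tilt(support, weights, 1.0, 0.5)    # mass shifts to the right
```

In a coordinate-wise scheme one would update each coordinate's pair `(gamma1, gamma2)` in turn and read off `tilted_mean` as the current estimate of that coordinate's posterior mean. How those parameters are computed from the data is specific to the paper and is not reproduced here.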