Logistic Variational Bayes Revisited

Michael Komodromos, Marina Evangelou, Sarah Filippi

TL;DR

The paper addresses scalable Bayesian inference for binary outcomes by deriving a new bound for the expectation $\mathbb{E}_{X}[\log(1+\exp(X))]$ with $X \sim \mathcal{N}(\vartheta, \tau^2)$, enabling tractable variational inference (VI) in logistic regression and Gaussian process (GP) classification. The bound, denoted $\eta_l(\vartheta, \tau)$, is tighter than the classic Jaakkola bound, requires no additional variational parameters, and becomes exact as $l \to \infty$. This yields VI-PER, a variational approach that closely matches the posterior quality of Monte Carlo VI while significantly reducing computation, and that improves uncertainty quantification over Pólya-Gamma-based VI. Across simulations and real-world data (including soil liquefaction), VI-PER demonstrates competitive predictive performance and better-calibrated uncertainty. An open-source PyTorch/gpytorch implementation is available.

Abstract

Variational logistic regression is a popular method for approximate Bayesian inference, seeing widespread use in many areas of machine learning, including Bayesian optimization, reinforcement learning and multi-instance learning, to name a few. However, due to the intractability of the Evidence Lower Bound, authors have turned to the use of Monte Carlo, quadrature or bounds to perform inference, methods which are costly or give poor approximations to the true posterior. In this paper we introduce a new bound for the expectation of the softplus function and subsequently show how this can be applied to variational logistic regression and Gaussian process classification. Unlike other bounds, our proposal does not rely on extending the variational family, or introducing additional parameters to ensure the bound is tight. In fact, we show that this bound is tighter than the state-of-the-art, and that the resulting variational posterior achieves state-of-the-art performance, whilst being significantly faster to compute than Monte Carlo methods.

Paper Structure

This paper contains 26 sections, 3 theorems, 23 equations, 6 figures, 8 tables.

Key Result

Theorem 2.1

Let $X \sim \mathcal{N}(\vartheta, \tau^2)$. Then for any $l \geq 1$, $\mathbb{E}_{X} [ \log(1 + \exp(X)) ] \leq \eta_l(\vartheta, \tau)$, where $\eta_l$ is defined in terms of $\Phi(\cdot)$, the cumulative distribution function of the standard normal distribution.
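
The statement above does not spell out $\eta_l$. As a hedged reconstruction (an assumption, not a quotation from the paper), one expression consistent with the theorem follows from writing $\log(1+\exp(x)) = \max(0, x) + \log(1+\exp(-|x|))$ and truncating the alternating series of $\log(1+u)$, $u \in [0, 1]$, after the odd number of terms $2l-1$, which gives an upper bound:

$$\eta_l(\vartheta, \tau) = \frac{\tau}{\sqrt{2\pi}} e^{-\vartheta^2/(2\tau^2)} + \vartheta\,\Phi\!\left(\frac{\vartheta}{\tau}\right) + \sum_{k=1}^{2l-1} \frac{(-1)^{k-1}}{k}\left[ e^{k\vartheta + k^2\tau^2/2}\,\Phi\!\left(-\frac{\vartheta}{\tau} - k\tau\right) + e^{-k\vartheta + k^2\tau^2/2}\,\Phi\!\left(\frac{\vartheta}{\tau} - k\tau\right) \right]$$

The Python sketch below evaluates this assumed form using only the standard library and compares it with a trapezoid-rule reference. The function names (`eta`, `expected_softplus`) are illustrative and are not taken from the authors' implementation.

```python
import math


def std_normal_cdf(z):
    # Phi(z) via erfc, which stays accurate in the lower tail.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def eta(theta, tau, l):
    """Assumed form of the bound eta_l(theta, tau); see the lead-in.

    Naive evaluation: exp(k*theta + (k*tau)**2 / 2) overflows for large
    k * tau, so this sketch is only meant for moderate l and tau.
    """
    # E[max(0, X)] for X ~ N(theta, tau^2).
    value = tau / math.sqrt(2.0 * math.pi) * math.exp(-theta**2 / (2.0 * tau**2))
    value += theta * std_normal_cdf(theta / tau)
    # Truncated alternating series for E[log(1 + exp(-|X|))], k = 1..2l-1.
    for k in range(1, 2 * l):
        half_var = 0.5 * (k * tau) ** 2
        left = math.exp(k * theta + half_var) * std_normal_cdf(-theta / tau - k * tau)
        right = math.exp(-k * theta + half_var) * std_normal_cdf(theta / tau - k * tau)
        value += (-1) ** (k - 1) / k * (left + right)
    return value


def expected_softplus(theta, tau, n=20001, width=10.0):
    # Reference value: trapezoid rule on [theta - width*tau, theta + width*tau].
    lo = theta - width * tau
    h = 2.0 * width * tau / (n - 1)
    norm = 1.0 / (tau * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(n):
        x = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        density = norm * math.exp(-0.5 * ((x - theta) / tau) ** 2)
        total += weight * softplus(x) * density
    return total * h


reference = expected_softplus(1.0, 1.0)
for l in (1, 2, 5):
    # The gap to the reference should shrink as l grows.
    print(l, eta(1.0, 1.0, l) - reference)
```

Under this form the bound decreases monotonically in $l$, since each additional pair of series terms contributes $-u^{2l}/(2l) + u^{2l+1}/(2l+1) \leq 0$ for $u \in [0, 1]$.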

Figures (6)

  • Figure 1: Error of bounds. Comparison of the Jaakkola (1997) bound, the proposed bound with $l=10$, and a Monte Carlo estimate for (a) $\tau=2.0$ and $\vartheta \in [-3, 3]$, (b) $\vartheta = 1.0$ and $\tau \in [0.1, 3.0]$. (c) The number of terms ($l$) needed such that the relative error is below 1%.
  • Figure 2: GP classification: illustrative example. Presented are the mean (solid line) and 95% credible interval (shaded region) of the posterior distribution for the different methods. The true function is shown as a dashed line, the training data as black points and the test data as magenta crosses (+). The top right corner of each panel shows the KL divergence between the variational posterior computed using Monte Carlo and the variational posterior computed using the respective method.
  • Figure 3: Application to soil liquefaction. Standard deviation of soil liquefaction probability evaluated for the Loma Prieta earthquake for VI-PER, VI-MC and VI-PG under the variational family $\mathcal{Q}$.
  • Figure 4: Comparison of the relative error of (a) the Jaakkola (1997) bound, (b) the proposed bound, and (c) the difference between the relative errors of the two bounds. The comparison is over a grid of values of $\vartheta$ and $\tau$. Here the relative error of a bound is the absolute difference between the bound and the ground truth, divided by the ground truth, where the ground truth is the expectation of $\log(1+\exp(X))$ computed using Monte Carlo with $5 \times 10^6$ samples.
  • Figure 5: The number of terms ($l$) needed such that the relative error is below (a) 0.5%, (b) 1%, (c) 2.5% and (d) 5% for different values of $\tau$ and $\vartheta$.
  • ...and 1 more figure
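
The relative-error metric described in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the baseline is taken to be the standard Jaakkola and Jordan quadratic bound with its variational parameter set to $\xi = \sqrt{\vartheta^2 + \tau^2}$, which may differ in detail from what the figures use; the ground truth uses far fewer Monte Carlo samples than the $5 \times 10^6$ quoted in the caption so that it runs quickly in pure Python; and all function names are hypothetical.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def jaakkola_bound(theta, tau):
    """Quadratic upper bound on E[softplus(X)], X ~ N(theta, tau^2).

    Assumes the usual optimal choice xi^2 = E[X^2] = theta^2 + tau^2, for
    which the quadratic term vanishes and the bound reduces to
    theta / 2 + log(2 * cosh(xi / 2)), written here in a stable form.
    """
    xi = math.sqrt(theta**2 + tau**2)
    return softplus(-xi) + 0.5 * (theta + xi)


def monte_carlo_softplus(theta, tau, n_samples=200_000, seed=0):
    # Ground-truth estimate of E[softplus(X)] by plain Monte Carlo.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += softplus(rng.gauss(theta, tau))
    return total / n_samples


def relative_error(bound, truth):
    # |bound - truth| / truth, as defined in the Figure 4 caption.
    return abs(bound - truth) / truth


truth = monte_carlo_softplus(1.0, 2.0)
bound = jaakkola_bound(1.0, 2.0)
print(relative_error(bound, truth))
```

Sweeping `theta` and `tau` over a grid and recording `relative_error` for each pair would give the kind of surface the caption describes, with the Monte Carlo noise shrinking as `n_samples` grows.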

Theorems & Definitions (3)

  • Theorem 2.1
  • Lemma 2.2
  • Corollary 2.3