Logistic Variational Bayes Revisited

Michael Komodromos; Marina Evangelou; Sarah Filippi

TL;DR

The paper addresses scalable Bayesian inference for binary outcomes by deriving a new upper bound on the expectation $\mathbb{E}_{X}[\log(1+\exp(X))]$ with $X \sim \mathcal{N}(\vartheta, \tau^2)$, which makes variational inference (VI) tractable for logistic regression and Gaussian process (GP) classification. The bound, denoted $\eta_l(\vartheta, \tau)$, is tighter than the classic Jaakkola and Jordan bound, requires no additional variational parameters, and becomes exact as $l \to \infty$. The resulting method, VI-PER, closely matches the posterior quality of Monte Carlo-based VI at a significantly lower computational cost, and improves uncertainty quantification over Pólya-Gamma-based VI. Across simulations and real-world data (including soil liquefaction), VI-PER shows competitive predictive performance and better-calibrated uncertainty. An open-source PyTorch/GPyTorch implementation is available.

Abstract

Variational logistic regression is a popular method for approximate Bayesian inference that sees widespread use in many areas of machine learning, including Bayesian optimization, reinforcement learning and multi-instance learning, to name a few. However, because the Evidence Lower Bound is intractable, authors have turned to Monte Carlo, quadrature or bounds to perform inference, methods which are costly or give poor approximations to the true posterior. In this paper we introduce a new bound for the expectation of the softplus function and subsequently show how it can be applied to variational logistic regression and Gaussian process classification. Unlike other bounds, our proposal does not rely on extending the variational family or introducing additional parameters to ensure the bound is tight. In fact, we show that this bound is tighter than the state of the art, and that the resulting variational posterior achieves state-of-the-art performance whilst being significantly faster to compute than Monte Carlo methods.
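To make concrete where the softplus expectation enters, the following sketch writes out the Evidence Lower Bound for logistic regression under a Gaussian variational family. The notation here is an assumption introduced for illustration and does not come from the text above: data $(x_i, y_i)$ with $y_i \in \{0, 1\}$, coefficients $\beta$ with prior $p(\beta)$, and $q(\beta) = \mathcal{N}(\mu, \Sigma)$. Any upper bound on the softplus expectation, such as $\eta_l$, then gives a tractable lower bound on the ELBO.

```latex
% Assumed notation (illustrative): y_i in {0,1}, prior p(beta),
% variational family q(beta) = N(mu, Sigma).
\begin{align*}
\log p(y_i \mid x_i, \beta)
  &= y_i\, x_i^\top \beta - \log\bigl(1 + \exp(x_i^\top \beta)\bigr), \\
\mathrm{ELBO}(q)
  &= \sum_{i=1}^{n} \Bigl( y_i\, x_i^\top \mu
     - \mathbb{E}_{q}\bigl[\log\bigl(1 + \exp(x_i^\top \beta)\bigr)\bigr] \Bigr)
     - \mathrm{KL}(q \,\|\, p), \\
x_i^\top \beta
  &\sim \mathcal{N}(\vartheta_i, \tau_i^2), \qquad
  \vartheta_i = x_i^\top \mu, \quad \tau_i^2 = x_i^\top \Sigma\, x_i, \\
\mathrm{ELBO}(q)
  &\geq \sum_{i=1}^{n} \bigl( y_i\, \vartheta_i - \eta_l(\vartheta_i, \tau_i) \bigr)
     - \mathrm{KL}(q \,\|\, p).
\end{align*}
```

The only intractable term is the one-dimensional Gaussian expectation of the softplus, which is why a closed-form bound on it removes the need for Monte Carlo or quadrature inside the optimization loop.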

Paper Structure (26 sections, 3 theorems, 23 equations, 6 figures, 8 tables)

Introduction
Proposal
A New Bound
Applications to Classification
Variational Logistic Regression
Gaussian Process Classification
Computational Complexity
Implementation Details
Numerical Experiments
Logistic Regression Simulation Study
Gaussian Process Classification: Illustrative Example
Application to Real World Data
Application to Soil Liquefaction Data
Application to Publicly Available Datasets
Discussion
...and 11 more sections

Key Result

Theorem 2.1

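Let $X \sim \mathcal{N}(\vartheta, \tau^2)$. Then for any $l \geq 1$, $\mathbb{E}_{X} [ \log(1 + \exp(X)) ] \leq \eta_l(\vartheta, \tau)$, where

$$\eta_l(\vartheta, \tau) = \frac{\tau}{\sqrt{2\pi}} e^{-\frac{\vartheta^2}{2\tau^2}} + \vartheta\, \Phi\!\left(\frac{\vartheta}{\tau}\right) + \sum_{k=1}^{2l-1} \frac{(-1)^{k-1}}{k} \left[ e^{k\vartheta + \frac{k^2\tau^2}{2}}\, \Phi\!\left(-\frac{\vartheta}{\tau} - k\tau\right) + e^{-k\vartheta + \frac{k^2\tau^2}{2}}\, \Phi\!\left(\frac{\vartheta}{\tau} - k\tau\right) \right]$$

and $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution.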
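As a sanity check on the theorem, the sketch below evaluates a closed form for $\eta_l$ and compares it with a numerical estimate of the expectation. The closed form coded here is an assumption reconstructed from the decomposition $\log(1+e^{x}) = \max(0, x) + \log(1+e^{-|x|})$ together with the truncated series $\log(1+u) \leq \sum_{k=1}^{2l-1} (-1)^{k-1} u^k / k$ for $u \in [0, 1]$; the function names and the trapezoid-rule reference value are illustrative and not taken from the authors' implementation.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def eta(theta, tau, l):
    """Assumed closed form of the bound eta_l(theta, tau), with tau > 0.

    First two terms: E[max(0, X)] for X ~ N(theta, tau^2).
    Sum: E[exp(-k|X|)] weighted by the alternating series coefficients.
    Note: this naive evaluation can overflow when k * tau is large; a
    log-space evaluation would be needed there.
    """
    total = (tau / math.sqrt(2.0 * math.pi)) * math.exp(-theta**2 / (2.0 * tau**2))
    total += theta * std_normal_cdf(theta / tau)
    for k in range(1, 2 * l):  # k = 1, ..., 2l - 1
        coef = (-1.0) ** (k - 1) / k
        half_var = 0.5 * k * k * tau * tau
        neg_part = math.exp(k * theta + half_var) * std_normal_cdf(-theta / tau - k * tau)
        pos_part = math.exp(-k * theta + half_var) * std_normal_cdf(theta / tau - k * tau)
        total += coef * (neg_part + pos_part)
    return total


def expected_softplus(theta, tau, n=20001, width=10.0):
    """Reference value of E[softplus(X)] via the trapezoid rule over z in [-width, width]."""
    h = 2.0 * width / (n - 1)
    acc = 0.0
    for i in range(n):
        z = -width + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        acc += weight * softplus(theta + tau * z) * math.exp(-0.5 * z * z)
    return acc * h / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    theta, tau = 1.0, 1.0
    reference = expected_softplus(theta, tau)
    for l in (1, 2, 5):
        print(f"l={l}: bound={eta(theta, tau, l):.6f}, reference={reference:.6f}")
```

Because the series is truncated after an odd number of terms, each value of `eta` should sit above the reference, and the gap should shrink as `l` grows.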

Figures (6)

Figure 1: Error of bounds. Comparison of the Jaakkola and Jordan (1997) bound, the proposed bound with $l=10$, and a Monte Carlo estimate for (a) $\tau=2.0$ and $\vartheta \in [-3, 3]$, (b) $\vartheta = 1.0$ and $\tau \in [0.1, 3.0]$. (c) The number of terms ($l$) needed for the relative error to fall below 1%.
Figure 2: GP classification: illustrative example. Presented is the mean (solid line) and 95% credible interval (shaded region) of the posterior distribution for the different methods. The true function is shown as a dashed line, the training data as black points and the test data as magenta crosses (+). The top right corner reports the KL divergence between the variational posterior computed using Monte Carlo and the variational posterior computed using the respective method.
Figure 3: Application to soil liquefaction. Standard deviation of the soil liquefaction probability evaluated for the Loma Prieta earthquake for VI-PER, VI-MC and VI-PG under the variational family $\mathcal{Q}$.
Figure 4: Comparison of the relative error of (a) the Jaakkola and Jordan (1997) bound, (b) the proposed bound, and (c) the difference between the relative errors of the two bounds. The comparison is over a grid of values of $\vartheta$ and $\tau$. Here the relative error of a bound is the absolute difference between the bound and the ground truth, divided by the ground truth, where the ground truth is the expectation of $\log(1+\exp(X))$ computed using Monte Carlo with $5 \times 10^6$ samples.
Figure 5: The number of terms ($l$) needed such that the relative error is below (a) 0.5%, (b) 1%, (c) 2.5% and (d) 5% for different values of $\tau$ and $\vartheta$.
...and 1 more figure

Theorems & Definitions (3)

Theorem 2.1
Lemma 2.2
Corollary 2.3
