Likelihood approximations via Gaussian approximate inference

Thang D. Bui

TL;DR

Efficient schemes that approximate the effects of non-Gaussian likelihoods by Gaussian densities, based on variational inference and moment matching in transformed bases, make it possible to deploy efficient inference strategies originally designed for models with a Gaussian likelihood.

Abstract

Non-Gaussian likelihoods are essential for modelling complex real-world observations but pose significant computational challenges in learning and inference. Even with Gaussian priors, non-Gaussian likelihoods often lead to analytically intractable posteriors, necessitating approximation methods. To this end, we propose efficient schemes to approximate the effects of non-Gaussian likelihoods by Gaussian densities based on variational inference and moment matching in transformed bases. These schemes make it possible to deploy efficient inference strategies originally designed for models with a Gaussian likelihood. Our empirical results demonstrate that the proposed matching strategies attain good approximation quality for binary and multiclass classification in large-scale point-estimate and distributional inferential settings. In challenging streaming problems, the proposed methods outperform all existing likelihood approximations, as well as approximate inference methods applied to the exact models. As a by-product, we show that the proposed approximate log-likelihoods are a superior alternative to least-squares on raw labels for neural network classification.
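The central idea, replacing a non-Gaussian density by a Gaussian in a transformed basis, can be sketched for a Gamma(α, 1) variable X with z = log X as the transformed basis. This is our own minimal sketch, not the paper's code, and every function name is hypothetical. It compares the log-normal moment match in the standard basis (the construction the Figure 1 caption attributes to milios2018dirichlet), exact moment matching of log X via E[log X] = ψ(α) and Var[log X] = ψ′(α), and a Laplace match at the mode of the log-transformed density (mean log α, variance 1/α). The fourth function is an assumption standing in for "variational matching": the Gaussian q(z) minimising KL(q‖p) in the log basis, which has the closed form mean log α − 1/(2α), variance 1/α; the paper's variational scheme may differ. A Beta(a, b) analogue in the logit basis uses E[logit X] = ψ(a) − ψ(b) and Var[logit X] = ψ′(a) + ψ′(b). Digamma and trigamma are implemented with the standard recurrence plus asymptotic series to keep the example dependency-free.

```python
import math


def digamma(x):
    """psi(x) for x > 0: upward recurrence, then the asymptotic series."""
    acc = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 / 240)))
    return acc + math.log(x) - 0.5 / x - series


def trigamma(x):
    """psi'(x) for x > 0: upward recurrence, then the asymptotic series."""
    acc = 0.0
    while x < 6.0:  # psi'(x) = psi'(x + 1) + 1/x**2
        acc += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 / x * (1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30)))
    return acc + 1.0 / x + 0.5 * inv2 + series


# Each function returns (mean, variance) of a Gaussian over z = log X,
# for X ~ Gamma(alpha, rate=1).

def gamma_lognormal_match(alpha):
    """Log-normal in the standard basis whose mean and variance both equal alpha."""
    var = math.log(1.0 / alpha + 1.0)
    return math.log(alpha) - 0.5 * var, var


def gamma_log_moment_match(alpha):
    """Exact moments of log X: psi(alpha) and psi'(alpha)."""
    return digamma(alpha), trigamma(alpha)


def gamma_log_laplace_match(alpha):
    """Mode and inverse curvature of p(z) proportional to exp(alpha*z - e^z)."""
    return math.log(alpha), 1.0 / alpha


def gamma_log_reverse_kl_match(alpha):
    """Assumed variant: Gaussian q(z) minimising KL(q || p) in the log basis."""
    return math.log(alpha) - 0.5 / alpha, 1.0 / alpha


def beta_logit_moment_match(a, b):
    """Exact moments of logit X for X ~ Beta(a, b)."""
    return digamma(a) - digamma(b), trigamma(a) + trigamma(b)


alpha_eps = 0.01  # hypothetical pseudo-count added to one-hot labels
for alpha in (1.0 + alpha_eps, alpha_eps):
    for fn in (gamma_lognormal_match, gamma_log_moment_match,
               gamma_log_laplace_match, gamma_log_reverse_kl_match):
        mean, var = fn(alpha)
        print(alpha, fn.__name__, round(mean, 3), round(var, 3))
```

At α = 0.01 the four Gaussians disagree sharply (the exact log-moments have a variance near 10⁴, the Laplace match a variance of 100), while at α = 1.01 they are much closer, which is one way to see why the choice of matching scheme matters for near-zero pseudo-counts.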

Paper Structure

This paper contains 53 sections, 9 equations, 18 figures, 5 tables.

Figures (18)

  • Figure 1: Approximations of Gamma densities (first two plots; see the section on Gamma approximations for multiclass classification) and Beta densities (last two plots; see the section on Beta approximations for binary classification). The methods included are log-normal moment matching [milios2018dirichlet], Laplace approximation [hobbhahn2022laplacematching], and Gaussian moment matching (moment) and variational matching (variational) in transformed bases. The first and third plots show the log-pdfs and their approximations; the second and fourth plots show the corresponding log- or logit-transformed densities and their Gaussian approximations. Best viewed in colour.
  • Figure 2: Neural network classifiers on a toy four-class dataset, trained with softmax cross-entropy [exact], least-squares regression on one-hot labels [gauss], and mean-squared error losses derived from log-normal moment matching in the standard basis [moment-ori] and from Gaussian moment matching [moment], Laplace matching [laplace] and variational matching [variational] in the log basis. Each network has two hidden layers of 64 rectified linear units each.
  • Figure 3: Approximate log marginal likelihood landscapes for different Gaussian approximations of the softmax likelihood on the ionosphere dataset. The x-axis represents the log-lengthscale, and the y-axis represents the log-variance of the exponentiated quadratic kernel. Brighter colours indicate higher likelihood values.
  • Figure 4: Performance comparison of different approximation methods in online learning scenarios for FashionMNIST (left) and CIFAR-10 (right) datasets. The top row shows test accuracy, and the bottom row shows test log-likelihood, both as functions of the number of data points seen. For FashionMNIST, we use random neural network features, while for CIFAR-10, we use pretrained VGG features.
  • Figure 5: Active learning of a Bayesian logistic regression model on the IMDB dataset.
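Figure 4 (online learning) and Figure 5 (Bayesian logistic regression) rely on the payoff of a Gaussian likelihood approximation: once each binary label is replaced by a Gaussian pseudo-observation in the logit basis, the weight posterior can be updated exactly, one point at a time, with conjugate formulas. The sketch below illustrates this under loudly stated assumptions. Labels become Beta(y + α_ε, 1 − y + α_ε) pseudo-counts with a hypothetical α_ε = 0.1; we use the closed-form Laplace match in the logit basis (mean log(a/b), variance 1/a + 1/b) rather than the paper's moment or variational matching; the model is a one-dimensional linear model without bias; and predictive probabilities use MacKay's probit-style approximation. All names are hypothetical, and this is not the paper's implementation.

```python
import math
import random


def laplace_logit_match(y, alpha_eps=0.1):
    """Gaussian (mean, variance) for z = logit(p), p ~ Beta(y + alpha_eps, 1 - y + alpha_eps).

    Laplace approximation in the logit basis: p(z) proportional to
    sigmoid(z)**a * sigmoid(-z)**b peaks at z = log(a / b) with curvature
    a*b / (a + b), so the variance is 1/a + 1/b.
    """
    a = y + alpha_eps
    b = 1 - y + alpha_eps
    return math.log(a / b), 1.0 / a + 1.0 / b


class StreamingGaussianRegressor:
    """Scalar-weight Bayesian linear model f(x) = w * x with prior w ~ N(0, prior_var).

    Each pseudo-observation target ~ N(w * x, noise_var) is conjugate, so the
    posterior stays Gaussian and is updated exactly, one point at a time.
    """

    def __init__(self, prior_var=1.0):
        self.precision = 1.0 / prior_var  # posterior precision of w
        self.shift = 0.0                  # precision times posterior mean of w

    def update(self, x, target, noise_var):
        self.precision += x * x / noise_var
        self.shift += x * target / noise_var

    @property
    def mean(self):
        return self.shift / self.precision

    @property
    def var(self):
        return 1.0 / self.precision

    def predict_proba(self, x):
        """Approximate P(y = 1 | x) via E[sigmoid(f)] ~ sigmoid(m / sqrt(1 + pi * v / 8))."""
        m = self.mean * x
        v = self.var * x * x
        return 1.0 / (1.0 + math.exp(-m / math.sqrt(1.0 + math.pi * v / 8.0)))


# Hypothetical toy stream: the label is 1 exactly when x > 0.
random.seed(0)
model = StreamingGaussianRegressor(prior_var=1.0)
for _ in range(200):
    x = random.uniform(-3.0, 3.0)
    y = 1 if x > 0 else 0
    target, noise_var = laplace_logit_match(y)
    model.update(x, target, noise_var)

print(f"posterior w: mean={model.mean:.3f}, var={model.var:.4f}")
print(f"P(y=1 | x=2) ~ {model.predict_proba(2.0):.3f}, P(y=1 | x=-2) ~ {model.predict_proba(-2.0):.3f}")
```

Each update only adds to two running sums, so the cost per data point is constant and the final posterior does not depend on the order of the stream. With the exact Bernoulli likelihood the posterior would no longer be Gaussian, and every update would need its own approximate inference step.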