Accelerating Non-Conjugate Gaussian Processes By Trading Off Computation For Uncertainty

Lukas Tatzel; Jonathan Wenger; Frank Schneider; Philipp Hennig

TL;DR

This paper tackles the scalability bottleneck of non-conjugate Gaussian processes (NCGPs) by explicitly modeling the error introduced by approximate inference. It introduces IterNCGP, a computation-aware framework that treats each Newton step of Laplace-based inference as a GP regression problem solved by an inner probabilistic linear solver (IterGP), so that spending less computation shows up as increased uncertainty rather than as an unquantified error. Key contributions include policy-driven targeted computations (for example, subset of data (SoD) versus conjugate gradients (CG)), a recycling mechanism that reuses computations across Newton steps, and a compression strategy that bounds memory use. Experiments on Poisson regression and large-scale GP multi-class classification demonstrate substantial speedups over strong baselines, with competitive predictive performance and calibrated uncertainty.

Abstract

Non-conjugate Gaussian processes (NCGPs) define a flexible probabilistic framework to model categorical, ordinal and continuous data, and are widely used in practice. However, exact inference in NCGPs is prohibitively expensive for large datasets, thus requiring approximations in practice. The approximation error adversely impacts the reliability of the model and is not accounted for in the uncertainty of the prediction. We introduce a family of iterative methods that explicitly model this error. They are uniquely suited to parallel modern computing hardware, efficiently recycle computations, and compress information to reduce both the time and memory requirements for NCGPs. As we demonstrate on large-scale classification problems, our method significantly accelerates posterior inference compared to competitive baselines by trading off reduced computation for increased uncertainty.

Paper Structure (34 sections, 7 theorems, 42 equations, 10 figures)


Introduction
Background
Non-conjugate Gaussian Processes (NCGPs)
Approximate Inference via Laplace
Predictions
Computation-Aware Inference in NCGPs
Derivation of the IterNCGP Framework
Policy Choice: Targeted Computations
Recycling: Reusing Computations
Compression: Memory-Efficient Beliefs
Cost Analysis of IterNCGP
Related Work
Experiments
Poisson Regression
Large-Scale GP Multi-Class Classification
...and 19 more sections

Key Result

Proposition A.1

Let ${\bm{W}}({\bm{f}}_i)$ be invertible. Using the transform ${\bm{g}} \coloneqq {\bm{f}} - {\bm{m}}$ and consequently ${\bm{g}}_i = {\bm{f}}_i - {\bm{m}}$, the Newton step (eq:newton_step) can be written as

Figures (10)

Figure 1: Binary Classification with IterNCGP. Comparison of two IterNCGP variants. (Top) IterNCGP variant corresponding to data subsampling and solving each regression problem exactly in each Newton step $i$. (Bottom) IterNCGP variant with a more informative policy (details in \ref{sec:policy}), recycling of computations between Newton steps (details in \ref{sec:recycling}) and compression to reduce memory (details in \ref{sec:compression}). The panels show the marginal uncertainty over the latent function at Newton step $i$ and solver iteration $j$. Using recycling, the current belief is efficiently propagated between mode-finding steps $i$ (❷ $\to$ ❸) without performance drops (Right). Details in \ref{sec:details_binary_classification}.
Figure 2: Approximate Inference in NCGPs as Sequential GP Regression. Performing a LA at a Newton iterate ${\bm{f}}_i$ results in a posterior GP that coincides with the posterior of a GP regression problem with pseudo targets $\hat{{\bm{y}}}({\bm{f}}_i)$ observed with Gaussian noise ${\operatorname{\mathcal{N}}\left({\bm{0}}, {\bm{W}}({\bm{f}}_i)^{-1}\right)}$. The plot illustrates this connection for binary classification on a toy problem with the latent function drawn from a GP. Notice how similar the posteriors are between Newton steps. This motivates the proposed strategy for recycling computations between steps in \ref{sec:recycling}. Details in \ref{sec:details_binary_classification}.
Figure 3: Different IterNCGP Policies Applied to GP Classification. (Left) The true posterior mean $m_{0,*}$ for binary classification and its decision boundary. (Right) Current posterior mean estimate after $1$, $10$, and $19$ iterations with the unit vector policy (Top) and the CG policy (Bottom). Shown are the data points selected by the policy in this iteration, with the dot size indicating their relative weight. For IterNCGP-Chol, data points are targeted one by one and previously used data points are marked. Details in \ref{sec:details_binary_classification}.
Figure 4: Compressed Beliefs. Recycled initial beliefs in the second Newton step ($i = 1$) with means $m_{1, 0}$ (Top) and (co-)variance functions $K_{1, 0}$ (Bottom) using compression with different buffer sizes $R \in \{0, 1, 3, 10\}$. Buffer size $R=0$ is equivalent to not using recycling. The larger the buffer size/rank of ${\bm{C}}_0$, the more expressive the belief. Details in \ref{sec:details_binary_classification}.
Figure 5: Poisson Regression with IterNCGP. (Left) Test loss performance for IterNCGP-CG with recycling and four schedules ($j \leq 1, 5, 10$ or $20$) over $100, 20, 10$ or $5$ steps (always using the same total budget of $100$ iterations). For each schedule, the median (solid line) and min/max (shaded area) over $10$ runs are reported. The crosses indicate the end of each run. (Right) Posterior ${\operatorname{\mathcal{GP}}\left(m_{i,j}, k_{i,j}\right)}$ for the latent log rate $f$ (Top) and the corresponding belief about the rate $\lambda$ (Bottom) computed via MC at three timepoints during a run of IterNCGP. The shaded 95% credible intervals show how stopping early trades less computation for increased uncertainty. Details in \ref{sec:details_poisson_regression}.
...and 5 more figures
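
The correspondence described in the Figure 2 caption can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation. It assumes a Bernoulli likelihood with logistic link, a squared-exponential kernel, a zero prior mean, and the standard pseudo-target form $\hat{y}(f_i) = f_i + W(f_i)^{-1} \nabla \log p(y \mid f_i)$ from the Laplace approximation literature. The toy data, `rbf_kernel`, and both `newton_step_*` functions are illustrative names that do not appear in the source.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary classification data with labels in {0, 1}; all values are illustrative.
rng = np.random.default_rng(0)
n = 20
X = np.linspace(-3.0, 3.0, n)
y = (rng.uniform(size=n) < sigmoid(2.0 * np.sin(X))).astype(float)

K = rbf_kernel(X, X) + 1e-3 * np.eye(n)  # prior covariance plus jitter for conditioning
m = np.zeros(n)                          # prior mean at the training inputs (assumed zero)

def newton_step_direct(f):
    """Newton step on the log posterior: f + (K^{-1} + W)^{-1} * gradient."""
    pi = sigmoid(f)
    grad_lik = y - pi                    # gradient of the Bernoulli log likelihood
    W = np.diag(pi * (1.0 - pi))         # negative Hessian of the log likelihood
    K_inv = np.linalg.inv(K)
    grad_post = grad_lik - K_inv @ (f - m)
    return f + np.linalg.solve(K_inv + W, grad_post)

def newton_step_as_regression(f):
    """Same step as a GP regression posterior mean with noise covariance W^{-1}."""
    pi = sigmoid(f)
    w = pi * (1.0 - pi)                  # diagonal of W
    y_hat = f + (y - pi) / w             # pseudo targets
    return m + K @ np.linalg.solve(K + np.diag(1.0 / w), y_hat - m)

# Run a few Newton steps and track the largest gap between the two formulations.
f = m.copy()
max_gap = 0.0
for i in range(5):
    f_direct = newton_step_direct(f)
    f_regr = newton_step_as_regression(f)
    max_gap = max(max_gap, float(np.max(np.abs(f_direct - f_regr))))
    f = f_regr

print(f"largest difference between the two updates: {max_gap:.2e}")
```

Both functions produce the same iterate up to floating-point error, which is what lets each Newton step be handed to a GP regression solver. The regression form also avoids inverting $K$ explicitly.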

Theorems & Definitions (10)

Proposition A.1: Reformulation of the Newton Step
proof
Theorem A.2: Generalization of IterGP
proof
Proposition A.3: The Uncertainty Decreases in the Inner Loop
Proposition A.4: Residual in $\operatorname{span}\{{\bm{S}}\}$ Is Zero
Proposition A.5: Error in Representer Weights in $\operatorname{span}\{{\bm{S}}\}$ Is Zero
Proposition A.6: Total Marginal Uncertainty
Lemma A.7: Explicit Pseudo-Inverse for Multi-Class Classification
proof
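
The statements named in Propositions A.3 and A.4 can be illustrated on a small example. The sketch below is a hedged toy version for plain GP regression with Gaussian noise, not the paper's non-conjugate setting and not the authors' code. It assumes the low-rank form $C = S (S^\top \hat{K} S)^{-1} S^\top$ for the solver's precision-matrix approximation, as used in the IterGP literature, with $\hat{K}$ the kernel matrix plus noise and $S$ the matrix of actions chosen by the policy. The data, `run_solver`, and the two policy labels `"unit"` and `"cg"` are illustrative names.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

# Toy regression data; all values are illustrative.
rng = np.random.default_rng(1)
n = 40
X = np.sort(rng.uniform(-3.0, 3.0, size=n))
y = np.sin(X) + 0.1 * rng.standard_normal(n)

K = rbf_kernel(X, X)               # noise-free kernel matrix
K_hat = K + 0.1**2 * np.eye(n)     # kernel matrix plus observation noise
order = rng.permutation(n)         # order in which the unit-vector policy visits data

def run_solver(policy, num_iters=10):
    """Add one action per iteration and track the total marginal variance."""
    S = np.zeros((n, 0))           # action matrix, one column per iteration
    v = np.zeros(n)                # current representer-weight estimate
    total_var = [np.trace(K)]      # prior total variance at the training inputs
    for j in range(num_iters):
        if policy == "unit":
            s = np.eye(n)[:, order[j]]   # target a single data point
        else:
            s = y - K_hat @ v            # CG-style action: the current residual
        s = s / np.linalg.norm(s)
        S = np.column_stack([S, s])
        # Assumed low-rank form C = S (S^T K_hat S)^{-1} S^T.
        C = S @ np.linalg.solve(S.T @ K_hat @ S, S.T)
        v = C @ y
        total_var.append(np.trace(K - K @ C @ K))
    residual_in_span = S.T @ (y - K_hat @ v)
    return np.array(total_var), residual_in_span

results = {policy: run_solver(policy) for policy in ("unit", "cg")}
for policy, (total_var, residual_in_span) in results.items():
    print(policy, "total variance per iteration:", np.round(total_var, 3))
    print(policy, "max |residual| in span(S):", np.max(np.abs(residual_in_span)))
```

Under these assumptions, the total marginal variance never increases as actions are added, and the residual projected onto the actions vanishes up to rounding. With unit-vector actions the mean estimate `K @ v` reduces to regression on the selected subset of data.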
