Robust and Conjugate Gaussian Process Regression

Matias Altamirano; François-Xavier Briol; Jeremias Knoblauch

TL;DR

Robust and Conjugate Gaussian Process Regression (RCGP) addresses the fragility of standard GP regression under outliers by combining generalized Bayesian inference with a robust loss L^w_n that is quadratic in f. This yields a Gaussian, closed-form posterior p^w(f|y,x) with posterior mean μ^R and covariance Σ^R. In this posterior the effective noise covariance is σ^2 J_w and a shrinkage term m_w enters the mean, so updates stay exact while anomalous observations are down-weighted. The weighting function w, in particular an inverse multiquadric (IMQ) choice, guarantees robustness in the sense of a bounded posterior influence function, and hyperparameters are selected via leave-one-out cross-validation at O(n^3) cost. RCGP also integrates with sparse variational GPs (SVGPs), deep GPs, and Bayesian optimization. It delivers strong performance on benchmarks and on real-world tasks such as the Twitter flash crash while remaining competitive in speed with standard GPs. Overall, RCGP offers a practical, theoretically grounded route to robust, conjugate GP regression across a range of applications.
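The TL;DR states that hyperparameters are chosen by leave-one-out cross-validation at O(n^3) cost. The NumPy sketch below shows one way that cost can be reached; it is not the authors' code. It assumes that the RCGP predictive has the standard GP form with y − m_w in place of y − m and σ²J_w in place of σ²I_n (consistent with the definitions in Proposition 3.1). Under that assumption a single matrix inverse gives all n leave-one-out predictions. It also assumes the criterion is the sum of leave-one-out log predictive densities of each y_i. The function names, the SE kernel, the IMQ weight with β = σ/√2 and the toy data are hypothetical illustrative choices. A brute-force refit per held-out point is included only to check the shortcut.

```python
import numpy as np


def se_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared exponential kernel for 1-D inputs (illustrative choice)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def rcgp_loo_log_pred(K, y, m, w, dlogw2, sigma2):
    """All n LOO log predictive densities from one O(n^3) inverse (sketch)."""
    noise = sigma2 * (0.5 * sigma2 * w ** -2)    # diagonal of sigma^2 J_w
    A_inv = np.linalg.inv(K + np.diag(noise))
    y_tilde = y - sigma2 * dlogw2                 # pseudo-targets: y_tilde - m = y - m_w
    alpha = A_inv @ (y_tilde - m)
    d = np.diag(A_inv)
    mu_loo = y_tilde - alpha / d                  # mean of f(x_i) without point i
    var_f = 1.0 / d - noise                       # variance of f(x_i) without point i
    pred_var = var_f + sigma2                     # predictive variance of y_i
    return -0.5 * (np.log(2 * np.pi * pred_var) + (y - mu_loo) ** 2 / pred_var)


def rcgp_loo_brute_force(K, y, m, w, dlogw2, sigma2):
    """Reference: refit without point i for each i (O(n^4)); for checking only."""
    n = y.size
    noise = sigma2 * (0.5 * sigma2 * w ** -2)
    m_w = m + sigma2 * dlogw2
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        A = K[np.ix_(keep, keep)] + np.diag(noise[keep])
        k_i = K[keep, i]
        mu = m[i] + k_i @ np.linalg.solve(A, y[keep] - m_w[keep])
        var = K[i, i] - k_i @ np.linalg.solve(A, k_i) + sigma2
        out[i] = -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return out


rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 30)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)
y[5] += 4.0                                       # one artificial outlier
m = np.zeros_like(y)                              # zero prior mean
sigma2, c = 0.01, 1.0
r = y - m
w = np.sqrt(sigma2 / 2.0) * (1.0 + (r / c) ** 2) ** -0.5   # IMQ weight, beta = sigma/sqrt(2) (assumed)
dlogw2 = -2.0 * r / (c ** 2 + r ** 2)             # d/dy log(w^2) for this IMQ weight
K = se_kernel(x, x)

fast = rcgp_loo_log_pred(K, y, m, w, dlogw2, sigma2)
slow = rcgp_loo_brute_force(K, y, m, w, dlogw2, sigma2)
print(np.allclose(fast, slow), fast.sum())
```

The shortcut relies on the identity that, for a Gaussian vector with covariance A, the conditional of coordinate i given the rest has variance 1/[A⁻¹]_ii; hyperparameters (here c, σ² and the kernel settings) could then be tuned by maximising `fast.sum()`.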

Abstract

To enable closed-form conditioning, a common assumption in Gaussian process (GP) regression is independent and identically distributed Gaussian observation noise. This strong and simplistic assumption is often violated in practice, which leads to unreliable inferences and uncertainty quantification. Unfortunately, existing methods for robustifying GPs break closed-form conditioning, which makes them less attractive to practitioners and significantly more computationally expensive. In this paper, we demonstrate how to perform provably robust and conjugate Gaussian process (RCGP) regression at virtually no additional cost using generalised Bayesian inference. RCGP is particularly versatile as it enables exact conjugate closed-form updates in all settings where standard GPs admit them. To demonstrate its strong empirical performance, we deploy RCGP for problems ranging from Bayesian optimisation to sparse variational Gaussian processes.

Paper Structure (50 sections, 5 theorems, 17 equations, 11 figures, 5 tables)

Introduction
Existing Work
Contributions
Background
Gaussian Processes
Generalised Bayesian Inference
Methodology
The Loss Function
Robust and Conjugate Gaussian Processes
Hyperparameter Selection
Robustness
Choice of Weighting Function
Interpretation of RCGP Terms
Experiments
Benchmarking

Key Result

Proposition 3.1

Suppose $f \sim \mathcal{GP}(m,k)$ and $\boldsymbol{\varepsilon} \sim \mathcal{N}(0, \sigma^2 I_n)$. Then the RCGP posterior $p^{\mathbf w}(f \mid \mathbf y, \mathbf x)$ is Gaussian with mean $\mu^R$ and covariance $\Sigma^R$, where $\mathbf w = (w(x_1,y_1),\ldots,w(x_n,y_n))^\top$, $\mathbf m_{\mathbf w} = \mathbf m + \sigma^{2}\nabla_{y}\log(\mathbf w^{2})$ and $J_{\mathbf w} = \operatorname{diag}\big(\tfrac{\sigma^{2}}{2}\mathbf w^{-2}\big)$. The RCGP's posterior predictive over …
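Because the statement above is cut off, the NumPy sketch below assumes that the RCGP predictive takes the standard GP form with y − m replaced by y − m_w and the noise covariance σ²I_n replaced by σ²J_w. That is, the predictive mean is m(x*) + k*ᵀ(K + σ²J_w)⁻¹(y − m_w) and the variance is k(x*, x*) − k*ᵀ(K + σ²J_w)⁻¹k*. The names `rcgp_predict` and `se_kernel` are hypothetical. The weights w and the gradients ∇_y log(w²) are passed in as arrays, so any weighting function can be plugged in. As a check that follows directly from the definitions above, a constant weight w = σ/√2 with zero gradient gives J_w = I_n and m_w = m, so the sketch should reproduce ordinary GP regression.

```python
import numpy as np


def se_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared exponential kernel for 1-D inputs (illustrative choice)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def rcgp_predict(x, y, x_new, w, dlogw2, sigma2, mean_fn=np.zeros_like, kernel=se_kernel):
    """Hypothetical RCGP predictive mean and variance of f at x_new.

    w      : weights w(x_i, y_i), shape (n,)
    dlogw2 : d/dy log(w(x_i, y_i)^2), shape (n,)
    """
    m_w = mean_fn(x) + sigma2 * dlogw2              # m_w = m + sigma^2 * grad_y log(w^2)
    noise = sigma2 * (0.5 * sigma2 * w ** -2)       # diagonal of sigma^2 * J_w
    L = np.linalg.cholesky(kernel(x, x) + np.diag(noise))
    K_star = kernel(x, x_new)                       # shape (n, n_new)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - m_w))  # (K + sigma^2 J_w)^{-1} (y - m_w)
    mean = mean_fn(x_new) + K_star.T @ alpha
    V = np.linalg.solve(L, K_star)
    var = np.diag(kernel(x_new, x_new)) - np.sum(V ** 2, axis=0)
    return mean, var


# Sanity check: constant weight sigma/sqrt(2) and zero gradient give a standard GP.
x = np.linspace(0.0, 5.0, 20)
y = np.sin(x)
x_new = np.array([1.0, 2.5])
sigma2 = 0.05
w_const = np.full_like(y, np.sqrt(sigma2 / 2.0))
mean, var = rcgp_predict(x, y, x_new, w_const, np.zeros_like(y), sigma2)

K_gp = se_kernel(x, x) + sigma2 * np.eye(x.size)    # ordinary GP with noise sigma^2
k_new = se_kernel(x_new, x)                          # shape (n_new, n)
gp_mean = k_new @ np.linalg.solve(K_gp, y)
gp_var = 1.0 - np.sum(k_new * np.linalg.solve(K_gp, k_new.T).T, axis=1)
print(np.allclose(mean, gp_mean), np.allclose(var, gp_var))
```

With non-constant weights, points whose w is small receive a larger entry in σ²J_w and therefore pull the predictive mean less, which is the down-weighting the TL;DR describes.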

Figures (11)

Figure 1: The posterior predictive mean of a GP (green) and the RCGP (blue) on a synthetic dataset where 10% of the data are uniformly generated outliers. Unlike the RCGP, the GP is adversely affected.
Figure 2: $\text{PIF}_{\text{GP}}$ (green) and $\text{PIF}_{\text{RCGP}}$ (blue) for the dataset in Figure 1. $\text{PIF}_{\text{GP}} \rightarrow \infty$ as $|y_{m}-y^{c}_{m}|\to \infty$, so standard GPs are not robust. In contrast, $\text{PIF}_{\text{RCGP}}$ is bounded, which shows that the RCGP is robust.
Figure 3: Comparison of kernel-based weighting functions $w$ with the same hyperparameters: IMQ (blue) and squared exponential (SE, orange). The dashed vertical lines mark the soft threshold $c$ beyond which a point is increasingly treated as an outlier. Past $c$, the SE down-weights observations more rapidly than the IMQ. The maximum possible weight for any observation is $\beta = 1$.
Figure 4: Posterior predictive mean for varying values of $c$, obtained by adjusting $\varepsilon$ in the quantile absolute deviation method proposed in the section on the choice of weighting function, on a synthetic dataset with 10% uniformly generated outliers. Left: full RCGP. Center: RCGP without the shrinkage term. Right: RCGP without the noise term.
Figure 5: Considered contamination regimes are asymmetric (left) and focused (right).
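Figures 3 and 4 refer to IMQ and SE weighting functions with a soft threshold c, and to setting c with a quantile absolute deviation rule governed by ε. The sketch below makes these concrete under stated assumptions. It takes the IMQ weight to be β(1 + (y − m(x))²/c²)^(−1/2), the SE weight to be β exp(−(y − m(x))²/(2c²)), and c to be the (1 − ε) quantile of |y − m(x)|. These exact forms, the function names and the simulated data are assumptions for illustration and are not taken from the paper. The data are a sine curve with 10% uniformly generated outliers, mirroring the synthetic setting in the captions. The sketch also gives ∇_y log(w²) for the IMQ weight, which is the quantity the term m_w in Proposition 3.1 requires.

```python
import numpy as np


def imq_weight(r, beta, c):
    """IMQ weight beta * (1 + r^2 / c^2)^(-1/2), with residual r = y - m(x) (assumed form)."""
    return beta * (1.0 + (r / c) ** 2) ** -0.5


def se_weight(r, beta, c):
    """Squared exponential weight beta * exp(-r^2 / (2 c^2)) (assumed form)."""
    return beta * np.exp(-0.5 * (r / c) ** 2)


def imq_dlogw2(r, c):
    """d/dy log(w^2) for the IMQ weight above; beta cancels out."""
    return -2.0 * r / (c ** 2 + r ** 2)


def qad_threshold(residuals, epsilon):
    """Soft threshold c as the (1 - epsilon) quantile of |y - m(x)| (assumed rule)."""
    return np.quantile(np.abs(residuals), 1.0 - epsilon)


# Synthetic data with 10% uniformly generated outliers and a zero prior mean.
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0.0, 10.0, n)
y = np.sin(x) + 0.1 * rng.standard_normal(n)
outliers = rng.choice(n, size=n // 10, replace=False)
y[outliers] = rng.uniform(-5.0, 5.0, outliers.size)

r = y - 0.0                                   # residuals against m(x) = 0
c = qad_threshold(r, epsilon=0.1)
w_imq = imq_weight(r, beta=1.0, c=c)
w_se = se_weight(r, beta=1.0, c=c)
inliers = np.setdiff1d(np.arange(n), outliers)
print(f"c = {c:.3f}")
print("mean IMQ weight, inliers vs outliers:", w_imq[inliers].mean(), w_imq[outliers].mean())
```

Since exp(−u) ≤ 1/(1 + u) for u ≥ 0, the SE weight under these assumed forms never exceeds the IMQ weight at the same β and c, in line with the faster down-weighting shown in Figure 3.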

Theorems & Definitions (13)

Proposition 3.1
Proposition 3.2
Proposition 4.1
Proposition 4.2
proof
proof
Proposition 1.1
proof
proof
proof
