Efficient Two-Stage Gaussian Process Regression Via Automatic Kernel Search and Subsampling

Shifan Zhao; Jiaying Lu; Ji Yang; Edmond Chow; Yuanzhe Xi

TL;DR

This work tackles misspecification in Gaussian Process Regression by separating mean prediction from uncertainty quantification in a two-stage GPR, aided by Automatic Kernel Search (AKS) and a subsampling warm start that efficiently initializes hyperparameters. The AKS framework provides theoretical bounds and a practical algorithm to mitigate kernel misspecification, while the two-stage design guards against mean misspecification that can bias hyperparameters and UQ. The authors present two GP variants (a scalable two-stage GPR and a two-stage Exact-GP), validated across small UCI benchmarks, large-scale datasets, and safety-critical GP-enhanced foundation models, showing improved uncertainty quantification and competitive predictive performance. Collectively, the framework offers a robust, cost-effective means to achieve reliable UQ under resource constraints, with guidance on when to deploy each variant in practice.

Abstract

Gaussian Process Regression (GPR) is widely used in statistics and machine learning for prediction tasks requiring uncertainty measures. Its efficacy depends on the appropriate specification of the mean function, covariance kernel function, and associated hyperparameters. Severe misspecifications can lead to inaccurate results and problematic consequences, especially in safety-critical applications. However, a systematic approach to handle these misspecifications is lacking in the literature. In this work, we propose a general framework to address these issues. Firstly, we introduce a flexible two-stage GPR framework that separates mean prediction and uncertainty quantification (UQ) to prevent mean misspecification, which can introduce bias into the model. Secondly, kernel function misspecification is addressed through a novel automatic kernel search algorithm, supported by theoretical analysis, that selects the optimal kernel from a candidate set. Additionally, we propose a subsampling-based warm-start strategy for hyperparameter initialization to improve efficiency and avoid hyperparameter misspecification. With much lower computational cost, our subsampling-based strategy can yield competitive or better performance than training exclusively on the full dataset. Combining all these components, we recommend two GPR methods, exact and scalable, designed to match available computational resources and specific UQ requirements. Extensive evaluation on real-world datasets, including UCI benchmarks and a safety-critical medical case study, demonstrates the robustness and precision of our methods.
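The two-stage idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the RBF kernel, the fixed unit signal variance, the small hyperparameter grid, and the names `rbf_kernel`, `gp_nll`, and `two_stage_gpr` are all assumptions made for illustration, and the automatic kernel search and subsampling warm start are omitted. The toy data follow the synthetic setup described in the Figure 2 caption.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale):
    """Unit-variance squared-exponential kernel between 1-D input arrays."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_nll(x, y, lengthscale, noise):
    """Zero-mean GP negative log marginal likelihood (constant term dropped)."""
    k = rbf_kernel(x, x, lengthscale) + noise * np.eye(len(x))
    chol = np.linalg.cholesky(k)
    return 0.5 * y @ np.linalg.solve(k, y) + np.log(np.diag(chol)).sum()

def two_stage_gpr(x_train, y_train, x_test, mean_ls=1.0, ridge=0.1):
    """Return (predictive mean, predictive variance) at x_test."""
    n = len(x_train)
    # Stage 1: kernel ridge regression supplies the mean prediction.
    k_mean = rbf_kernel(x_train, x_train, mean_ls)
    weights = np.linalg.solve(k_mean + ridge * np.eye(n), y_train)
    mean_test = rbf_kernel(x_test, x_train, mean_ls) @ weights

    # Stage 2: fit a zero-mean GP to the demeaned targets.
    # Assumption: a coarse grid search stands in for hyperparameter training.
    residual = y_train - k_mean @ weights
    grid = [(ls, nz) for ls in (0.3, 1.0, 3.0) for nz in (0.1, 0.5, 1.0, 2.0)]
    uq_ls, noise = min(grid, key=lambda p: gp_nll(x_train, residual, *p))

    # Predictive variance of the residual GP, including observation noise.
    k_uq = rbf_kernel(x_train, x_train, uq_ls) + noise * np.eye(n)
    k_star = rbf_kernel(x_test, x_train, uq_ls)
    explained = np.sum(k_star * np.linalg.solve(k_uq, k_star.T).T, axis=1)
    var_test = 1.0 - explained + noise  # prior variance is 1 for this kernel
    return mean_test, var_test

# Toy data: f(x) = 3x + 2 sin(2 pi x) with unit-variance Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, 30)
y = 3 * x + 2 * np.sin(2 * np.pi * x) + rng.standard_normal(30)
x_train, y_train, x_test, y_test = x[:24], y[:24], x[24:], y[24:]

mean, var = two_stage_gpr(x_train, y_train, x_test)
half_width = 1.96 * np.sqrt(var)
coverage = np.mean(np.abs(y_test - mean) <= half_width)
print(f"empirical coverage of the 95% interval: {coverage:.2f}")
```

Here the Stage 1 mean is used as the final predictive mean and Stage 2 contributes only the variance. This is one plausible reading of "combines these mean and covariance predictions", not a confirmed detail of the method.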

Paper Structure (23 sections, 18 theorems, 79 equations, 7 figures, 9 tables, 7 algorithms)

Introduction
Mitigating Mean Misspecification via Two-stage GPR
Mitigating Kernel Misspecifications via Automatic Kernel Search
Hyperparameters Tuning via Subsampling Warm Start
Two Approaches for GP
Numerical results
Performance Comparison for Exact-GP on UCI Dataset
Uncertainty Quantification for Scalable GP on UCI datasets
Uncertainty Quantification for GP-Enhanced Pre-Trained Foundation Models
Limitations
Conclusion
Background Materials
Popular kernels
Gaussian Process Classification (GPC)
Supplementary Materials for Section 2
...and 8 more sections

Key Result

Theorem 1

If the mean function $m(x)$ is not a zero function, minimizing the $MEL$ will not recover the ground-truth hyperparameters $\theta^*$ if $\theta^*$ is not a stationary point of $m(X)^{\top}K_{n}^{-1}\frac{\partial K_n}{\partial \theta} K_{n}^{-1}m(X)$.
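A short derivation shows where the quadratic form in Theorem 1 can come from. It rests on assumptions not stated in this summary: that the loss is the expected zero-mean Gaussian negative log marginal likelihood, that the data satisfy $y = m(X) + z$ with $z \sim N(0, K_n(\theta^*))$, and that $K_n$ includes the noise term. The symbols $\mathcal{L}$, $z$, and $\theta_j$ are introduced here for illustration.

```latex
\begin{align*}
\mathcal{L}(\theta)
  &= \tfrac12\, y^{\top} K_n(\theta)^{-1} y
   + \tfrac12 \log\det K_n(\theta) + \tfrac{n}{2}\log 2\pi, \\
\mathbb{E}\,\mathcal{L}(\theta)
  &= \tfrac12 \operatorname{tr}\!\bigl(K_n(\theta)^{-1} K_n(\theta^*)\bigr)
   + \tfrac12\, m(X)^{\top} K_n(\theta)^{-1} m(X)
   + \tfrac12 \log\det K_n(\theta) + \tfrac{n}{2}\log 2\pi, \\
\frac{\partial\, \mathbb{E}\,\mathcal{L}}{\partial \theta_j}
  &= -\tfrac12 \operatorname{tr}\!\Bigl(K_n^{-1}\frac{\partial K_n}{\partial \theta_j}
       K_n^{-1} K_n(\theta^*)\Bigr)
   - \tfrac12\, m(X)^{\top} K_n^{-1}\frac{\partial K_n}{\partial \theta_j} K_n^{-1} m(X)
   + \tfrac12 \operatorname{tr}\!\Bigl(K_n^{-1}\frac{\partial K_n}{\partial \theta_j}\Bigr), \\
\left.\frac{\partial\, \mathbb{E}\,\mathcal{L}}{\partial \theta_j}\right|_{\theta=\theta^*}
  &= -\tfrac12\, m(X)^{\top} K_n^{-1}\frac{\partial K_n}{\partial \theta_j} K_n^{-1} m(X)
     \Big|_{\theta=\theta^*}.
\end{align*}
```

At $\theta = \theta^*$ the two trace terms cancel, so under these assumptions the gradient vanishes at the ground truth only when the remaining quadratic form in $m(X)$ is zero. A zero mean function satisfies this trivially; a nonzero one generally does not.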

Figures (7)

Figure 1: Two-stage Gaussian Process Regression (GPR) (Section 2) via Automatic Kernel Search (Section 3) and Subsampling (Section 4). Stage 1: Automatic Kernel Search selects the best kernel for the mean prediction, and Kernel Ridge Regression (KRR) with that kernel produces the mean prediction. Stage 2: After the training data are demeaned using the Stage 1 mean prediction, Automatic Kernel Search identifies the best kernel for uncertainty quantification, and a zero-mean GPR with that kernel is trained via a subsampling warm start. The final predictive distribution combines these mean and covariance predictions to enhance the model's accuracy and robustness.
Figure 2: The left panel displays the observed targets $y_i$ and ground-truth function values $f(x_i)$ for 30 points $x_i$ uniformly distributed in the interval $[-5, 5]$. The targets are generated by $y_i = f(x_i) + \epsilon_i$, where $f(x) = 3x + 2\sin(2\pi x)$ and $\epsilon_i \sim N(0, 1)$. The dataset is randomly split into 80% training and 20% test sets. The middle panel illustrates predictions and 95% confidence intervals from a single-stage GP (Exact-GP) using a zero-mean prior. The right panel presents results from the two-stage GP approach. The Exact-GP and the two-stage GP are both trained with the same optimizer and learning rate. More details can be found in the appendix "Supplementary Materials for Section 2". Only 50% of the data are covered by the 95% confidence interval provided by Exact-GP, whereas our method covers 96.67% of the data.
Figure 3: Contour plot of NLL for the UCI Wine dataset. The plot shows pairwise NLL contours (lengthscale vs. noise) around the optimal hyperparameters. Optimal hyperparameters trained on 10%, 50%, 80%, and 100% of the dataset are marked with a red dot, square, diamond, and cross, respectively. Lighter areas indicate higher NLL values, while darker areas signify lower NLL values. All contours are computed on the full training dataset.
Figure 4: Uncertainty quantification results in RMSE.
Figure 5: Uncertainty quantification results in accuracy (%).
...and 2 more figures
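The subsampling warm start that Figure 3 probes can be sketched as follows: hyperparameters are fit cheaply on a small random subset, then used to initialize a short refinement on the full data. This is a hedged illustration only. The derivative-free coordinate search is a swapped-in optimizer chosen to keep the example dependency-free, and `nll`, `local_search`, the synthetic data, and the 10% subsample size are assumptions rather than details from the paper.

```python
import numpy as np

def nll(log_params, x, y):
    """Zero-mean GP negative log marginal likelihood with an RBF kernel.

    log_params holds log(lengthscale) and log(noise variance).
    """
    lengthscale, noise = np.exp(log_params)
    sq_dist = (x[:, None] - x[None, :]) ** 2
    k = np.exp(-0.5 * sq_dist / lengthscale**2) + noise * np.eye(len(x))
    try:
        chol = np.linalg.cholesky(k)
    except np.linalg.LinAlgError:
        return np.inf  # treat numerically singular kernels as infeasible
    quad = 0.5 * y @ np.linalg.solve(k, y)
    return quad + np.log(np.diag(chol)).sum() + 0.5 * len(x) * np.log(2 * np.pi)

def local_search(log_params, x, y, step=0.5, n_iters=30):
    """Derivative-free coordinate search in log-hyperparameter space."""
    best = np.array(log_params, dtype=float)
    best_val = nll(best, x, y)
    for _ in range(n_iters):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                cand = best.copy()
                cand[i] += delta
                val = nll(cand, x, y)
                if val < best_val:
                    best, best_val, improved = cand, val, True
        if not improved:
            step *= 0.5  # shrink the step once no neighbor improves
    return best, best_val

# Synthetic zero-mean data (assumed for illustration).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-5, 5, 400))
y = np.sin(x) + 0.3 * rng.standard_normal(400)

# Warm start: many cheap iterations on a 10% subsample.
idx = rng.choice(len(x), size=40, replace=False)
warm, _ = local_search(np.zeros(2), x[idx], y[idx])

# Refinement: only a few expensive iterations on the full dataset.
theta, full_nll = local_search(warm, x, y, step=0.25, n_iters=5)
print("lengthscale, noise variance:", np.exp(theta))
```

Each full-data evaluation costs a 400-by-400 Cholesky factorization, while subsample evaluations use 40-by-40 matrices, so most of the search happens where it is cheap.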

Theorems & Definitions (35)

Theorem 1
Theorem 2
Remark 1
Theorem 3
Proof: Proof of the theorem on misspecification of the mean
Theorem 4
Remark 2
Lemma 1: Proposition 3 from wang2022gaussian
Lemma 2: Theorem 2 from chowdhury2017kernelized
Lemma 3: https://math.stackexchange.com/questions/4621017/lower-bound-for-the-gaussian-tail
...and 25 more
