Design-marginal calibration of Gaussian process predictive distributions: Bayesian and conformal approaches
Aurélien Pion, Emmanuel Vazquez
TL;DR
This work addresses the calibration of Gaussian process (GP) predictive distributions in the interpolation setting by introducing design-marginal notions of calibration (μ-calibration). It develops two calibration frameworks: cps-gp, a conformal-prediction-based method that yields a distribution-free, marginally calibrated conformal predictive distribution (CPD) for GP interpolation, and bcr-gp, a Bayesian post-processing approach that preserves the GP mean but recalibrates dispersion via a generalized normal residual model. The methods are compared against existing conformal techniques (Jackknife+ and full conformal GP) using μ-coverage, PIT-based diagnostics, and proper scoring rules, showing improved calibration and predictive distributions that are usable for sequential design. The paper also provides theoretical results, finite-sample considerations, and practical guidance on parameter selection, design-size effects, and tail calibration, highlighting the trade-offs between model-based calibration and distribution-free guarantees. Overall, cps-gp and bcr-gp are complementary tools for improving the reliability of GP-based uncertainty quantification in design and optimization tasks.
Abstract
We study the calibration of Gaussian process (GP) predictive distributions in the interpolation setting from a design-marginal perspective. Conditioning on the data and averaging over a design measure μ, we formalize μ-coverage for central intervals and μ-probabilistic calibration through randomized probability integral transforms. We introduce two methods. cps-gp adapts conformal predictive systems to GP interpolation using standardized leave-one-out residuals, yielding stepwise predictive distributions with finite-sample marginal calibration. bcr-gp retains the GP posterior mean and replaces the Gaussian residual by a generalized normal model fitted to cross-validated standardized residuals. A Bayesian selection rule, based either on a posterior upper quantile of the variance for conservative prediction or on a cross-posterior Kolmogorov-Smirnov criterion for probabilistic calibration, controls dispersion and tail behavior while producing smooth predictive distributions suitable for sequential design. Numerical experiments on benchmark functions compare cps-gp, bcr-gp, Jackknife+ for GPs, and the full conformal Gaussian process, using calibration metrics (coverage, Kolmogorov-Smirnov, integral absolute error) and accuracy or sharpness through the scaled continuous ranked probability score.
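To make the role of standardized leave-one-out residuals concrete, the following is a minimal sketch, not the paper's cps-gp construction. It assumes a zero-mean GP with a fixed squared-exponential kernel and a small jitter term, computes the leave-one-out residuals through the standard closed-form identity on the inverse covariance matrix, and builds a simplified stepwise predictive distribution by rescaling those residuals at a new point. All function names, the kernel, the lengthscale, the test function, and the tie-breaking parameter `tau` are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.15):
    # Squared-exponential kernel on 1-D inputs (assumed for illustration).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def loo_standardized_residuals(K, y):
    # Closed-form leave-one-out quantities for a zero-mean GP:
    #   y_i - m_{-i}(x_i) = [K^{-1} y]_i / [K^{-1}]_{ii}
    #   s_{-i}^2(x_i)     = 1 / [K^{-1}]_{ii}
    Q = np.linalg.inv(K)
    q_diag = np.diag(Q)
    loo_error = (Q @ y) / q_diag
    loo_sd = np.sqrt(1.0 / q_diag)
    return loo_error / loo_sd

def gp_predict(x_train, y_train, K, x_new):
    # Posterior mean and standard deviation of the zero-mean GP at x_new.
    k_star = rbf_kernel(x_train, np.atleast_1d(x_new))[:, 0]
    w = np.linalg.solve(K, k_star)
    mean = w @ y_train
    var = 1.0 - w @ k_star  # prior variance k(x, x) = 1 for this kernel
    return mean, np.sqrt(max(var, 0.0))

def stepwise_cdf(t, mean, sd, residuals, tau=0.5):
    # Simplified conformal-style predictive CDF: empirical distribution of
    # mean + sd * r_i with the (n + 1) normalization; tau in [0, 1] plays
    # the role of the randomization variable (fixed here, an assumption).
    n = len(residuals)
    atoms = mean + sd * residuals
    return (np.sum(atoms <= t) + tau) / (n + 1)

# Small deterministic demo on an assumed test function.
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2.0 * np.pi * x)
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))  # jitter for stability

residuals = loo_standardized_residuals(K, y)
mean_new, sd_new = gp_predict(x, y, K, 0.37)
cdf_at_mean = stepwise_cdf(mean_new, mean_new, sd_new, residuals)
```

The resulting distribution function is a step function with jumps at the rescaled residuals, which illustrates why a residual-based approach of this kind produces stepwise rather than smooth predictive distributions.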
