Conformal Approach To Gaussian Process Surrogate Evaluation With Coverage Guarantees
Edgar Jaber, Vincent Blot, Nicolas Brunel, Vincent Chabridon, Emmanuel Remy, Bertrand Iooss, Didier Lucor, Mathilde Mougeot, Alessandro Leite
TL;DR
Gaussian process surrogates for expensive simulations rely on Gaussian-based credibility intervals that may misrepresent uncertainty under misspecification. The paper introduces cross-conformal predictors for GPs (J+GP and J-minmax-GP), weighting the non-conformity score by the GP posterior spread to yield adaptive, distribution-free prediction intervals with marginal coverage guarantees. The authors prove theoretical coverage properties, demonstrate strong adaptivity via correlation between interval width and local surrogate error, and provide a public implementation evaluated on ML benchmarks and an industrial nuclear-engineering use case. This approach offers a practical, reliability-enhancing tool for GP model evaluation and kernel selection in high-cost UQ settings, reducing dependence on Gaussian assumptions while preserving rigorous guarantees.
Abstract
Gaussian processes (GPs) are a Bayesian machine learning approach widely used to construct surrogate models for the uncertainty quantification of computer simulation codes in industrial applications. They provide both a mean predictor and an estimate of the posterior prediction variance, the latter being used to produce Bayesian credibility intervals. Interpreting these intervals relies on the Gaussianity of the simulation model as well as on the well-specification of the priors, assumptions that are not always appropriate. We propose to address this issue with the help of conformal prediction. In the present work, a method for building adaptive cross-conformal prediction intervals is proposed by weighting the non-conformity score with the posterior standard deviation of the GP. The resulting conformal prediction intervals exhibit a level of adaptivity akin to Bayesian credibility sets and display a significant correlation with the surrogate model's local approximation error, while being free from the underlying model assumptions and having frequentist coverage guarantees. These estimators can thus be used for evaluating the quality of a GP surrogate model and can assist a decision-maker in the choice of the best prior for the specific application of the GP. The performance of the method is illustrated through a panel of numerical examples based on various reference databases. Moreover, the potential applicability of the method is demonstrated in the context of surrogate modeling of an expensive-to-evaluate simulator of the clogging phenomenon in steam generators of nuclear reactors.
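To make the idea of weighting the non-conformity score by the posterior standard deviation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard jackknife+ construction, in which each leave-one-out residual is divided by the corresponding leave-one-out posterior standard deviation and then rescaled by the standard deviation at the new point. The function name `jackknife_plus_interval`, its arguments, and the toy numbers are all hypothetical; the exact score used in the paper (for example, any exponent applied to the standard deviation) may differ. In practice the leave-one-out means and standard deviations would come from a fitted GP.

```python
import math

def jackknife_plus_interval(y, loo_mean_train, loo_std_train,
                            loo_mean_new, loo_std_new, alpha):
    """Jackknife+ style interval with std-normalized scores (illustrative sketch).

    y[i]              : observed output at training point i
    loo_mean_train[i] : mean prediction at x_i from the model fitted without point i
    loo_std_train[i]  : posterior std at x_i from that same leave-one-out model
    loo_mean_new[i]   : mean prediction at the new input from leave-one-out model i
    loo_std_new[i]    : posterior std at the new input from leave-one-out model i
    """
    n = len(y)
    # Non-conformity score: absolute residual weighted by the posterior std.
    scores = [abs(y[i] - loo_mean_train[i]) / loo_std_train[i] for i in range(n)]
    # Rescale each score by the std at the new point, so intervals widen
    # where the surrogate is less certain.
    lower_cands = sorted(loo_mean_new[i] - scores[i] * loo_std_new[i] for i in range(n))
    upper_cands = sorted(loo_mean_new[i] + scores[i] * loo_std_new[i] for i in range(n))
    # Order statistics used by jackknife+ (1-based ranks).
    k_lo = math.floor(alpha * (n + 1))
    k_hi = math.ceil((1 - alpha) * (n + 1))
    lower = lower_cands[k_lo - 1] if k_lo >= 1 else -math.inf
    upper = upper_cands[k_hi - 1] if k_hi <= n else math.inf
    return lower, upper

# Toy, made-up numbers for five training points (assumption, for illustration only).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
loo_mean_train = [1.5, 2.0, 2.5, 4.5, 4.0]
loo_std_train = [0.5, 1.0, 0.25, 1.0, 2.0]
loo_mean_new = [3.0, 3.0, 3.5, 2.5, 3.0]
loo_std_new = [1.0, 0.5, 0.5, 2.0, 1.0]

interval = jackknife_plus_interval(y, loo_mean_train, loo_std_train,
                                   loo_mean_new, loo_std_new, alpha=0.25)
print(interval)  # (1.5, 4.5)
```

With very few points or a small miscoverage level, the required rank can fall outside the sample, in which case this sketch returns an unbounded endpoint rather than guessing a finite one.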
