Uncertainty Quantification of Surrogate Models using Conformal Prediction
Vignesh Gopakumar, Ander Gray, Joel Oskarsson, Lorenzo Zanisi, Daniel Giles, Matt J. Kusner, Stanislas Pamela, Marc Peter Deisenroth
TL;DR
This work develops a model-agnostic conformal prediction (CP) framework to quantify uncertainty in data-driven spatio-temporal surrogates for complex physical systems. By calibrating each cell of the tensor output separately, it delivers statistically valid marginal coverage across space and time at near-zero calibration cost, for both deterministic and probabilistic models, and even when models are deployed out of distribution, provided calibration and test data remain exchangeable. The authors evaluate CP with three nonconformity scores (CQR, AER, STD) over a wide suite of tasks, including 1D and 2D PDEs, Navier–Stokes and MHD plasmas, foundation physics models, and neural weather prediction, and demonstrate valid coverage for outputs of up to tens of millions of dimensions. They further discuss exchangeability requirements and practical limitations (marginal vs conditional coverage, independence across cells, and potential distribution shifts), and provide guidelines for using CP to validate pre-trained surrogates for safety-critical inference with minimal retraining. Overall, CP emerges as a scalable, principled tool for trustworthy deployment of scientific ML models where reliable uncertainty quantification is essential but traditional UQ methods are prohibitively expensive.
Abstract
Data-driven surrogate models offer quick approximations to complex numerical and experimental systems but typically lack uncertainty quantification, limiting their reliability in safety-critical applications. While Bayesian methods provide uncertainty estimates, they offer no statistical guarantees and struggle with high-dimensional spatio-temporal problems due to computational costs. We present a conformal prediction (CP) framework that provides statistically guaranteed marginal coverage for surrogate models in a model-agnostic manner with near-zero computational cost. Our approach handles high-dimensional spatio-temporal outputs by performing cell-wise calibration while preserving the tensorial structure of predictions. Through extensive empirical evaluation across diverse applications including fluid dynamics, magnetohydrodynamics, weather forecasting, and fusion diagnostics, we demonstrate that CP achieves empirical coverage with valid error bars regardless of model architecture, training regime, or output dimensionality. We evaluate three nonconformity scores (conformalised quantile regression, absolute error residual, and standard deviation) for both deterministic and probabilistic models, showing that guaranteed coverage holds even for out-of-distribution predictions where models are deployed on physics regimes different from training data. Calibration requires only seconds to minutes on standard hardware. The framework enables rigorous validation of pre-trained surrogate models for downstream applications without retraining. While CP provides marginal rather than conditional coverage and assumes exchangeability between calibration and test data, our method circumvents the curse of dimensionality inherent in traditional uncertainty quantification approaches, offering a practical tool for trustworthy deployment of machine learning in physical sciences.
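The cell-wise calibration described in the abstract can be illustrated with a minimal sketch of split conformal prediction using the absolute error residual (AER) score. This is not the authors' implementation: the function names (`calibrate_aer`, `predict_interval`, `empirical_coverage`), the synthetic sinusoidal field, and the noise model are all assumptions made for illustration, and the "surrogate" is simply the noise-free field. Each output cell gets its own quantile, so the tensor shape of the prediction is preserved.

```python
import numpy as np


def calibrate_aer(pred_cal, true_cal, alpha=0.1):
    """Cell-wise split-conformal calibration with absolute error residuals.

    pred_cal, true_cal: arrays of shape (n_cal, *output_shape).
    Returns one quantile per output cell, shape output_shape.
    """
    n = pred_cal.shape[0]
    # Rank of the finite-sample corrected (1 - alpha) quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        raise ValueError("too few calibration samples for this alpha")
    scores = np.abs(true_cal - pred_cal)  # nonconformity score per cell
    # k-th smallest score in every cell, taken along the sample axis.
    return np.sort(scores, axis=0)[k - 1]


def predict_interval(pred, qhat):
    """Symmetric prediction interval around a point prediction."""
    return pred - qhat, pred + qhat


def empirical_coverage(lower, upper, truth):
    """Fraction of cells (over all samples) whose truth lies in the interval."""
    return float(np.mean((truth >= lower) & (truth <= upper)))


# Synthetic demo (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n_cal, n_test, shape = 500, 500, (16, 16)
x = np.linspace(0.0, 2.0 * np.pi, shape[1])
field = np.sin(x)[None, :] * np.ones(shape)   # "true physics", shape (16, 16)
noise_scale = 0.05 + 0.2 * np.abs(field)      # error size varies by cell


def sample(n):
    """Draw n exchangeable (prediction, truth) pairs."""
    truth = field + noise_scale * rng.standard_normal((n, *shape))
    pred = np.broadcast_to(field, (n, *shape))  # stand-in surrogate output
    return pred, truth


pred_cal, true_cal = sample(n_cal)
pred_test, true_test = sample(n_test)

qhat = calibrate_aer(pred_cal, true_cal, alpha=0.1)
lower, upper = predict_interval(pred_test, qhat)
coverage = empirical_coverage(lower, upper, true_test)
print(f"target coverage: 0.90, empirical coverage: {coverage:.3f}")
```

Because calibration reduces to one sort along the sample axis, the cost is independent of how the surrogate was trained, which is consistent with the near-zero calibration cost stated above. The guarantee illustrated here is marginal and holds per cell; it relies on the calibration and test samples being exchangeable, as they are by construction in this synthetic setup.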
