Another Fit Bites the Dust: Conformal Prediction as a Calibration Standard for Machine Learning in High-Energy Physics
Jack Y. Araz, Michael Spannowsky
TL;DR
The paper argues that conformal prediction (CP) provides a distribution-free, finite-sample calibration layer for diverse ML tasks in high-energy physics. Using public collider datasets, it demonstrates how CP can convert arbitrary model outputs into calibrated prediction sets, p-values, or typicality regions across regression, binary and multiclass classification, anomaly detection, and generative modelling. CP yields rigorous uncertainty quantification without retraining or altering the underlying models, enabling honest error control and robust comparisons. The authors advocate adopting CP as a standard post-processing step in collider ML pipelines to improve interpretability and support decision-making under controlled error rates.
Abstract
Machine-learning techniques are essential in modern collider research, yet their probabilistic outputs often lack calibrated uncertainty estimates and finite-sample guarantees, limiting their direct use in statistical inference and decision-making. Conformal prediction (CP) provides a simple, distribution-free framework for calibrating arbitrary predictive models without retraining. It yields rigorous uncertainty quantification with finite-sample coverage guarantees under minimal exchangeability assumptions, without reliance on asymptotics, limit theorems, or Gaussian approximations. In this work, we investigate CP as a unifying calibration layer for machine-learning applications in high-energy physics. Using publicly available collider datasets and a diverse set of models, we show that a single conformal formalism can be applied across regression, binary and multi-class classification, anomaly detection, and generative modelling, converting raw model outputs into statistically valid prediction sets, typicality regions, and p-values with controlled false-positive rates. While CP does not improve raw model performance, it enforces honest uncertainty quantification and transparent error control. We argue that conformal calibration should be adopted as a standard component of machine-learning pipelines in collider physics, enabling reliable interpretation, robust comparisons, and principled statistical decisions in experimental and phenomenological analyses.
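To make the idea of a post-hoc calibration layer concrete, the following is a minimal sketch of split conformal prediction, not the authors' implementation. It assumes an absolute-residual nonconformity score for regression and a generic "larger is more anomalous" score for p-values. The function names (split_conformal_interval, conformal_p_value, demo_coverage) and the synthetic heavy-tailed data are hypothetical and chosen only for illustration. The model is treated as a black box: only its predictions on a held-out calibration set are used.

```python
import numpy as np


def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal intervals with marginal coverage >= 1 - alpha.

    Assumes calibration and test points are exchangeable and uses the
    absolute residual |y - f(x)| as the nonconformity score (an assumption
    made for this sketch).
    """
    cal_pred = np.asarray(cal_pred, dtype=float)
    cal_true = np.asarray(cal_true, dtype=float)
    test_pred = np.asarray(test_pred, dtype=float)

    n = cal_true.size
    scores = np.sort(np.abs(cal_true - cal_pred))

    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # If the calibration set is too small for the requested level,
    # the only valid interval is the whole real line.
    q = np.inf if k > n else scores[k - 1]

    return test_pred - q, test_pred + q


def conformal_p_value(cal_scores, test_score):
    """Conformal p-value for one test score (larger score = more atypical).

    Under exchangeability with the calibration sample, P(p <= t) <= t,
    so rejecting when p <= alpha controls the false-positive rate at alpha.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    return (1 + np.sum(cal_scores >= test_score)) / (cal_scores.size + 1)


def demo_coverage(n_cal=1000, n_test=5000, alpha=0.1, seed=0):
    """Empirical coverage on synthetic data with heavy-tailed (non-Gaussian) noise."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n_cal + n_test)
    y = 2.0 * x + rng.standard_t(df=3, size=n_cal + n_test)
    pred = 2.0 * x  # stand-in for any pre-trained regressor

    lo, hi = split_conformal_interval(
        pred[:n_cal], y[:n_cal], pred[n_cal:], alpha=alpha
    )
    y_test = y[n_cal:]
    return float(np.mean((y_test >= lo) & (y_test <= hi)))


if __name__ == "__main__":
    print("empirical coverage:", demo_coverage())
```

The coverage statement is marginal, that is, averaged over calibration and test draws, and the interval width depends entirely on the quality of the underlying model. This reflects the abstract's point that CP enforces honest uncertainty quantification rather than improving raw performance.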
