Elements of Conformal Prediction for Statisticians

Matteo Sesia; Stefano Favaro

Abstract

Predictive inference is a fundamental task in statistics, traditionally addressed using parametric assumptions about the data distribution and detailed analyses of how models learn from data. In recent years, conformal prediction has emerged as a rapidly growing alternative framework that is particularly well suited to modern applications involving high-dimensional data and complex machine learning models. Its appeal stems from being both distribution-free (relying mainly on symmetry assumptions such as exchangeability) and model-agnostic (treating the learning algorithm as a black box). Even under such limited assumptions, conformal prediction provides exact finite-sample guarantees, though these are typically of a marginal nature that requires careful interpretation. This paper explains the core ideas of conformal prediction and reviews selected methods. Rather than offering an exhaustive survey, it aims to provide a clear conceptual entry point and a pedagogical overview of the field.

Paper Structure

This paper contains 58 sections, 5 theorems, 30 equations, and 5 figures.

Introduction
Significance Statement
Foundations
Exchangeability, Conformal Prediction Sets and $p$-Values
Exchangeable data
Motivating example: reference ranges for clinical laboratory tests
Uncertainty quantification via prediction sets
Prediction sets from tests of exchangeability
Nonconformity scores and conformal $p$-functions
Connection to pivots
Marginal coverage: strengths and limitations
Illustration: Predicting a Continuous Scalar Variable
Motivating example: one-sided clinical reference range for cardiac troponin
Construction using conformal $p$-values
Quantile-based characterization
...and 43 more sections

Key Result

Theorem 1

If $\mathbf{Z}_{1:(n+1)}$ are exchangeable, then the conformal $p$-function satisfies $\mathbb{P}\left[ p(Y_{n+1}; \mathbf{Z}_{1:n}, X_{n+1}) \le \alpha \right] \le \alpha$ for all $\alpha \in (0,1)$.
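As a sanity check on Theorem 1, the following minimal Python sketch estimates $\mathbb{P}[p(Y_{n+1}; \mathbf{Z}_{1:n}, X_{n+1}) \le \alpha]$ by simulation. It assumes a common form of the conformal $p$-function, $p(y) = (1 + \#\{i \le n : S_i \ge S_{n+1}(y)\})/(n+1)$, built from a fixed nonconformity score; the score $s(x, y) = |y - 2x|$ and the data-generating process are hypothetical choices made only for illustration. Because this score does not depend on the data and the $\mathbf{Z}_i$ are drawn i.i.d., the scores are exchangeable, so the estimated rate should not exceed $\alpha$ beyond Monte Carlo error.

```python
import random


def score(x, y):
    # Hypothetical nonconformity score: absolute residual from a fixed
    # prediction rule mu(x) = 2x that does not depend on the data.
    return abs(y - 2.0 * x)


def conformal_p_value(y, calib, x_test):
    """Evaluate p(y; Z_{1:n}, x_test) = (1 + #{i: S_i >= S_{n+1}(y)}) / (n + 1)."""
    s_test = score(x_test, y)
    n_ge = sum(1 for xi, yi in calib if score(xi, yi) >= s_test)
    return (1 + n_ge) / (len(calib) + 1)


def draw_point(rng):
    # Hypothetical data-generating process with heteroscedastic noise.
    x = rng.uniform(0.0, 1.0)
    y = 2.0 * x + (0.5 + x) * rng.gauss(0.0, 1.0)
    return x, y


def estimate_type_one_error(alpha, n=20, reps=20000, seed=0):
    # Draw Z_1, ..., Z_{n+1} i.i.d. (hence exchangeable) and record how often
    # the p-function evaluated at the true Y_{n+1} is <= alpha.
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        calib = [draw_point(rng) for _ in range(n)]
        x_new, y_new = draw_point(rng)
        hits += conformal_p_value(y_new, calib, x_new) <= alpha
    return hits / reps


if __name__ == "__main__":
    for alpha in (0.05, 0.1, 0.2):
        print(alpha, estimate_type_one_error(alpha))
```

For continuous scores this probability equals $\lfloor \alpha (n+1) \rfloor / (n+1)$, so with $n = 20$ the printed rates should be close to $1/21$, $2/21$ and $4/21$, each at most $\alpha$.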

Figures (5)

Figure 1: Schematic of split conformal prediction for binary classification. The data are randomly split into training and calibration subsets. A predictive model is trained on the training data. For each test input, nonconformity scores are computed for both hypothesized labels (blue circle and red square). These scores are compared to the calibration scores to evaluate the conformal $p$-function. The prediction set comprises labels whose conformal $p$-function exceeds the nominal level $\alpha$.
Figure 2: Conformal prediction intervals ($\alpha = 0.05$) for serum creatinine as a function of age and sex, using NHANES data restricted to a healthy reference population without self-reported kidney disease or pregnancy. Dots denote observed outcomes for a hold-out test set, and curves indicate the lower and upper prediction bounds. Left: intervals based on nonlinear mean regression; right: intervals based on quantile regression. The quantile-based approach adapts to heteroscedasticity.
Figure 3: Split conformal prediction sets ($\alpha = 0.05$) for diabetes classification using NHANES data (patients aged over 30). Test patients are plotted by their model-based predicted probability of diabetes (x-axis). Dashed vertical lines separate the three distinct prediction regions: {Healthy}, {Healthy, Diabetes}, and {Diabetes}. Dots are colored and shaped by the true outcome, and shaded bands denote the proportion of true diabetes cases in each region. Empirical coverage on the test set attains the desired level. Selected features of four representative patients are highlighted.
Figure 4: Illustrative simulation of one-sided prediction upper bounds for a continuous random variable at level $\alpha = 0.1$, under three different data-generating distributions. Two performance metrics are shown as a function of the sample size: marginal coverage (top) and excess upper bound relative to the ideal population oracle (bottom). The methods compared are conformal prediction, an empirical plug-in approach, and a normal asymptotic prediction interval. Each curve represents averages over $10{,}000$ independent simulations. Conformal prediction guarantees exact coverage and performs similarly to the oracle as the sample size grows.
Figure 5: Illustration of prediction sets for a categorical outcome at target level $\alpha = 0.1$ under three different data-generating distributions. The top panel shows the marginal coverage as a function of the sample size, and the bottom panel shows the excess size of each method relative to the population oracle. The methods compared are conformal prediction, a plug-in estimator, and a Bayesian approach with a uniform Dirichlet prior. Results are averaged over $10{,}000$ independent repetitions. Conformal prediction maintains finite-sample marginal coverage guarantees and approaches the oracle performance rapidly as the sample size grows.
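The split conformal procedure summarized in Figure 1, and the three prediction regions in Figure 3, can be sketched in plain Python. This is a minimal sketch under stated assumptions: a hypothetical logistic model with fixed coefficients stands in for a classifier fitted on the training subset, the nonconformity score is assumed to be $1 - \hat{\pi}(y \mid x)$, and the conformal $p$-function is assumed to take the form $(1 + \#\{i : S_i \ge s\})/(n+1)$ over the $n$ calibration scores. Labels are coded 0 and 1 (for example, Healthy and Diabetes).

```python
import math
import random

LABELS = (0, 1)  # e.g. 0 = "Healthy", 1 = "Diabetes"


def predict_proba(x):
    # Hypothetical model output: estimated P(Y = 1 | X = x) from a logistic
    # curve with fixed coefficients (in practice, fitted on the training split).
    return 1.0 / (1.0 + math.exp(-(1.5 * x - 0.5)))


def score(x, y):
    # Nonconformity score: one minus the estimated probability of label y.
    p1 = predict_proba(x)
    return 1.0 - (p1 if y == 1 else 1.0 - p1)


def p_function(y, x, calib_scores):
    # Conformal p-function: compare the test score for label y with the
    # calibration scores.
    s = score(x, y)
    n_ge = sum(1 for c in calib_scores if c >= s)
    return (1 + n_ge) / (len(calib_scores) + 1)


def prediction_set(x, calib_scores, alpha):
    # Keep every hypothesized label whose p-function exceeds alpha.
    return {y for y in LABELS if p_function(y, x, calib_scores) > alpha}


def quantile_threshold(calib_scores, alpha):
    # Equivalent quantile form: label y is kept iff score(x, y) <= q_hat, where
    # q_hat is the ceil((n + 1)(1 - alpha))-th smallest calibration score
    # (+inf if that rank exceeds n). The tiny offset guards against round-off.
    n = len(calib_scores)
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    return math.inf if k > n else sorted(calib_scores)[k - 1]


def sample_point(rng):
    # Hypothetical population in which the model above is well specified.
    x = rng.gauss(0.0, 1.5)
    y = 1 if rng.random() < predict_proba(x) else 0
    return x, y


def empirical_coverage(n_calib=500, n_test=4000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    calib_scores = [score(*sample_point(rng)) for _ in range(n_calib)]
    hits = 0
    for _ in range(n_test):
        x, y = sample_point(rng)
        hits += y in prediction_set(x, calib_scores, alpha)
    return hits / n_test
```

Because the score of label 0 is $\hat{\pi}(1 \mid x)$ and the score of label 1 is $1 - \hat{\pi}(1 \mid x)$, when $\hat{q} \ge 1/2$ the set is {0} for $\hat{\pi}(1 \mid x) < 1 - \hat{q}$, {0, 1} for $1 - \hat{q} \le \hat{\pi}(1 \mid x) \le \hat{q}$, and {1} for $\hat{\pi}(1 \mid x) > \hat{q}$. This mirrors the three regions separated by the dashed lines in Figure 3 (if $\hat{q} < 1/2$, the middle region would instead give the empty set).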
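The two interval constructions compared in Figure 2 can be sketched with split conformal calibration. This minimal sketch assumes absolute residuals $|y - \hat{\mu}(x)|$ as the score for the mean-regression intervals and the conformalized quantile regression score $\max\{\hat{q}_{\text{lo}}(x) - y,\; y - \hat{q}_{\text{hi}}(x)\}$ for the quantile-based intervals. The fitted functions `mu_hat`, `q_lo_hat` and `q_hi_hat` and the heteroscedastic data-generating process are hypothetical stand-ins, not the NHANES models.

```python
import math
import random


def sigma(x):
    # Hypothetical noise scale that grows with x (heteroscedasticity).
    return 0.2 + 0.1 * x


def mu_hat(x):
    # Hypothetical fitted mean regression function.
    return 1.0 + 0.5 * x


def q_lo_hat(x):
    # Hypothetical fitted lower quantile; deliberately too narrow.
    return mu_hat(x) - sigma(x)


def q_hi_hat(x):
    # Hypothetical fitted upper quantile; deliberately too narrow.
    return mu_hat(x) + sigma(x)


def cqr_score(x, y):
    # Positive when y falls outside [q_lo_hat(x), q_hi_hat(x)], negative inside.
    return max(q_lo_hat(x) - y, y - q_hi_hat(x))


def conformal_quantile(scores, alpha):
    # ceil((n + 1)(1 - alpha))-th smallest calibration score, or +inf.
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    return math.inf if k > n else sorted(scores)[k - 1]


def mean_interval(x, q):
    # Constant-width interval around the mean prediction.
    return mu_hat(x) - q, mu_hat(x) + q


def cqr_interval(x, q):
    # Quantile band shifted outward (or inward, if q < 0) by q.
    return q_lo_hat(x) - q, q_hi_hat(x) + q


def draw(rng):
    x = rng.uniform(0.0, 10.0)
    return x, mu_hat(x) + sigma(x) * rng.gauss(0.0, 1.0)


def evaluate(n_calib=1000, n_test=10000, alpha=0.05, seed=2):
    rng = random.Random(seed)
    calib = [draw(rng) for _ in range(n_calib)]
    q_abs = conformal_quantile([abs(y - mu_hat(x)) for x, y in calib], alpha)
    q_cqr = conformal_quantile([cqr_score(x, y) for x, y in calib], alpha)
    cover_abs = cover_cqr = 0
    for _ in range(n_test):
        x, y = draw(rng)
        lo, hi = mean_interval(x, q_abs)
        cover_abs += lo <= y <= hi
        lo, hi = cqr_interval(x, q_cqr)
        cover_cqr += lo <= y <= hi
    return cover_abs / n_test, cover_cqr / n_test, q_abs, q_cqr
```

Both constructions carry the same marginal coverage guarantee, but only the quantile-based interval has a width, $2\sigma(x) + 2q$, that changes with $x$; this is the adaptivity to heteroscedasticity described in the caption.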
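With no covariates and the score $s(y) = y$, the conformal one-sided upper bound in Figure 4 reduces to the $\lceil (n+1)(1-\alpha) \rceil$-th smallest observation, or $+\infty$ if that rank exceeds $n$. The minimal sketch below contrasts it with an empirical plug-in bound, assumed here to be the $\lceil n(1-\alpha) \rceil$-th smallest observation. The normal asymptotic method from the figure is omitted, and the exponential data-generating distribution is a hypothetical choice.

```python
import math
import random


def conformal_upper_bound(sample, alpha):
    # ceil((n + 1)(1 - alpha))-th smallest observation, or +inf if the rank
    # exceeds n. The tiny offset guards against floating-point round-off.
    n = len(sample)
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    return math.inf if k > n else sorted(sample)[k - 1]


def plugin_upper_bound(sample, alpha):
    # Empirical (1 - alpha)-quantile of the sample, ignoring the test point.
    n = len(sample)
    k = max(1, math.ceil(n * (1 - alpha) - 1e-9))
    return sorted(sample)[k - 1]


def coverage(bound_fn, n, alpha, reps=20000, seed=0):
    # Fraction of repetitions in which a fresh draw falls below the bound.
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        y_new = rng.expovariate(1.0)
        hits += y_new <= bound_fn(sample, alpha)
    return hits / reps


if __name__ == "__main__":
    for n in (5, 10, 50):
        print(n, coverage(conformal_upper_bound, n, 0.1),
              coverage(plugin_upper_bound, n, 0.1))
```

For continuous data the conformal coverage equals $\lceil (n+1)(1-\alpha) \rceil / (n+1)$ exactly. For example, it is $10/11 \approx 0.909$ for $n = 10$ and $\alpha = 0.1$, whereas the plug-in bound reaches only $9/11 \approx 0.818$ in that case.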
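For a categorical outcome without covariates, as in Figure 5, a full conformal prediction set can be computed exactly by trying each candidate label. In this minimal sketch, the nonconformity score of a point is assumed to be minus the frequency of its label in the sample augmented with the candidate test label, so rarer labels are more nonconforming. Because this score is a symmetric function of the augmented data, exchangeability yields marginal coverage of at least $1 - \alpha$. The plug-in rule (most frequent labels first, until their empirical mass reaches $1 - \alpha$) is a hypothetical stand-in for the figure's plug-in estimator, and the Bayesian method is omitted.

```python
import random
from collections import Counter


def full_conformal_set(sample, labels, alpha):
    n = len(sample)
    counts = Counter(sample)
    kept = set()
    for y in labels:
        aug = counts.copy()
        aug[y] += 1  # add the hypothesized test label
        c_test = aug[y]
        # Points (test point included) whose score -count is >= the test score,
        # i.e. whose label count in the augmented data is <= c_test.
        n_ge = sum(c for c in aug.values() if c <= c_test)
        if n_ge / (n + 1) > alpha:
            kept.add(y)
    return kept


def plugin_set(sample, labels, alpha):
    # Most frequent labels first, until their empirical mass reaches 1 - alpha.
    n = len(sample)
    counts = Counter(sample)
    kept, cum = set(), 0
    for lab in sorted(labels, key=lambda lab: -counts[lab]):
        if cum >= (1 - alpha) * n:
            break
        kept.add(lab)
        cum += counts[lab]
    return kept


def coverage(method, probs, n, alpha, reps=20000, seed=0):
    # Fraction of repetitions in which a fresh label lands in the set.
    rng = random.Random(seed)
    labels = list(range(len(probs)))
    hits = 0
    for _ in range(reps):
        sample = rng.choices(labels, weights=probs, k=n)
        y_new = rng.choices(labels, weights=probs)[0]
        hits += y_new in method(sample, labels, alpha)
    return hits / reps


if __name__ == "__main__":
    probs = [0.6, 0.3, 0.1]
    for n in (10, 20, 50):
        print(n, coverage(full_conformal_set, probs, n, 0.1),
              coverage(plugin_set, probs, n, 0.1))
```

Ties among label counts make the conformal set somewhat conservative, so its coverage can exceed $1 - \alpha$.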

Theorems & Definitions (8)

Theorem 1
proof
Theorem 2
Theorem 3
Theorem 4
proof
Theorem 5
proof
