Generalization and Informativeness of Conformal Prediction
Matteo Zecchin, Sangwoo Park, Osvaldo Simeone, Fredrik Hellström
TL;DR
This work analyzes how the informativeness of conformal prediction (CP) sets depends on the generalization performance of the base predictor. It derives a finite-sample upper bound on the expected CP set size, tying it to the base model's generalization error, the calibration data size $n_{cal}$, and the target reliability $1-\alpha$, via a KL-divergence-based bound on the calibration and training distributions. The main result expresses the bound as an integral over the NC-score radius, with a term that is exponential in $n_{cal}$ and a tail term at a threshold $R_{min}$. This form highlights exponential convergence in the calibration set size and a dependence on the quality of the base predictor. The theoretical insights are validated through MNIST classification and California housing regression experiments, which show how calibration data and predictor performance jointly govern CP efficiency. The results also offer practical guidance for allocating data between training and calibration to achieve informative, reliable predictions.
Abstract
The safe integration of machine learning modules in decision-making processes hinges on their ability to quantify uncertainty. A popular technique to achieve this goal is conformal prediction (CP), which transforms an arbitrary base predictor into a set predictor with coverage guarantees. While CP certifies the predicted set to contain the target quantity with a user-defined tolerance, it does not provide control over the average size of the predicted sets, i.e., over the informativeness of the prediction. In this work, a theoretical connection is established between the generalization properties of the base predictor and the informativeness of the resulting CP prediction sets. To this end, an upper bound is derived on the expected size of the CP set predictor that builds on generalization error bounds for the base predictor. The derived upper bound provides insights into the dependence of the average size of the CP set predictor on the amount of calibration data, the target reliability, and the generalization performance of the base predictor. The theoretical insights are validated using simple numerical regression and classification tasks.
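To make the link between base-predictor quality and set size concrete, the following is a minimal sketch of split conformal prediction for classification. It is not the paper's experimental setup: the synthetic softmax predictor, the nonconformity score $1 - p(y|x)$, the `noise` parameter used as a stand-in for generalization quality, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points: only the full label set is valid.
        return np.inf
    return np.sort(cal_scores)[k - 1]

def prediction_sets(probs, threshold):
    """Boolean mask (n_test, n_classes): include y if 1 - p(y|x) <= threshold."""
    return (1.0 - probs) <= threshold

def simulate(noise, n_cal, alpha=0.1, n_test=2000, n_classes=10, seed=0):
    """Hypothetical predictor: larger `noise` mimics worse generalization."""
    rng = np.random.default_rng(seed)

    def sample(n):
        y = rng.integers(n_classes, size=n)
        logits = rng.normal(scale=noise, size=(n, n_classes))
        logits[np.arange(n), y] += 3.0  # signal on the true label
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True), y

    p_cal, y_cal = sample(n_cal)
    p_test, y_test = sample(n_test)

    # Nonconformity scores of the true labels on the calibration data.
    cal_scores = 1.0 - p_cal[np.arange(n_cal), y_cal]
    thr = conformal_threshold(cal_scores, alpha)

    sets = prediction_sets(p_test, thr)
    coverage = sets[np.arange(n_test), y_test].mean()
    avg_size = sets.sum(axis=1).mean()
    return coverage, avg_size

if __name__ == "__main__":
    for noise in (0.5, 3.0):
        cov, size = simulate(noise=noise, n_cal=500)
        print(f"noise={noise}: coverage={cov:.3f}, average set size={size:.2f}")
```

In this toy model, both settings target the same reliability $1-\alpha$, but the noisier predictor needs larger sets to reach it, which is the qualitative effect the bound describes. Varying `n_cal` in the same way gives a rough feel for the role of the calibration set size.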
