Generalization and Informativeness of Conformal Prediction

Matteo Zecchin, Sangwoo Park, Osvaldo Simeone, Fredrik Hellström

TL;DR

This work analyzes how the informativeness of conformal prediction (CP) sets is influenced by the generalization performance of the base predictor. It derives a finite-sample bound on the expected CP set size, tying it to the base model's generalization error, the calibration data size $n_{\text{cal}}$, and the target reliability $1-\alpha$, via a KL-divergence based bound on the calibration and training distributions. The main result expresses the bound as an integral over the NC-score radius with an exponential term in $n_{\text{cal}}$ and a tail term at a threshold $R_{min}$, highlighting an exponential convergence in calibration size and a dependence on the base predictor quality. The theoretical insights are validated through MNIST classification and California housing regression experiments, demonstrating how calibration data and predictor performance jointly govern CP efficiency and providing practical guidance for allocating data between training and calibration to achieve informative, reliable predictions.

Abstract

The safe integration of machine learning modules in decision-making processes hinges on their ability to quantify uncertainty. A popular technique to achieve this goal is conformal prediction (CP), which transforms an arbitrary base predictor into a set predictor with coverage guarantees. While CP certifies the predicted set to contain the target quantity with a user-defined tolerance, it does not provide control over the average size of the predicted sets, i.e., over the informativeness of the prediction. In this work, a theoretical connection is established between the generalization properties of the base predictor and the informativeness of the resulting CP prediction sets. To this end, an upper bound is derived on the expected size of the CP set predictor that builds on generalization error bounds for the base predictor. The derived upper bound provides insights into the dependence of the average size of the CP set predictor on the amount of calibration data, the target reliability, and the generalization performance of the base predictor. The theoretical insights are validated using simple numerical regression and classification tasks.
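
To make the link between base-predictor quality and set size concrete, the following is a minimal sketch of standard split CP for classification. It is not the probabilistic CP predictor analyzed in the paper. The nonconformity score `1 - p(y|x)`, the toy softmax "base predictor", and the names `conformal_quantile`, `prediction_sets`, `sample`, `run`, and `sharpness` are all illustrative assumptions.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score.

    If that rank exceeds n, the threshold is infinite (the set is the full label space).
    """
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf
    return np.sort(cal_scores)[k - 1]

def prediction_sets(probs, threshold):
    """Boolean mask (n_test, n_classes): True where the label's score is within the threshold."""
    return (1.0 - probs) <= threshold

rng = np.random.default_rng(0)
n_classes, n_cal, n_test, alpha = 10, 500, 2000, 0.1

def sample(n, sharpness):
    """Hypothetical toy base predictor: softmax of noisy logits, boosted on the true label.

    Larger `sharpness` stands in for a predictor that generalizes better (an assumption).
    """
    y = rng.integers(n_classes, size=n)
    logits = rng.normal(size=(n, n_classes))
    logits[np.arange(n), y] += sharpness
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return p, y

def run(sharpness):
    """Calibrate on n_cal points, then measure empirical coverage and average set size."""
    p_cal, y_cal = sample(n_cal, sharpness)
    p_test, y_test = sample(n_test, sharpness)
    scores = 1.0 - p_cal[np.arange(n_cal), y_cal]   # score of the true label
    q = conformal_quantile(scores, alpha)
    sets = prediction_sets(p_test, q)
    coverage = sets[np.arange(n_test), y_test].mean()
    avg_size = sets.sum(axis=1).mean()
    return coverage, avg_size

cov_weak, size_weak = run(sharpness=1.0)
cov_strong, size_strong = run(sharpness=4.0)
print(f"weak predictor:   coverage={cov_weak:.3f}, avg set size={size_weak:.2f}")
print(f"strong predictor: coverage={cov_strong:.3f}, avg set size={size_strong:.2f}")
```

In this toy setup both predictors should reach roughly the target coverage $1-\alpha$, while the sharper predictor yields smaller sets, which is the qualitative behavior the abstract describes.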

Paper Structure (19 sections, 4 theorems, 51 equations, 4 figures)

This paper contains 19 sections, 4 theorems, 51 equations, 4 figures.

Key Result

Theorem 1

Under Assumption ass:nc and Assumption ass:gen_bound, the expected set size of the probabilistic CP predictor eq:cp_predictor satisfies the inequality with probability $1-\delta$ with respect to the random draw of the training data set $\mathcal{D}_{\text{tr}}$, where $\text{d}_{\text{KL}}\left(a||b\right)=a\log(a/b)+(1-a)\log((1-a)/(1-b))$ is the binary Kullback-Leibler divergence and we have de…
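
The binary KL divergence defined above can be evaluated numerically as in the sketch below. The function name `binary_kl` and the convention $0\log 0 = 0$ are assumptions of this example. The second helper, `chernoff_lower_tail`, evaluates the generic Chernoff bound $\Pr[\text{Bin}(n,b)/n \le a] \le \exp(-n\,\text{d}_{\text{KL}}(a||b))$ for $a<b$. That bound is a standard result, included only to illustrate how a binary KL term produces exponential decay in a sample size $n$; it is not the inequality of Theorem 1.

```python
import math

def binary_kl(a, b):
    """Binary KL divergence a*log(a/b) + (1-a)*log((1-a)/(1-b)), with 0*log(0) taken as 0."""
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("a and b must lie in [0, 1]")

    def term(p, q):
        if p == 0.0:
            return 0.0          # convention: 0 * log(0 / q) = 0
        if q == 0.0:
            return math.inf     # positive mass where the reference has none
        return p * math.log(p / q)

    return term(a, b) + term(1.0 - a, 1.0 - b)

def chernoff_lower_tail(n, a, b):
    """Generic Chernoff bound on P[Bin(n, b)/n <= a] for a < b (illustration only)."""
    if not a < b:
        raise ValueError("the bound is stated for a < b")
    return math.exp(-n * binary_kl(a, b))

for n in (10, 100, 1000):
    print(n, chernoff_lower_tail(n, a=0.8, b=0.9))
```

The printed values shrink geometrically as `n` grows, since the exponent is linear in `n`.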

Figures (4)

  • Figure 1: Conformal prediction (CP) set predictors (gray areas) obtained by calibrating a base predictor with a higher generalization error on the left and a lower generalization error on the right. Thanks to CP, both set predictors satisfy a user-defined coverage guarantee, but the inefficiency, i.e., the average prediction set size, is larger when the generalization error of the base predictor is larger.
  • Figure 2: Bound on the average set size \eqref{eq:final} for different values of $n_{\text{tr}}$ and $n_{\text{cal}}$ as a function of the target reliability level $1-\alpha$. Increasing the number $n_{\text{cal}}$ of calibration data points causes the bound to converge exponentially fast to a function (black line) that is increasing in $1-\alpha$ and decreasing in the amount of training data $n_{\text{tr}}$.
  • Figure 3: Normalized empirical CP set size for a multi-class classification problem on the MNIST data set as a function of the reliability level $1-\alpha$ and for different sizes of the calibration and training data sets.
  • Figure 4: Normalized empirical CP set size for an $\ell_p$ regression task with $p=2$ on the California housing data set as a function of the reliability level $1-\alpha$ and for different sizes of the calibration and training data sets.
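
The regression trend described for Figure 4 can be explored with a small synthetic experiment. The sketch below does not reproduce the paper's setup: it assumes a synthetic linear model instead of the California housing data, an ordinary least-squares base predictor, the absolute residual as the nonconformity score, and an unnormalized interval width. All names (`draw`, `interval_width_and_coverage`, `dim`, `noise_std`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, noise_std, alpha = 20, 0.5, 0.1
w_true = rng.normal(size=dim)

def draw(n):
    """Sample n points from an assumed linear model y = x . w_true + Gaussian noise."""
    x = rng.normal(size=(n, dim))
    y = x @ w_true + noise_std * rng.normal(size=n)
    return x, y

def interval_width_and_coverage(n_tr, n_cal, n_test=5000):
    """Fit least squares on n_tr points, calibrate on n_cal points, evaluate on n_test points."""
    x_tr, y_tr = draw(n_tr)
    x_cal, y_cal = draw(n_cal)
    x_te, y_te = draw(n_test)
    w_hat, *_ = np.linalg.lstsq(x_tr, y_tr, rcond=None)
    # Nonconformity score: absolute residual of the base predictor on calibration data.
    scores = np.sort(np.abs(y_cal - x_cal @ w_hat))
    k = int(np.ceil((n_cal + 1) * (1 - alpha)))
    q = scores[k - 1] if k <= n_cal else np.inf
    # The CP set is the interval [prediction - q, prediction + q], of width 2q.
    covered = np.abs(y_te - x_te @ w_hat) <= q
    return 2 * q, covered.mean()

width_small, cov_small = interval_width_and_coverage(n_tr=25, n_cal=500)
width_large, cov_large = interval_width_and_coverage(n_tr=2000, n_cal=500)
print(f"n_tr=25:   width={width_small:.2f}, coverage={cov_small:.3f}")
print(f"n_tr=2000: width={width_large:.2f}, coverage={cov_large:.3f}")
```

With the same calibration budget, the predictor trained on more data should give a narrower interval at comparable coverage, mirroring the dependence on training set size shown in the figures.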

Theorems & Definitions (6)

  • Theorem 1
  • Lemma 1
  • Proposition 1
  • Proof
  • Corollary 1
  • Proof