An Information Theoretic Perspective on Conformal Prediction

Alvaro H. C. Correia, Fabio Valerio Massoli, Christos Louizos, Arash Behboodi

TL;DR

This work links conformal prediction (CP) to information theory by upper-bounding the intrinsic uncertainty $H(Y|X)$ in three ways: one bound based on the data processing inequality (DPI) and two Fano-type bounds (simple and model-based). The bounds serve as differentiable training objectives, which enables end-to-end training of classifiers from scratch and steers the model toward smaller prediction sets; they also provide a principled way to incorporate side information. Experiments in centralized and federated settings show that the proposed bounds yield smaller average prediction sets than competing methods, and that side information consistently improves efficiency. The approach connects CP-based uncertainty quantification with information-theoretic tools and applies to both centralized and distributed training.

Abstract

Conformal Prediction (CP) is a distribution-free uncertainty estimation framework that constructs prediction sets guaranteed to contain the true answer with a user-specified probability. Intuitively, the size of the prediction set encodes a general notion of uncertainty, with larger sets associated with higher degrees of uncertainty. In this work, we leverage information theory to connect conformal prediction to other notions of uncertainty. More precisely, we prove three different ways to upper bound the intrinsic uncertainty, as described by the conditional entropy of the target variable given the inputs, by combining CP with information-theoretic inequalities. Moreover, we demonstrate two direct and useful applications of this connection between conformal prediction and information theory: (i) more principled and effective conformal training objectives that generalize previous approaches and enable end-to-end training of machine learning models from scratch, and (ii) a natural mechanism to incorporate side information into conformal prediction. We empirically validate both applications in centralized and federated learning settings, showing that our theoretical results translate to lower inefficiency (average prediction set size) for popular CP methods.
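
To make the link between set size and conditional entropy concrete, here is a sketch of the classical Fano-style argument for a set-valued predictor. It is a generic textbook derivation, not the exact statement of the paper's propositions. It assumes the prediction set ${\mathcal{C}}$ is a fixed function of $X$ (for instance, after conditioning on a calibration set). The symbols $E = \mathbb{1}\{Y \in {\mathcal{C}}(X)\}$, $p_e = P(E = 0)$, the binary entropy $h_b$, and the label space $\mathcal{Y}$ are notation introduced for this sketch.

```latex
% Sketch only: Fano-style bound for a fixed set-valued predictor C (assumption of this sketch).
% E = 1{Y in C(X)}, p_e = P(E = 0), h_b = binary entropy, |\mathcal{Y}| = number of labels.
\begin{align*}
H(Y \mid X)
  &\le H(Y, E \mid X) = H(E \mid X) + H(Y \mid E, X) \\
  &\le h_b(p_e) + (1 - p_e)\, H(Y \mid X, E = 1) + p_e\, H(Y \mid X, E = 0) \\
  &\le h_b(p_e) + (1 - p_e)\, \mathbb{E}\big[\log |\mathcal{C}(X)| \,\big|\, E = 1\big] + p_e \log |\mathcal{Y}|.
\end{align*}
```

The second line uses $H(E \mid X) \le H(E) = h_b(p_e)$. The third uses the fact that, given $X$ and $E = 1$, $Y$ takes values in ${\mathcal{C}}(X)$, so its entropy is at most $\log |{\mathcal{C}}(X)|$. Smaller prediction sets at a fixed miscoverage level therefore tighten the bound, which is the intuition behind using such bounds as training signals.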

Paper Structure (46 sections, 14 theorems, 87 equations, 2 figures, 18 tables, 1 algorithm)

Key Result

Theorem 2.1

If $\{(X_i, Y_i)\}_{i=1}^n$ are i.i.d. (or only exchangeable), then for a new i.i.d. draw $(X_{test}, Y_{test})$, any $\alpha \in (0,1)$, and any score function $s$ such that $\{S_i\}_{i=1}^n$ are almost surely distinct, ${\mathcal{C}}(X_{test})$ as defined above satisfies

$$1 - \alpha \le P(Y_{test} \in {\mathcal{C}}(X_{test})) \le 1 - \alpha + \frac{1}{n+1}.$$
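
The marginal coverage guarantee of split conformal prediction can be checked empirically. The sketch below is a minimal NumPy illustration under stated assumptions: the toy data generator `sample_data`, the helper names, and the score $s(x, y) = 1 - p(y \mid x)$ are hypothetical choices made for this example and are not taken from the paper.

```python
import numpy as np


def conformal_quantile(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points: every label must be included.
        return np.inf
    return np.sort(cal_scores)[k - 1]


def prediction_sets(probs, q_hat):
    """Boolean mask (n_points, n_classes): y is in the set iff 1 - p(y|x) <= q_hat."""
    return (1.0 - probs) <= q_hat


def sample_data(rng, n, n_classes=10, sharpness=3.0):
    """Hypothetical toy model: random softmax probabilities, labels drawn from them."""
    logits = sharpness * rng.normal(size=(n, n_classes))
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Inverse-CDF sampling of one label per row.
    u = rng.random(n)
    labels = (probs.cumsum(axis=1) < u[:, None]).sum(axis=1)
    labels = np.minimum(labels, n_classes - 1)  # guard against round-off
    return probs, labels


rng = np.random.default_rng(0)
alpha = 0.1
cal_probs, cal_labels = sample_data(rng, 1000)
test_probs, test_labels = sample_data(rng, 20000)

# Calibration scores: one minus the probability assigned to the true label.
cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
q_hat = conformal_quantile(cal_scores, alpha)

# Evaluate empirical coverage and inefficiency (average set size) on test data.
sets = prediction_sets(test_probs, q_hat)
coverage = sets[np.arange(len(test_labels)), test_labels].mean()
avg_size = sets.sum(axis=1).mean()
print(f"coverage={coverage:.3f}, average set size={avg_size:.2f}")
```

With a single calibration set the empirical coverage fluctuates around $1 - \alpha$ rather than matching it exactly, because the guarantee is marginal over the calibration draw as well as the test point.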

Figures (2)

  • Figure 1: Graphical model of SCP. ${\mathcal{D}}_{cal}$ is a calibration set, ${\mathcal{C}}(X)$ the prediction set, $\hat{Y}=f(X)$ the model prediction, and $E$ the event $\{Y \in {\mathcal{C}}(X)\}$. Square and round nodes are, respectively, deterministic and stochastic functions of their parents.
  • Figure 2: Expected $[\log|{\mathcal{C}}(X)|]^+$ as a function of $\alpha$.

Theorems & Definitions (35)

  • Theorem 2.1: Vovk et al. (2005); Lei et al. (2018)
  • Proposition 3.1: DPI Bound
  • Proposition 3.2: Model-Based Fano Bound
  • Corollary 3.1: Simple Fano Bound
  • Definition B.1: Exchangeable Random Variables
  • Definition B.2: Rank
  • Remark B.3
  • Lemma B.4
  • Corollary B.1
  • proof
  • ...and 25 more