An Information Theoretic Perspective on Conformal Prediction
Alvaro H. C. Correia, Fabio Valerio Massoli, Christos Louizos, Arash Behboodi
TL;DR
This work links conformal prediction (CP) to information theory by upper bounding the intrinsic uncertainty $H(Y|X)$ in three ways: one bound based on the data processing inequality (DPI) and two Fano-type bounds (simple and model-based). The bounds serve as differentiable training objectives that enable end-to-end training of classifiers from scratch and push models toward smaller prediction sets; they also provide a principled way to incorporate side information. Empirical results in centralized and federated settings show that the proposed bounds yield smaller average prediction sets than competing methods, and that side information consistently improves efficiency. The approach connects uncertainty quantification with information-theoretic tools, giving CP-based uncertainty estimation usable training signals and practical efficiency gains in both centralized and distributed settings.
Abstract
Conformal Prediction (CP) is a distribution-free uncertainty estimation framework that constructs prediction sets guaranteed to contain the true answer with a user-specified probability. Intuitively, the size of the prediction set encodes a general notion of uncertainty, with larger sets associated with higher degrees of uncertainty. In this work, we leverage information theory to connect conformal prediction to other notions of uncertainty. More precisely, we prove three different ways to upper bound the intrinsic uncertainty, as described by the conditional entropy of the target variable given the inputs, by combining CP with information-theoretic inequalities. Moreover, we demonstrate two direct and useful applications of this connection between conformal prediction and information theory: (i) more principled and effective conformal training objectives that generalize previous approaches and enable end-to-end training of machine learning models from scratch, and (ii) a natural mechanism to incorporate side information into conformal prediction. We empirically validate both applications in centralized and federated learning settings, showing that our theoretical results translate to lower inefficiency (average prediction set size) for popular CP methods.
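To make the notions of prediction set, coverage, and inefficiency concrete, the following is a minimal sketch of standard split conformal prediction for classification. It is not the training procedure proposed in this work. The score function (one minus the probability assigned to the true class), the synthetic data, and all names such as `conformal_quantile` and `prediction_sets` are assumptions chosen for illustration.

```python
import numpy as np


def conformal_quantile(cal_scores, alpha):
    """Finite-sample corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the score to use
    if k > n:
        return np.inf  # too few calibration points: include every label
    return np.sort(cal_scores)[k - 1]


def prediction_sets(probs, q_hat):
    """Boolean mask of shape (n, n_classes); True means the label is in the set."""
    return (1.0 - probs) <= q_hat


def sample(rng, n, n_classes):
    """Hypothetical stand-in for a trained classifier: noisy softmax outputs."""
    labels = rng.integers(n_classes, size=n)
    logits = rng.normal(size=(n, n_classes))
    logits[np.arange(n), labels] += 2.0  # make the true class more likely
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True), labels


rng = np.random.default_rng(0)
alpha, n_classes = 0.1, 5
cal_probs, cal_labels = sample(rng, 500, n_classes)
test_probs, test_labels = sample(rng, 2000, n_classes)

# Nonconformity score on calibration data: 1 - probability of the true label.
cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
q_hat = conformal_quantile(cal_scores, alpha)

sets = prediction_sets(test_probs, q_hat)
coverage = sets[np.arange(len(test_labels)), test_labels].mean()
inefficiency = sets.sum(axis=1).mean()  # average prediction set size

print(f"empirical coverage: {coverage:.3f}, average set size: {inefficiency:.2f}")
```

The guarantee is marginal: averaged over calibration and test draws, the sets contain the true label with probability at least 1 - alpha. Average set size is the inefficiency measure that the abstract refers to.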
