Uncertainty Sets for Image Classifiers using Conformal Prediction
Anastasios Angelopoulos, Stephen Bates, Jitendra Malik, Michael I. Jordan
TL;DR
The paper quantifies uncertainty for image classifiers by wrapping any pre-trained model with conformal prediction, producing predictive sets that contain the true label with a user-specified probability. It introduces Regularized Adaptive Prediction Sets (RAPS), a simple, fast modification of Adaptive Prediction Sets (APS) that penalizes the tail probabilities of unlikely classes, yielding smaller, more stable sets while preserving the finite-sample coverage guarantee. The theory establishes the coverage guarantee of conformal calibration and an optimality result relative to top-k schemes. Experiments on Imagenet and Imagenet-V2 show that RAPS consistently produces smaller sets than the naive and APS baselines. The paper also develops adaptiveness metrics and automatic parameter tuning, and argues that RAPS is a practical, scalable uncertainty quantification tool for high-dimensional image classification, with potential use in critical decision-making contexts.
Abstract
Convolutional image classifiers can achieve high predictive accuracy, but quantifying their uncertainty remains an unresolved challenge, hindering their deployment in consequential settings. Existing uncertainty quantification techniques, such as Platt scaling, attempt to calibrate the network's probability estimates, but they do not have formal guarantees. We present an algorithm that modifies any classifier to output a predictive set containing the true label with a user-specified probability, such as 90%. The algorithm is simple and fast like Platt scaling, but provides a formal finite-sample coverage guarantee for every model and dataset. Our method modifies an existing conformal prediction algorithm to give more stable predictive sets by regularizing the small scores of unlikely classes after Platt scaling. In experiments on both Imagenet and Imagenet-V2 with ResNet-152 and other classifiers, our scheme outperforms existing approaches, achieving coverage with sets that are often factors of 5 to 10 smaller than a stand-alone Platt scaling baseline.
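The procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the classifier's softmax outputs are already available (for example after Platt or temperature scaling) along with a held-out calibration set, and it uses the simpler non-randomized form of the score. The function names `raps_scores`, `calibrate`, and `predict_sets`, and the default values `lam=0.01` and `k_reg=5`, are hypothetical choices made for illustration.

```python
import numpy as np

def raps_scores(probs, lam=0.01, k_reg=5):
    """Regularized cumulative score of every class (non-randomized sketch).

    probs: array of shape (n_samples, n_classes), rows summing to 1.
    A class's score is the total probability of all classes ranked at or
    above it, plus a penalty lam for every rank beyond k_reg.
    """
    order = np.argsort(-probs, axis=1)                 # classes, most likely first
    sorted_probs = np.take_along_axis(probs, order, axis=1)
    cumulative = np.cumsum(sorted_probs, axis=1)
    ranks = np.arange(1, probs.shape[1] + 1)
    penalized = cumulative + lam * np.maximum(ranks - k_reg, 0)
    scores = np.empty_like(penalized)
    np.put_along_axis(scores, order, penalized, axis=1)  # back to class order
    return scores

def calibrate(cal_probs, cal_labels, alpha=0.1, lam=0.01, k_reg=5):
    """Conformal threshold from held-out data for target coverage 1 - alpha."""
    n = len(cal_labels)
    true_scores = raps_scores(cal_probs, lam, k_reg)[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))            # finite-sample correction
    if k > n:
        return np.inf                                  # too few points: include everything
    return np.sort(true_scores)[k - 1]

def predict_sets(probs, qhat, lam=0.01, k_reg=5):
    """Boolean mask (n_samples, n_classes): True where a class is in the set."""
    return raps_scores(probs, lam, k_reg) <= qhat
```

In this sketch, `calibrate` is run once on held-out data, and `predict_sets` is then applied to new softmax outputs with the same `lam` and `k_reg`. Larger values of `lam` penalize low-ranked classes more heavily, which tends to shrink the sets. The threshold is chosen so that, for exchangeable data, the true label falls in the set with probability at least 1 - alpha.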
