Adaptive conformal classification with noisy labels
Matteo Sesia, Y. X. Rachel Wang, Xin Tong
TL;DR
The paper addresses conformal prediction under label noise in the calibration data by introducing adaptive calibration procedures that account for a contamination process modeled via a linear mixing framework. By characterizing the coverage inflation factor $\boldsymbol{\triangle}_k(t)$ and constructing plug-in or confidence-interval-based corrections, the authors obtain valid prediction sets that are often more informative than standard conformal sets, under both label-conditional and marginal coverage guarantees. The methodology covers known and bounded contamination and includes procedures to estimate the contamination model from data, with finite-sample theoretical guarantees and empirical demonstrations on simulations and CIFAR-10H. Overall, this work advances uncertainty quantification in noisy-label regimes, with practical relevance for crowdsourced labeling, privacy-preserving data collection, and real-world data sets with imperfect annotations.
Abstract
This paper develops novel conformal prediction methods for classification tasks that can automatically adapt to random label contamination in the calibration sample, leading to more informative prediction sets with stronger coverage guarantees compared to state-of-the-art approaches. This is made possible by a precise characterization of the effective coverage inflation (or deflation) suffered by standard conformal inferences in the presence of label contamination, which is then made actionable through new calibration algorithms. Our solution is flexible and can leverage different modeling assumptions about the label contamination process, while requiring no knowledge of the underlying data distribution or of the inner workings of the machine-learning classifier. The advantages of the proposed methods are demonstrated through extensive simulations and an application to object classification with the CIFAR-10H image data set.
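To make the coverage inflation mentioned above concrete, the following is a minimal simulation sketch, not the authors' method or experiments. It assumes a toy setting chosen purely for illustration: one-dimensional Gaussian features centered at the class index, the nonconformity score $|x - k|$, and a randomized-response noise model in which each calibration label is replaced by a uniformly random label with probability `eps`. All function names and parameters are hypothetical. The sketch runs standard label-conditional conformal calibration once with clean labels and once with contaminated labels, then measures coverage on clean test data; it shows the over-coverage only and does not implement the adaptive correction.

```python
import numpy as np


def sample(n, K, rng):
    """Draw clean data: uniform labels, features x ~ N(y, 1) (toy assumption)."""
    y = rng.integers(0, K, size=n)
    x = rng.normal(loc=y.astype(float), scale=1.0)
    return x, y


def contaminate(y, eps, K, rng):
    """With probability eps, replace a label by a uniform draw over K classes."""
    flip = rng.random(y.shape[0]) < eps
    random_labels = rng.integers(0, K, size=y.shape[0])
    return np.where(flip, random_labels, y)


def label_conditional_thresholds(x, y, K, alpha):
    """Standard conformal thresholds, one per class, from scores |x - k|."""
    taus = np.empty(K)
    for k in range(K):
        scores = np.sort(np.abs(x[y == k] - k))
        n_k = scores.shape[0]
        # Finite-sample conformal rank: ceil((1 - alpha) * (n_k + 1)).
        rank = int(np.ceil((1 - alpha) * (n_k + 1)))
        taus[k] = scores[rank - 1] if rank <= n_k else np.inf
    return taus


def label_conditional_coverage(x, y, taus):
    """Fraction of class-k test points whose true label is in the prediction set."""
    K = taus.shape[0]
    return np.array(
        [np.mean(np.abs(x[y == k] - k) <= taus[k]) for k in range(K)]
    )


def run_demo(eps=0.2, K=3, alpha=0.1, n=60000, seed=0):
    rng = np.random.default_rng(seed)
    x_cal, y_cal = sample(n, K, rng)
    y_noisy = contaminate(y_cal, eps, K, rng)
    x_test, y_test = sample(n, K, rng)  # test labels are clean

    taus_clean = label_conditional_thresholds(x_cal, y_cal, K, alpha)
    taus_noisy = label_conditional_thresholds(x_cal, y_noisy, K, alpha)

    clean_cov = label_conditional_coverage(x_test, y_test, taus_clean)
    noisy_cov = label_conditional_coverage(x_test, y_test, taus_noisy)
    return clean_cov, noisy_cov


if __name__ == "__main__":
    clean_cov, noisy_cov = run_demo()
    print("coverage, clean calibration:", np.round(clean_cov, 3))
    print("coverage, noisy calibration:", np.round(noisy_cov, 3))
```

In this toy setting, mislabeled calibration points tend to have larger scores for their assigned class, which pushes the calibrated thresholds up and makes the prediction sets more conservative than the nominal level requires. Other noise models or score functions could instead produce under-coverage, which is the "deflation" case the abstract refers to.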
