Split Conformal Classification with Unsupervised Calibration
Santiago Mazuelas
TL;DR
The paper addresses the labeling cost of split conformal prediction for classification by introducing unsupervised calibration, which uses unlabeled calibration samples together with the labeled training samples. Label weights are obtained by minimizing an integral probability metric (IPM) between the training and calibration distributions, so calibration needs no labels while coverage guarantees are retained in a moderately weakened form. The framework provides both general and kernel-based implementations, with theoretical bounds that trade off bias and variance through the chosen function class (e.g., an RKHS) and the sample sizes. Experiments on nine datasets show that unsupervised calibration performs close to conventional supervised calibration while reducing labeling requirements, especially when the number of classes is moderate. The approach offers a principled path to cost-efficient conformal prediction in settings where calibration labels are difficult or restricted to obtain.
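To make the IPM in the kernel case concrete: when the function class is the unit ball of an RKHS, the IPM between two distributions is the maximum mean discrepancy (MMD). The sketch below estimates the squared MMD between two samples with a Gaussian kernel. It only illustrates the distance itself, not the paper's label-weight optimization; the function names, the Gaussian kernel, and the `bandwidth` parameter are assumptions made for this example.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy
```

A small value suggests the two samples are hard to tell apart with functions from the RKHS; a larger value indicates a mismatch between the distributions.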
Abstract
Methods for split conformal prediction leverage calibration samples to transform any prediction rule into a set-prediction rule that complies with a target coverage probability. Existing methods provide remarkably strong performance guarantees with minimal computational cost. However, they require calibration samples composed of labeled examples different from those used for training. This requirement can be highly inconvenient, as it prevents the use of all labeled examples for training and may require acquiring additional labels solely for calibration. This paper presents an effective methodology for split conformal prediction with unsupervised calibration for classification tasks. In the proposed approach, set-prediction rules are obtained using unsupervised calibration samples together with the supervised training samples previously used to learn the classification rule. Theoretical and experimental results show that the presented methods can achieve performance comparable to that of supervised calibration, at the expense of a moderate degradation in performance guarantees and computational efficiency.
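For context, the conventional supervised baseline that the abstract contrasts with can be sketched as follows. This is standard split conformal classification with labeled calibration samples, not the unsupervised method proposed in the paper. The nonconformity score (one minus the predicted probability of the label) and the function names are assumptions chosen for illustration.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold computed from labeled calibration data.

    cal_probs: array (n, n_classes) of predicted class probabilities.
    cal_labels: integer array (n,) of true labels.
    alpha: target miscoverage, so the target coverage is 1 - alpha.
    """
    n = len(cal_labels)
    # Nonconformity score: one minus the probability given to the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration samples for this alpha: include every label.
        return np.inf
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, threshold):
    """Boolean mask (n_test, n_classes); True where the label is in the set."""
    return (1.0 - test_probs) <= threshold
```

When calibration and test samples are exchangeable, the resulting sets contain the true label with probability at least 1 - alpha. The labeled calibration data consumed by `conformal_threshold` is exactly the requirement the paper aims to remove.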
