Split Conformal Classification with Unsupervised Calibration

Santiago Mazuelas

TL;DR

The paper tackles the labeling cost of split conformal prediction for classification by introducing unsupervised calibration, which uses unlabeled calibration samples together with the labeled training data. It obtains label weights by minimizing an integral probability metric (IPM) between the training and calibration distributions, which enables calibration without labels while retaining coverage guarantees, at the cost of a moderate degradation relative to supervised calibration. The framework provides both general and kernel-based implementations, with theoretical bounds that balance bias and variance through the chosen function class (e.g., an RKHS) and the sample sizes. Empirical results across nine datasets show that unsupervised calibration achieves performance close to that of conventional supervised calibration without requiring calibration labels, especially when the number of classes is moderate. The approach offers a principled, testable path to cost-efficient conformal prediction in settings where obtaining calibration labels is difficult or restricted.
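To make the IPM idea concrete, the sketch below evaluates one kernel-based instance of such an objective: the squared maximum mean discrepancy (MMD), which is the IPM over the unit ball of an RKHS. It is only an illustration under stated assumptions, not the paper's objective: the function names, the RBF kernel with bandwidth `gamma`, the product kernel over (instance, label) pairs, and the representation of label weights as one probability vector per calibration instance are all choices made here for the example.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def weighted_mmd2(X_tr, y_tr, X_cal, W, gamma=1.0):
    """Squared MMD between the empirical distribution of labeled training
    pairs and the calibration instances paired with soft labels W.

    Assumptions (hypothetical, for illustration only):
      - W has shape (m, K); row i is a probability vector over the K labels
        for calibration instance i.
      - The kernel on (x, y) pairs is rbf(x, x') * 1[y == y'].
    """
    n, m = len(X_tr), len(X_cal)
    K = W.shape[1]
    Y = np.eye(K)[y_tr]                      # one-hot training labels, (n, K)
    K_tt = rbf_kernel(X_tr, X_tr, gamma)     # training vs. training
    K_tc = rbf_kernel(X_tr, X_cal, gamma)    # training vs. calibration
    K_cc = rbf_kernel(X_cal, X_cal, gamma)   # calibration vs. calibration
    term_tt = (K_tt * (Y @ Y.T)).sum() / n**2
    term_tc = (K_tc * (Y @ W.T)).sum() / (n * m)
    term_cc = (K_cc * (W @ W.T)).sum() / m**2
    return term_tt - 2.0 * term_tc + term_cc
```

Under these assumptions, label weights that reproduce the true labels of an identical sample give a discrepancy of zero, while uninformative uniform weights give a strictly positive value; an optimizer over `W` (not shown) would search the product of probability simplices for a small objective value.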

Abstract

Methods for split conformal prediction leverage calibration samples to transform any prediction rule into a set-prediction rule that complies with a target coverage probability. Existing methods provide remarkably strong performance guarantees with minimal computational costs. However, they require calibration samples composed of labeled examples different from those used for training. This requirement can be highly inconvenient, as it prevents the use of all labeled examples for training and may require acquiring additional labels solely for calibration. This paper presents an effective methodology for split conformal prediction with unsupervised calibration for classification tasks. In the proposed approach, set-prediction rules are obtained using unsupervised calibration samples together with supervised training samples previously used to learn the classification rule. Theoretical and experimental results show that the presented methods can achieve performance comparable to that with supervised calibration, at the expense of a moderate degradation in performance guarantees and computational efficiency.
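For reference, the conventional supervised baseline that the abstract contrasts against can be sketched as follows. This is a minimal sketch of standard split conformal classification, not the paper's unsupervised method; the function names and the choice of conformal score (one minus the predicted probability of the label) are assumptions made for the example.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Threshold from labeled calibration data (standard split conformal).

    cal_probs:  (n, K) predicted class probabilities on calibration samples.
    cal_labels: (n,) integer labels of the calibration samples.
    """
    cal_labels = np.asarray(cal_labels)
    n = len(cal_labels)
    # Conformal score of each calibration sample: 1 - p(true label | x).
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample-corrected quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration samples for this alpha: include every label.
        return np.inf
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, threshold):
    # Boolean mask (n_test, K): label y is in the set if its score <= threshold.
    return (1.0 - test_probs) <= threshold
```

With exchangeable labeled calibration and test samples, sets built this way cover the true label with probability at least $1-\alpha$; the labels passed as `cal_labels` are exactly the requirement that unsupervised calibration aims to remove.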

Paper Structure

This paper contains 18 sections, 6 theorems, 66 equations, 12 figures, 7 tables, 3 algorithms.

Key Result

Theorem 1

Let $\mathbf{w}$ be label weights with objective value in opt-gen upper bounded by $V_{\text{opt}}$ (i.e., $\Phi^{\mathcal{F}}_{n,m}(\mathbf{w}) \leq V_{\text{opt}}$). For any data distribution, type of conformal score, and target coverage $1-\alpha$, the set-prediction rule $\mathcal{C}$ from Algori… with probability at least $1-\delta$, for $G_{n,m}^{\mathcal{F}}$ given by … In addition, if the val…

Figures (12)

  • Figure 1: Coverage probabilities and prediction set sizes over $400$ random partitions of the 'CIFAR10' dataset with target coverage $1 - \alpha = 0.9$ and number of calibration samples ranging from $100$ to $2{,}000$. The presented methods with unsupervised calibration achieve performance that is only slightly inferior to that of conventional methods with supervised calibration (coverage probabilities slightly less concentrated around $0.9$). With both approaches, the gap between the actual and target coverage probabilities decreases with more calibration samples and remains below $0.02$ with high probability using a few thousand calibration samples.
  • Figure 2: Decrease of average coverage gap with the number of calibration samples.
  • Figure 3: Running times of the methods presented as the number of calibration samples increases. For datasets with up to 10 classes, the running times range from tens to hundreds of seconds using a few thousand samples.
  • Figure 4: Coverage probabilities and prediction set sizes over $400$ random partitions of the 'ImageNet10' dataset with target coverage $1 - \alpha = 0.9$ and number of calibration samples ranging from $100$ to $2{,}000$.
  • Figure 5: Coverage probabilities and prediction set sizes over $400$ random partitions of the 'FashionMNIST' dataset with target coverage $1 - \alpha = 0.9$ and number of calibration samples ranging from $100$ to $2{,}000$.
  • ...and 7 more figures
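The quantities reported in these figures (coverage probability, prediction set size, and coverage gap) can be estimated on a held-out test split roughly as sketched below. This is a hedged illustration rather than the paper's evaluation code; the function name and the boolean-mask representation of prediction sets are assumptions made here.

```python
import numpy as np

def coverage_and_size(pred_sets, labels, alpha):
    """Empirical coverage, average set size, and coverage gap on test data.

    pred_sets: (n_test, K) boolean mask; entry [i, y] is True if label y
               belongs to the prediction set of test sample i.
    labels:    (n_test,) integer true labels.
    """
    labels = np.asarray(labels)
    # A sample is covered when its true label is inside its prediction set.
    covered = pred_sets[np.arange(len(labels)), labels]
    coverage = covered.mean()
    avg_size = pred_sets.sum(axis=1).mean()
    # Absolute difference between actual and target coverage.
    gap = abs(coverage - (1 - alpha))
    return coverage, avg_size, gap
```

Repeating this computation over many random partitions of a dataset yields distributions of coverage and set size of the kind the figures summarize.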

Theorems & Definitions (12)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Proposition 1
  • proof
  • ...and 2 more