Label Noise Robustness of Conformal Prediction

Bat-Sheva Einbinder, Shai Feldman, Stephen Bates, Anastasios N. Angelopoulos, Asaf Gendler, Yaniv Romano

TL;DR

This paper studies the robustness of conformal prediction to label noise. Its theory and experiments suggest that conformal prediction and risk-controlling techniques calibrated on noisy labels attain conservative risk over the clean ground truth labels whenever the noise is dispersive and increases variability.

Abstract

We study the robustness of conformal prediction, a powerful tool for uncertainty quantification, to label noise. Our analysis tackles both regression and classification problems, characterizing when and how it is possible to construct uncertainty sets that correctly cover the unobserved noiseless ground truth labels. We further extend our theory and formulate the requirements for correctly controlling a general loss function, such as the false negative proportion, with noisy labels. Our theory and experiments suggest that conformal prediction and risk-controlling techniques with noisy labels attain conservative risk over the clean ground truth labels whenever the noise is dispersive and increases variability. In other adversarial cases, we can also correct for noise of bounded size in the conformal prediction algorithm in order to ensure achieving the correct risk of the ground truth labels without score or data regularity.

Paper Structure (66 sections, 27 theorems, 144 equations, 23 figures, 2 tables, 3 algorithms)

Key Result

Theorem 2.1

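Assume that $\mathbb{P}(\tilde{s}_{\rm test} \leq t) \leq \mathbb{P}(s_{\rm test} \leq t)$ for all $t$. Then, $\mathbb{P}\left(Y_{\rm test} \in \widehat{C}^{\rm noisy}(X_{\rm test})\right) \geq 1 - \alpha.$ Furthermore, for any $u$ satisfying $\mathbb{P}(\tilde{s}_{\rm test} \leq t) + u \geq \mathbb{P}(s_{\rm test} \leq t)$ for all $t$, $\mathbb{P}\left(Y_{\rm test} \in \widehat{C}^{\rm noisy}(X_{\rm test})\right) \leq 1 - \alpha + \frac{1}{n+1} + u.$ Here $s_{\rm test}$ and $\tilde{s}_{\rm test}$ are the clean and noisy test scores, and $\widehat{C}^{\rm noisy}$ denotes the prediction set calibrated on $n$ noisy-labeled points.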
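
A small simulation can illustrate the dominance condition $\mathbb{P}(\tilde{s}_{\rm test} \leq t) \leq \mathbb{P}(s_{\rm test} \leq t)$ in Theorem 2.1. The sketch below is not taken from the paper. It assumes a regression model that predicts the true conditional mean (zero), standard Gaussian clean labels, independent symmetric Gaussian label noise, and the absolute-residual score; the function names `conformal_quantile` and `simulate_coverage` are hypothetical. Under these assumptions the noisy scores stochastically dominate the clean ones, so calibrating on noisy labels should give coverage of at least $1-\alpha$ on clean test labels.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points: the prediction set must be unbounded.
        return np.inf
    return np.sort(scores)[k - 1]


def simulate_coverage(noise_std, alpha=0.1, n_cal=1000, n_test=20000, seed=0):
    """Return (coverage with clean calibration, coverage with noisy calibration).

    Assumed toy setup (illustrative only): the model predicts 0, clean labels
    are N(0, 1), and noisy labels add independent N(0, noise_std^2) noise.
    Both coverages are measured on clean test labels.
    """
    rng = np.random.default_rng(seed)

    # Calibration labels: clean, and the same labels with dispersive noise added.
    y_cal = rng.normal(size=n_cal)
    y_cal_noisy = y_cal + rng.normal(scale=noise_std, size=n_cal)

    # Absolute-residual scores; the prediction is 0, so the score is |y|.
    q_clean = conformal_quantile(np.abs(y_cal), alpha)
    q_noisy = conformal_quantile(np.abs(y_cal_noisy), alpha)

    # Evaluate both thresholds against clean test labels.
    y_test = rng.normal(size=n_test)
    cov_clean = np.mean(np.abs(y_test) <= q_clean)
    cov_noisy = np.mean(np.abs(y_test) <= q_noisy)
    return cov_clean, cov_noisy


if __name__ == "__main__":
    cov_clean, cov_noisy = simulate_coverage(noise_std=1.0)
    print(f"clean calibration: {cov_clean:.3f}, noisy calibration: {cov_noisy:.3f}")
```

In this toy setting the noisy threshold is larger than the clean one, so the interval is wider and clean-label coverage exceeds the 90% target, at the cost of longer intervals.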

Figures (23)

  • Figure 1: Effect of label noise on CIFAR-10. Left: distribution of average coverage on a clean test set over 30 independent experiments with target coverage $1-\alpha = 90\%$, using noisy and clean labels for calibration. We use a pre-trained ResNet-18 model, which has Top-1 accuracy of 93% and 90% on the clean and noisy test sets, respectively. The gray bar represents the interquartile range. Center and right: prediction sets achieved using noisy and clean labels for calibration.
  • Figure 2: Results for real-data regression experiment: predicting aesthetic visual rating. Performance of conformal prediction intervals with 90% marginal coverage based on a VGG-16 model using a noisy training set. We compare the residual magnitude score and CQR methods with both noisy and clean calibration sets. Left: Marginal coverage; Right: Interval length. The results are evaluated over 30 independent experiments and the gray bar represents the interquartile range.
  • Figure 3: Clean (green) and noisy (red) non-conformity scores under dispersive corruption.
  • Figure 4: Clean (green) and noisy (red) class probabilities under dispersive corruption.
  • Figure 5: FNR on MS COCO data set, achieved over noisy (red) and clean (green) test sets. The calibration scheme is applied with noisy annotations. Results are averaged over 2000 trials.
  • ...and 18 more figures

Theorems & Definitions (30)

  • Theorem 2.1
  • Proposition 2.1
  • Remark 2.2: Corruptions dependent on X
  • Proposition 2.2
  • Corollary 1
  • Corollary 2
  • Proposition 2.3: Coverage is impossible in the general case.
  • Corollary 3: Corollary of Barber et al. (2022)
  • Corollary 4: Corollary of Barber et al. (2022), Theorem 3
  • Proposition 3.1
  • ...and 20 more