
Effects of label noise on the classification of outlier observations

Matheus Vinícius Barreto de Farias, Mario de Castro

TL;DR

This work evaluates how training-label noise affects BCOPS, a conformal-prediction approach that yields prediction sets with nominal coverage $1-\alpha$ (e.g., $0.95$) and can abstain when no training class supports a test observation. The authors introduce a label-noise corruption function $g(y)$ that, with probability $\phi$, replaces a label with a uniformly drawn class. They assess per-class coverage and outlier abstention in synthetic two-class and ten-class settings, plus MNIST (digits 0–5 used for training, 6–9 treated as outliers). Across datasets, BCOPS keeps coverage near $0.95$ at every noise level, with occasional slight increases, while abstention on outliers drops at low noise levels before recovering as noise grows. These results corroborate the robustness of conformal prediction to label noise, but they also reveal non-monotonic abstention behavior that practitioners should monitor when deploying on noisy data. Overall, the findings support BCOPS's applicability to real-world, imperfectly labeled data and call for attention to abstention dynamics in outlier-detection scenarios.
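As a minimal sketch of the corruption function $g(y)$ described above, the hypothetical `corrupt_labels` below assumes that each training label is independently resampled, with probability $\phi$, from a uniform distribution over all $K$ training classes. The summary does not say whether the drawn class may coincide with the original one; under this assumption it can, so the fraction of labels that actually change is $\phi(K-1)/K$ on average rather than $\phi$.

```python
import numpy as np


def corrupt_labels(y, phi, n_classes, rng=None):
    """Hypothetical label-noise function g(y) (an assumption, not the authors' code).

    With probability phi, each label is replaced by a class drawn uniformly
    from {0, ..., n_classes - 1}; otherwise it is kept unchanged. The drawn
    class may equal the original label, so the expected fraction of labels
    that actually change is phi * (n_classes - 1) / n_classes.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    # Bernoulli(phi) mask selecting which labels are resampled.
    resample = rng.random(y.shape[0]) < phi
    # Uniform candidate classes; only used where resample is True.
    new_labels = rng.integers(0, n_classes, size=y.shape[0])
    return np.where(resample, new_labels, y)


# Illustrative use with six training classes, mirroring the MNIST setting (digits 0-5).
y_train = np.repeat(np.arange(6), 1000)
y_noisy = corrupt_labels(y_train, phi=0.2, n_classes=6, rng=0)
# On average this fraction is 0.2 * 5 / 6, i.e. about 0.167.
print(f"observed fraction of changed labels: {np.mean(y_noisy != y_train):.3f}")
```

Whether the replacement class should exclude the true label is a modeling choice; excluding it would make $\phi$ the exact expected flip rate.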

Abstract

This study investigates the impact of adding noise to the class labels of the training set in classification tasks using the BCOPS algorithm (Balanced and Conformal Optimized Prediction Sets), proposed by Guan & Tibshirani (2022). BCOPS combines conformal prediction with a machine learning method to construct prediction sets such that the true class of a test observation is included in its prediction set with at least a specified probability (the coverage guarantee). An observation is considered an outlier if its true class is not present in the training set. The study uses both synthetic and real datasets and conducts experiments to evaluate the abstention rate on outlier observations and the model's robustness in this previously untested scenario. The results indicate that adding noise, even in small amounts, can have a significant effect on model performance.
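The abstract's ingredients (a per-class coverage guarantee, outliers defined as test points from classes absent in training, and abstention) can be illustrated with a minimal label-conditional split-conformal sketch. It is not BCOPS: BCOPS builds its own score function (Guan & Tibshirani, 2022), whereas here a Gaussian kernel density score (`kde_score`, hypothetical, with an assumed bandwidth of 1) stands in for it. The two-class Gaussian data, the outlier location, and $\alpha = 0.05$ are illustrative assumptions. A class $k$ enters the prediction set when its conformal p-value exceeds $\alpha$, which under exchangeability gives coverage of at least $1-\alpha$ for test points of class $k$; an empty set is read as abstention.

```python
import numpy as np


def kde_score(x_fit, x_query, bandwidth=1.0):
    """Hypothetical conformity score: Gaussian kernel density of x_query under x_fit."""
    # Pairwise squared distances, shape (n_query, n_fit).
    d2 = ((x_query[:, None, :] - x_fit[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * bandwidth**2)).mean(axis=1)


def conformal_sets(x_train, y_train, x_test, alpha=0.05, rng=None):
    """Label-conditional split-conformal sets; an all-False row means abstention."""
    rng = np.random.default_rng(rng)
    classes = np.unique(y_train)
    include = np.zeros((x_test.shape[0], classes.size), dtype=bool)
    for j, k in enumerate(classes):
        xk = x_train[y_train == k]
        idx = rng.permutation(xk.shape[0])
        half = xk.shape[0] // 2
        x_fit, x_cal = xk[idx[:half]], xk[idx[half:]]  # split class k into fit/calibration
        cal = kde_score(x_fit, x_cal)    # calibration scores for class k
        test = kde_score(x_fit, x_test)  # test scores under class k
        # Conformal p-value: (1 + #{calibration scores <= test score}) / (n_cal + 1).
        p = (1 + (cal[None, :] <= test[:, None]).sum(axis=1)) / (cal.size + 1)
        include[:, j] = p > alpha
    return classes, include


# Illustrative data: two training classes plus an outlier class unseen in training.
rng = np.random.default_rng(0)
means = {0: (0.0, 0.0), 1: (4.0, 0.0)}
x_train = np.vstack([rng.normal(means[k], 1.0, size=(500, 2)) for k in means])
y_train = np.repeat([0, 1], 500)
x_in = np.vstack([rng.normal(means[k], 1.0, size=(200, 2)) for k in means])
y_in = np.repeat([0, 1], 200)
x_out = rng.normal((2.0, 6.0), 1.0, size=(200, 2))

classes, include = conformal_sets(x_train, y_train, np.vstack([x_in, x_out]), alpha=0.05, rng=1)
inc_in, inc_out = include[: len(x_in)], include[len(x_in):]

# Per-class coverage on inliers and abstention rate (empty sets) on outliers.
coverage = {k: inc_in[y_in == k, j].mean() for j, k in enumerate(classes)}
abstention = (~inc_out.any(axis=1)).mean()
for k, c in coverage.items():
    print(f"class {k}: coverage {c:.3f}")
print(f"outlier abstention rate: {abstention:.3f}")
```

To mimic the study's setting, `y_train` could be replaced by a noisy copy before calling `conformal_sets`; the guarantee then holds with respect to the noisy labels, which is one reason coverage and abstention can behave differently under label noise.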

Paper Structure

This paper contains 8 sections, 8 equations, and 8 figures.

Figures (8)

  • Figure 1: Evolution of the coverage rate by class according to the level of noise in the training set for the dataset generated in Example 1.
  • Figure 2: Evolution of the abstention rate for outlier observations according to the level of noise in the training set for the dataset generated in Example 1.
  • Figure 3: Evolution of the average coverage rate according to the level of noise in the training set for the dataset generated in Example 2.
  • Figure 4: Evolution of the abstention rate for outlier observations according to the level of noise in the training set for the dataset generated in Example 2.
  • Figure 5: Class distribution in the MNIST training set after removing observations corresponding to digits $6$ through $9$.
  • ...and 3 more figures