Partial-Label Learning with Conformal Candidate Cleaning

Tobias Fuchs, Florian Kalinke

TL;DR

This work tackles the ambiguity inherent in real-world data by integrating conformal prediction into partial-label learning to prune candidate labels. The authors introduce a Conformal Candidate Cleaning procedure that alternates empirical risk minimization for PLL with pruning of candidate sets via conformal prediction calibrated on a pseudo-labeled validation set, while proving that the conformal validity is preserved with respect to the true labels. The approach yields significant performance gains across multiple PLL baselines and datasets, supported by a theoretical justification under mild assumptions and an extensive ablation study. The method is practical, scalable, and accompanied by open-source code, offering a principled way to reduce candidate-label noise and improve PLL accuracy in real-world applications.

Abstract

Real-world data is often ambiguous; for example, human annotation produces instances with multiple conflicting class labels. Partial-label learning (PLL) aims at training a classifier in this challenging setting, where each instance is associated with a set of candidate labels and one correct, but unknown, class label. A multitude of algorithms targeting this setting exists and, to enhance their prediction quality, several extensions that are applicable across a wide range of PLL methods have been introduced. While many of these extensions rely on heuristics, this article proposes a novel enhancing method that incrementally prunes candidate sets using conformal prediction. To work around the missing labeled validation set, which is typically required for conformal prediction, we propose a strategy that alternates between training a PLL classifier to label the validation set, leveraging these predicted class labels for calibration, and pruning candidate labels that are not part of the resulting conformal sets. In this sense, our method alternates between empirical risk minimization and candidate set pruning. We establish that our pruning method preserves the conformal validity with respect to the unknown ground truth. Our extensive experiments on artificial and real-world data show that the proposed approach significantly improves the test set accuracies of several state-of-the-art PLL classifiers.
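
The calibrate-then-prune step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard split-conformal score of one minus the predicted probability, takes pseudo-labels as the most probable candidate label, and falls back to the original candidate set when pruning would leave it empty. All function names (`pseudo_labels`, `conformal_threshold`, `prune_candidates`) are hypothetical, and the probability arrays stand in for the outputs of a trained PLL classifier.

```python
import math
import numpy as np


def pseudo_labels(probs, candidates):
    """Assumed rule: pick the most probable label among each candidate set."""
    masked = np.where(candidates, probs, -np.inf)
    return masked.argmax(axis=1)


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal quantile of the scores 1 - p(label) on the calibration set."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return np.inf  # too few calibration points: conformal set is all labels
    return np.sort(scores)[k - 1]


def prune_candidates(train_probs, candidates, threshold):
    """Keep only candidate labels that also lie in the conformal set."""
    conformal = (1.0 - train_probs) <= threshold
    pruned = candidates & conformal
    # Assumption of this sketch: never leave an instance without candidates.
    empty = ~pruned.any(axis=1)
    pruned[empty] = candidates[empty]
    return pruned


# Toy validation set: predicted probabilities and candidate sets (3 classes).
cal_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.2, 0.2, 0.6],
                      [0.5, 0.3, 0.2]])
cal_candidates = np.array([[1, 1, 0],
                           [0, 1, 1],
                           [1, 0, 1],
                           [1, 1, 0]], dtype=bool)

# Toy training instances whose candidate sets are to be pruned.
train_probs = np.array([[0.6, 0.3, 0.1],
                        [0.2, 0.1, 0.7]])
train_candidates = np.array([[1, 1, 0],
                             [1, 1, 0]], dtype=bool)

labels = pseudo_labels(cal_probs, cal_candidates)
threshold = conformal_threshold(cal_probs, labels, alpha=0.25)
pruned = prune_candidates(train_probs, train_candidates, threshold)
print(threshold)
print(pruned)
```

In a full pipeline this step would presumably be interleaved with further training of the PLL classifier on the pruned candidate sets, matching the alternation between empirical risk minimization and pruning that the abstract describes.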

Paper Structure

This paper contains 30 sections, 8 theorems, 42 equations, 1 figure, 6 tables, 1 algorithm.

Key Result

Theorem 4.1

Assume that $\mathbb{P}_{S \mid X = x, Y = y}(y \in S) = 1$ for any $(x, y) \in \mathcal{X} \times \mathcal{Y}$, and $\alpha \in (0, 1)$. Then, an optimal solution $C$ of (eq:opt-supervised) satisfies (eq:pll-valid): …

Figures (1)

  • Figure: Conformal Candidate Cleaning

Theorems & Definitions (10)

  • Theorem 4.1
  • Lemma 4.3
  • Theorem 4.4
  • Remark 4.5
  • Lemma B.1
  • proof
  • Theorem C.1: DKW56, Naaman21
  • Theorem C.3: FengL0X0G0S20
  • Theorem C.4: FengL0X0G0S20
  • Lemma C.5: Markov inequality