Robust Partial-Label Learning by Leveraging Class Activation Values

Tobias Fuchs, Florian Kalinke

TL;DR

The paper tackles robust learning under partial labels by introducing RobustPLL, a method that leverages class activation magnitudes within the subjective-logic framework to explicitly model uncertainty. It jointly learns a neural predictor and a set of label weights, using a Dirichlet-based loss that captures both fit to the data and uncertainty, plus a regularization term that discourages evidence for non-candidate labels. A key contribution is a closed-form, provably optimal update rule for the label weights that redistributes the mass assigned to non-candidate labels uniformly across the candidate labels, together with a reinterpretation in terms of subjective-logic beliefs, priors, and uncertainties. Empirically, RobustPLL is more robust to PLL noise, out-of-distribution (OOD) data, and adversarial perturbations on MNIST-like and six real-world PLL datasets, supported by ablations and comparisons against a broad suite of baselines and ensembles.
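The label-weight update described above can be sketched in a few lines. This is a minimal illustration based only on the TL;DR's wording, not the paper's reference implementation: the function name `redistribute_label_weights` is hypothetical, and `p_bar` is assumed to be an already normalized class-probability vector (such as $\bar{\boldsymbol{p}}_i = \boldsymbol{\alpha}_i / \| \boldsymbol{\alpha}_i \|_1$).

```python
def redistribute_label_weights(p_bar, candidates):
    """Zero out non-candidate labels and spread their mass uniformly
    over the candidate labels (illustrative sketch, hypothetical name).

    p_bar:      list of class probabilities that sums to 1 (assumption)
    candidates: set of class indices in the candidate set S_i
    """
    if not candidates:
        raise ValueError("candidate set must be non-empty")
    # Probability mass the model currently puts on candidate labels.
    candidate_mass = sum(p_bar[j] for j in candidates)
    # Whatever is left sits on non-candidate labels and gets redistributed.
    share = (1.0 - candidate_mass) / len(candidates)
    return [
        p_bar[j] + share if j in candidates else 0.0
        for j in range(len(p_bar))
    ]


# Four classes, candidate set {0, 1}; 0.3 of the mass sits on non-candidates.
weights = redistribute_label_weights([0.5, 0.2, 0.2, 0.1], {0, 1})
# weights is approximately [0.65, 0.35, 0.0, 0.0]
```

The resulting weights still sum to one and are zero outside the candidate set, which is what lets them act as a soft training target restricted to the candidates.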

Abstract

Real-world training data is often noisy; for example, human annotators assign conflicting class labels to the same instances. Partial-label learning (PLL) is a weakly supervised learning paradigm that allows training classifiers in this context without manual data cleaning. While state-of-the-art methods have good predictive performance, their predictions are sensitive to high noise levels, out-of-distribution data, and adversarial perturbations. We propose a novel PLL method based on subjective logic, which explicitly represents uncertainty by leveraging the magnitudes of the underlying neural network's class activation values. Thereby, we effectively incorporate prior knowledge about the class labels by using a novel label weight re-distribution strategy that we prove to be optimal. We empirically show that our method yields more robust predictions in terms of predictive performance under high PLL noise levels, handling out-of-distribution examples, and handling adversarial perturbations on the test instances.
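To make the role of activation magnitudes concrete, the sketch below builds a subjective-logic opinion from raw class activations. It follows the common evidential formulation (non-negative evidence, Dirichlet parameters equal to evidence plus one, uncertainty equal to the number of classes divided by the Dirichlet strength). Whether the paper uses exactly these choices, including the ReLU clipping, is an assumption here, and `opinion_from_activations` is a hypothetical name.

```python
def opinion_from_activations(activations):
    """Map class activations to (belief, uncertainty, p_bar).

    Assumptions (not confirmed by the source): evidence is the ReLU of
    the activation, and the Dirichlet prior is uniform (alpha = e + 1).
    """
    evidence = [max(0.0, a) for a in activations]
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)             # ||alpha||_1
    num_classes = len(alpha)
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    p_bar = [a / strength for a in alpha]  # alpha / ||alpha||_1
    return belief, uncertainty, p_bar


# Large activations mean much evidence and therefore low uncertainty.
_, u_confident, _ = opinion_from_activations([18.0, 0.0, -3.0])
# Small activations leave most of the mass on "I don't know".
_, u_unsure, _ = opinion_from_activations([0.2, 0.1, 0.0])
```

Unlike a softmax, which only sees differences between activations, this mapping keeps their absolute size: scaling all evidence down raises the uncertainty mass even if the ranking of classes is unchanged.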

Paper Structure

This paper contains 32 sections, 5 theorems, 20 equations, 1 figure, 3 tables, and 1 algorithm.

Key Result

Proposition 4.2

Given an instance $(\boldsymbol{x}_i, S_i) \in \mathcal{D}$, parameters $\boldsymbol{\theta}$, label weights $\boldsymbol{\ell}_{i}$, and $\bar{\boldsymbol{p}}_{i} = \boldsymbol{\alpha}_{i} / \| \boldsymbol{\alpha}_{i} \|_1$, it holds that …

Figures (1)

  • Figure 1: Empirical CDF of the normalized entropy (range 0 to 1) of predictions on MNIST (darker color) and NotMNIST (lighter color) for models trained on MNIST. The left plot shows the four best non-ensemble approaches according to Table \ref{tab:ood} (highest metrics). We exclude methods that behave too similarly to one already shown; for example, Proden-L2 and Rc behave similarly to Proden, which is shown. The performance of all methods is reported in Table \ref{tab:ood}. The right plot shows the predictive entropy of all four ensemble approaches. Our ensemble approach is the most certain about predictions on the test set (top-left corner) while being among the most uncertain about out-of-distribution examples (bottom-right corner).
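The quantity plotted in Figure 1 can be reproduced with a short helper. The sketch below computes the entropy of a prediction, normalized by $\log K$ so that it lies in $[0, 1]$, and evaluates an empirical CDF at a threshold. The function names and the example probability vectors are made up for illustration and are not taken from the paper's experiments.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a probability vector divided by log(K),
    so 0 means fully certain and 1 means uniform. Requires K >= 2."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def empirical_cdf(values, threshold):
    """Fraction of values that are less than or equal to threshold."""
    return sum(v <= threshold for v in values) / len(values)


# Hypothetical predictions: peaked on in-distribution inputs,
# nearly flat on out-of-distribution inputs.
in_dist = [[0.97, 0.01, 0.02], [0.90, 0.05, 0.05]]
out_dist = [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]]

cdf_in = empirical_cdf([normalized_entropy(p) for p in in_dist], 0.5)
cdf_out = empirical_cdf([normalized_entropy(p) for p in out_dist], 0.5)
```

A curve that rises early (toward the top-left corner) indicates mostly low-entropy, confident predictions, while a curve that rises late (toward the bottom-right corner) indicates mostly high-entropy, uncertain ones, which is the desirable pattern on out-of-distribution inputs.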

Theorems & Definitions (7)

  • Example 3.1
  • Example 4.1
  • Proposition 4.2
  • Proposition 4.3
  • Proposition 4.4
  • Proposition 4.5
  • Proposition 4.6