Multiclass Learning from Noisy Labels for Non-decomposable Performance Measures

Mingyuan Zhang, Shivani Agarwal

TL;DR

This paper tackles multiclass learning from noisy labels under class-conditional noise when performance is evaluated with non-decomposable measures such as Micro $F_1$, H-mean, Q-mean, and G-mean. It develops two noise-corrected algorithms: NCFW, a Frank-Wolfe based method for monotonic convex measures, and NCBS, a Bisection based method for ratio-of-linear measures. Both rely on noise-corrected loss matrices ${\mathbf L}' = ({\mathbf T}^\top)^{-1}{\mathbf L}$ and corrected confusion matrices ${\mathbf C}^D[h] = {\mathbf T}^{-1}\widehat{{\mathbf C}}^{\widetilde S}[h]$, where ${\mathbf T}$ is the class-conditional noise matrix. The authors prove Bayes consistency and regret bounds that quantify how the noise (via ${\mathbf T}^{-1}$) and the estimation error of the class-probability estimator affect learning, and they extend the results to the case where ${\mathbf T}$ is unknown and must be replaced by an estimate $\widehat{\mathbf T}$. Experiments on synthetic and real datasets show improved performance and sample efficiency over standard noisy-label baselines.
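The confusion-matrix correction mentioned above can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes a column-stochastic noise matrix (entry $(i, j)$ of ${\mathbf T}$ is the probability of observing noisy label $i$ given clean label $j$), which is the convention under which the two formulas above fit together, and it assumes ${\mathbf T}$ is invertible. The matrices `T` and `C_clean` and the helper names are hypothetical.

```python
import numpy as np

def noisy_confusion(y_noisy, y_pred, n):
    """Empirical confusion matrix on a noisy sample: entry (i, j) is the
    fraction of examples with noisy label i and predicted label j."""
    C = np.zeros((n, n))
    for i, j in zip(y_noisy, y_pred):
        C[i, j] += 1.0
    return C / len(y_noisy)

def correct_confusion(C_noisy, T):
    """Noise correction T^{-1} C_noisy, computed with a linear solve
    rather than an explicit matrix inverse."""
    return np.linalg.solve(T, C_noisy)

# Hypothetical 3-class noise matrix (assumption: column-stochastic, so
# column j is the distribution of the noisy label given clean label j).
T = np.array([[0.8, 0.1, 0.0],
              [0.1, 0.8, 0.2],
              [0.1, 0.1, 0.8]])

# Hypothetical clean confusion matrix of some classifier (entries sum to 1).
C_clean = np.array([[0.30, 0.05, 0.05],
                    [0.02, 0.25, 0.03],
                    [0.04, 0.06, 0.20]])

# Under class-conditional noise with this convention, the population
# noisy confusion matrix is T @ C_clean; the correction undoes that.
C_noisy_population = T @ C_clean
C_recovered = correct_confusion(C_noisy_population, T)
```

On a finite noisy sample one would pass `noisy_confusion(...)` to `correct_confusion`; the result is then only an estimate of the clean confusion matrix, and its error is amplified when `T` is close to singular.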

Abstract

There has been much interest in recent years in learning good classifiers from data with noisy labels. Most work on learning from noisy labels has focused on standard loss-based performance measures. However, many machine learning problems require using non-decomposable performance measures which cannot be expressed as the expectation or sum of a loss on individual examples; these include for example the H-mean, Q-mean and G-mean in class imbalance settings, and the Micro $F_1$ in information retrieval. In this paper, we design algorithms to learn from noisy labels for two broad classes of multiclass non-decomposable performance measures, namely, monotonic convex and ratio-of-linear, which encompass all the above examples. Our work builds on the Frank-Wolfe and Bisection based methods of Narasimhan et al. (2015). In both cases, we develop noise-corrected versions of the algorithms under the widely studied family of class-conditional noise models. We provide regret (excess risk) bounds for our algorithms, establishing that even though they are trained on noisy data, they are Bayes consistent in the sense that their performance converges to the optimal performance w.r.t. the clean (non-noisy) distribution. Our experiments demonstrate the effectiveness of our algorithms in handling label noise.
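To make "non-decomposable" concrete, the sketch below computes H-mean, Q-mean, and G-mean losses directly from a confusion matrix. The formulas are an assumption: they follow the loss forms commonly used in the literature on complex multiclass performance measures, with rows indexing the true class, and may differ in detail from the paper's own definitions. Each measure is a nonlinear function of per-class recalls, so it cannot be written as an average of per-example losses.

```python
import numpy as np

def class_recalls(C):
    """Per-class recall C_ii / sum_j C_ij (assumption: rows = true class)."""
    C = np.asarray(C, dtype=float)
    return np.diag(C) / C.sum(axis=1)

def h_mean_loss(C):
    """One minus the harmonic mean of the per-class recalls."""
    r = class_recalls(C)
    return 1.0 - len(r) / np.sum(1.0 / r)

def q_mean_loss(C):
    """Root mean square of the per-class error rates (1 - recall)."""
    r = class_recalls(C)
    return np.sqrt(np.mean((1.0 - r) ** 2))

def g_mean_loss(C):
    """One minus the geometric mean of the per-class recalls."""
    r = class_recalls(C)
    return 1.0 - np.prod(r) ** (1.0 / len(r))

# Hypothetical 2-class confusion matrix with recalls 0.8 and 0.5.
C_example = np.array([[0.40, 0.10],
                      [0.25, 0.25]])
losses = {
    "H-mean": h_mean_loss(C_example),
    "Q-mean": q_mean_loss(C_example),
    "G-mean": g_mean_loss(C_example),
}
```

All three losses are zero for a perfect classifier and penalize low recall on any single class, which is why they are used in class imbalance settings. The sketch does not guard against a class with zero recall or zero mass.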

Paper Structure (16 sections, 13 theorems, 27 equations, 3 figures, 6 tables, 2 algorithms)

Key Result

Proposition 8

Let ${\mathbf L}' = ({\mathbf T}^\top)^{-1} {\mathbf L}$. Then any Bayes optimal classifier for ${\mathbf L}'$-performance w.r.t. $\widetilde{D}$ is also Bayes optimal for ${\mathbf L}$-performance w.r.t. $D$.
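A quick numerical check of this statement, as a sketch rather than a proof. It assumes a column-stochastic ${\mathbf T}$ (so the noisy class-probability vector at a point is ${\mathbf T}\boldsymbol\eta$ when the clean one is $\boldsymbol\eta$) and uses the fact that, for a linear ${\mathbf L}$-performance, a Bayes optimal prediction at a point minimizes the expected loss $\boldsymbol\eta^\top {\mathbf L}$ over columns. Then $({\mathbf T}\boldsymbol\eta)^\top {\mathbf L}' = \boldsymbol\eta^\top {\mathbf T}^\top ({\mathbf T}^\top)^{-1} {\mathbf L} = \boldsymbol\eta^\top {\mathbf L}$, so both problems share the same minimizers. The noise matrix, the choice of 0-1 loss, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Hypothetical noise matrix (assumption: column-stochastic and invertible).
T = np.array([[0.8, 0.1, 0.0],
              [0.1, 0.8, 0.2],
              [0.1, 0.1, 0.8]])

# Hypothetical loss matrix: 0-1 loss, rows = true class, columns = prediction.
L = 1.0 - np.eye(n)

# Noise-corrected loss L' = (T^T)^{-1} L, computed via a linear solve.
L_corr = np.linalg.solve(T.T, L)

# Random clean class-probability vectors (one per row).
etas = rng.dirichlet(np.ones(n), size=1000)
etas_noisy = etas @ T.T          # row-wise version of T @ eta

# Expected loss of each prediction under the clean and noisy problems.
clean_risk = etas @ L            # eta^T L
noisy_risk = etas_noisy @ L_corr # (T eta)^T L'

# Pointwise Bayes optimal predictions for both problems.
clean_pred = clean_risk.argmin(axis=1)
noisy_pred = noisy_risk.argmin(axis=1)
agreement = float(np.mean(clean_pred == noisy_pred))
```

The two risk matrices coincide up to floating-point error, so the predictions agree except possibly at exact ties. Note that `L_corr` can contain negative entries, which is expected for corrected losses.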

Figures (3)

  • Figure 1: Sample Complexity Behavior of Our Noise-corrected Algorithms NCFW (top) and NCBS (bottom)
  • Figure: Noise-Corrected Frank-Wolfe (NCFW) Based Algorithm for Monotonic Convex Performance Measures (see the section on monotonic convex performance measures for details)
  • Figure: Noise-Corrected Bisection (NCBS) Based Algorithm for Ratio-of-linear Performance Measures (see the section on ratio-of-linear performance measures for details)

Theorems & Definitions (37)

  • Definition 1: Class-conditional noise matrix
  • Definition 2: Confusion matrix
  • Definition 3: Performance measure
  • Example 1: ${\mathbf L}$-performance measures
  • Definition 4: Feasible confusion matrices
  • Definition 5: Bayes optimal $\psi$-performance
  • Definition 6: Monotonic convex performance measures
  • Example 2: H-mean, Q-mean and G-mean, all in loss forms
  • Definition 7: Class probability function, class probability for short
  • Proposition 8
  • ...and 27 more