Multiclass Learning from Noisy Labels for Non-decomposable Performance Measures

Mingyuan Zhang; Shivani Agarwal

TL;DR

This paper tackles multiclass learning from noisy labels when performance is evaluated with non-decomposable measures such as Micro $F_1$, H-mean, Q-mean and G-mean, under class-conditional label noise. It develops two noise-corrected algorithms: NCFW, based on Frank-Wolfe, for monotonic convex measures, and NCBS, based on Bisection, for ratio-of-linear measures. Both rely on noise-corrected loss matrices ${\mathbf L}' = ({\mathbf T}^\top)^{-1}{\mathbf L}$ and on noise-corrected confusion matrices ${\mathbf C}^D[h] = {\mathbf T}^{-1}\widehat{{\mathbf C}}^{\widetilde S}[h]$ computed from the noisy sample $\widetilde S$. The authors prove Bayes consistency and regret bounds that quantify how the noise (through ${\mathbf T}^{-1}$) and the estimation error of the class-probability estimator affect learning, and they extend the results to the case where ${\mathbf T}$ is unknown and replaced by an estimate $\widehat{\mathbf T}$. Experiments on synthetic and real datasets show improved performance and sample efficiency over standard noisy-label baselines.
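
The confusion-matrix correction can be illustrated with a minimal NumPy sketch. It assumes the convention ${\mathbf T}_{ij} = P(\widetilde y = i \mid y = j)$ (columns sum to one), under which the expected noisy confusion matrix is ${\mathbf T}\,{\mathbf C}^D[h]$; the paper's own indexing may differ. The helpers `empirical_confusion` and `corrected_confusion`, the synthetic labels and the fixed classifier are all hypothetical.

```python
import numpy as np

def empirical_confusion(y, y_pred, n):
    """C[i, j] = fraction of examples whose label is i and whose prediction is j."""
    C = np.zeros((n, n))
    np.add.at(C, (y, y_pred), 1.0)
    return C / len(y)

def corrected_confusion(y_noisy, y_pred, T):
    """Estimate the clean confusion matrix as T^{-1} C_hat using noisy labels only."""
    C_noisy = empirical_confusion(y_noisy, y_pred, T.shape[0])
    return np.linalg.solve(T, C_noisy)  # solves T X = C_noisy without forming T^{-1}

rng = np.random.default_rng(0)
n, m = 3, 200_000
# Assumed convention: T[i, j] = P(noisy label = i | clean label = j), so columns sum to 1.
T = np.array([[0.8, 0.1, 0.2],
              [0.1, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
y = rng.integers(0, n, size=m)  # clean labels (never shown to the learner)
# A fixed, hypothetical classifier h: correct 70% of the time, otherwise a random label.
y_pred = np.where(rng.random(m) < 0.7, y, rng.integers(0, n, size=m))
# Class-conditional noise: draw each noisy label from column y of T by inverse-CDF sampling.
cdf = np.cumsum(T[:, y], axis=0)  # shape (n, m)
y_noisy = np.minimum((rng.random(m) > cdf).sum(axis=0), n - 1)

C_clean = empirical_confusion(y, y_pred, n)
C_hat = corrected_confusion(y_noisy, y_pred, T)
print(np.abs(C_hat - C_clean).max())  # small; shrinks roughly like 1/sqrt(m)
```

Solving the linear system with `np.linalg.solve` rather than inverting ${\mathbf T}$ explicitly is a standard numerical choice. The correction amplifies sampling error by at most the norm of ${\mathbf T}^{-1}$, which is how the noise level enters the bounds described above.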

Abstract

There has been much interest in recent years in learning good classifiers from data with noisy labels. Most work on learning from noisy labels has focused on standard loss-based performance measures. However, many machine learning problems require non-decomposable performance measures, which cannot be expressed as the expectation or sum of a loss on individual examples; these include, for example, the H-mean, Q-mean and G-mean in class imbalance settings, and the Micro $F_1$ in information retrieval. In this paper, we design algorithms to learn from noisy labels for two broad classes of multiclass non-decomposable performance measures, namely, monotonic convex and ratio-of-linear, which encompass all the above examples. Our work builds on the Frank-Wolfe and Bisection based methods of Narasimhan et al. (2015). In both cases, we develop noise-corrected versions of the algorithms under the widely studied family of class-conditional noise models. We provide regret (excess risk) bounds for our algorithms, establishing that even though they are trained on noisy data, they are Bayes consistent in the sense that their performance converges to the optimal performance w.r.t. the clean (non-noisy) distribution. Our experiments demonstrate the effectiveness of our algorithms in handling label noise.
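
The Frank-Wolfe idea can be illustrated with an idealised, population-level sketch of a noise-corrected Frank-Wolfe loop for one monotonic convex measure, the Q-mean loss. It is not the paper's NCFW algorithm. Our assumptions are a finite toy instance space, exact noisy class probabilities $\widetilde\eta(x) = {\mathbf T}\eta(x)$ in place of an estimator trained on noisy data, the convention ${\mathbf T}_{ij} = P(\widetilde y = i \mid y = j)$, and the step size $2/(t+2)$. We also track only the confusion matrix of the randomized mixture, not the mixture classifier itself. Each iteration linearises the loss at the current confusion matrix, solves the linear subproblem with the noise-corrected loss matrix $({\mathbf T}^\top)^{-1}\nabla\psi({\mathbf C})$, and mixes in the corrected confusion matrix of the resulting plug-in classifier. All function names are hypothetical.

```python
import numpy as np

def confusion(p, eta, h, n):
    """Population confusion matrix C[i, j] = sum_x p(x) * eta_i(x) * 1[h(x) = j]."""
    return (p[:, None] * eta).T @ np.eye(n)[h]

def qmean_loss_grad(C, pi):
    """Q-mean loss sqrt(mean_i (1 - C_ii / pi_i)^2) and its gradient with respect to C."""
    r = np.diag(C) / pi                        # per-class true positive rates
    loss = np.sqrt(np.mean((1.0 - r) ** 2))
    grad = np.diag(-(1.0 - r) / (len(pi) * pi * max(loss, 1e-12)))
    return loss, grad

def ncfw_sketch(p, eta_noisy, T, iters=300):
    """Frank-Wolfe over feasible confusion matrices, using noise-corrected linear steps."""
    n = T.shape[0]
    pi = np.linalg.solve(T, eta_noisy.T @ p)   # clean priors from noisy priors (pi_noisy = T pi)
    h = eta_noisy.argmax(axis=1)               # any starting classifier works
    C = np.linalg.solve(T, confusion(p, eta_noisy, h, n))
    for t in range(iters):
        _, G = qmean_loss_grad(C, pi)
        # Plug-in classifier for the corrected linear loss L' = (T^T)^{-1} G on noisy data.
        h = np.argmin(eta_noisy @ np.linalg.solve(T.T, G), axis=1)
        Gamma = np.linalg.solve(T, confusion(p, eta_noisy, h, n))  # corrected confusion of h
        step = 2.0 / (t + 2)
        C = (1 - step) * C + step * Gamma      # confusion matrix of the randomized mixture
    return C

# Toy problem on a finite instance space of N points (all values are made up).
rng = np.random.default_rng(3)
N, n = 8, 3
p = rng.dirichlet(np.ones(N))                  # marginal distribution over the N points
eta = rng.dirichlet(np.ones(n), size=N)        # clean class probabilities, shape (N, n)
T = np.array([[0.8, 0.1, 0.2],
              [0.1, 0.7, 0.1],
              [0.1, 0.2, 0.7]])                 # T[i, j] = P(noisy = i | clean = j)
eta_noisy = eta @ T.T                          # row x equals T @ eta(x)
pi = eta.T @ p                                 # clean class priors, used only for reporting

C = ncfw_sketch(p, eta_noisy, T)
print(qmean_loss_grad(C, pi)[0])
```

Running the same function with `T = np.eye(n)` on the clean probabilities `eta` should give the same confusion matrix up to floating-point error, which is the practical content of the noise correction in this idealised setting.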

Paper Structure (16 sections, 13 theorems, 27 equations, 3 figures, 6 tables, 2 algorithms)

INTRODUCTION
Related Work
Organization and Notation
PRELIMINARIES AND BACKGROUND
MONOTONIC CONVEX PERFORMANCE MEASURES
RATIO-OF-LINEAR PERFORMANCE MEASURES
CONSISTENCY AND REGRET BOUNDS
EXPERIMENTS
CONCLUSION
Proofs
Proofs for Monotonic Convex Performance Measures
Proofs for Ratio-of-linear Performance Measures
Proofs for Consistency and Regret Bounds
Synthetic Data: Additional Details
Real Data: Additional Details and Experiments

Key Result

Proposition 8

Let ${\mathbf L}' = ({\mathbf T}^\top)^{-1} {\mathbf L}$. Then any Bayes optimal classifier for ${\mathbf L}'$-performance w.r.t. $\widetilde{D}$ is also Bayes optimal for ${\mathbf L}$-performance w.r.t. $D$.
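
Proposition 8 can be checked numerically. Assuming ${\mathbf T}_{ij} = P(\widetilde y = i \mid y = j)$ (our convention, under which the noisy class probabilities are $\widetilde\eta(x) = {\mathbf T}\eta(x)$), the conditional risks of every prediction coincide:

$$\widetilde\eta(x)^\top{\mathbf L}' = \eta(x)^\top{\mathbf T}^\top({\mathbf T}^\top)^{-1}{\mathbf L} = \eta(x)^\top{\mathbf L}.$$

The NumPy sketch that follows draws a random invertible ${\mathbf T}$, a random loss matrix and random class probabilities (all made up) and compares plug-in classifiers for ${\mathbf L}'$ on noisy probabilities with those for ${\mathbf L}$ on clean probabilities.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 4, 1000
# Assumed convention: T[i, j] = P(noisy label = i | clean label = j); columns sum to 1.
# Mixing with 0.7 * I keeps T strictly diagonally dominant by columns, hence invertible.
T = 0.7 * np.eye(n) + 0.3 * rng.dirichlet(np.ones(n), size=n).T
L = rng.random((n, n))                   # L[i, j] = loss of predicting j when the true label is i

eta = rng.dirichlet(np.ones(n), size=N)  # clean class probabilities eta(x), shape (N, n)
eta_noisy = eta @ T.T                    # noisy class probabilities: row x equals T @ eta(x)

L_corr = np.linalg.solve(T.T, L)         # L' = (T^T)^{-1} L

risk_clean = eta @ L                     # conditional L-risk of each prediction under D
risk_noisy = eta_noisy @ L_corr          # conditional L'-risk of each prediction under noisy D
h_clean = risk_clean.argmin(axis=1)      # Bayes optimal predictions for L under D
h_noisy = risk_noisy.argmin(axis=1)      # Bayes optimal predictions for L' under noisy D
print(np.allclose(risk_clean, risk_noisy), np.array_equal(h_clean, h_noisy))
```

The check relies only on ${\mathbf T}$ being invertible; with an estimated $\widehat{\mathbf T}$ in place of ${\mathbf T}$ the two risks would differ by an amount that depends on the estimation error.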

Figures (3)

Figure 1: Sample Complexity Behavior of Our Noise-corrected Algorithms NCFW (top) and NCBS (bottom)
Figure: Noise-Corrected Frank-Wolfe (NCFW) Based Algorithm for Monotonic Convex Performance Measures (see the Monotonic Convex Performance Measures section for details)
Figure: Noise-Corrected Bisection (NCBS) Based Algorithm for Ratio-of-linear Performance Measures (see the Ratio-of-linear Performance Measures section for details)
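
The bisection approach behind NCBS can be illustrated with an idealised, population-level NumPy sketch. It is not the paper's algorithm. We assume a finite toy instance space, exact noisy class probabilities $\widetilde\eta(x) = {\mathbf T}\eta(x)$ in place of a class-probability estimator trained on noisy data, and the convention ${\mathbf T}_{ij} = P(\widetilde y = i \mid y = j)$. For a ratio-of-linear loss $\langle {\mathbf A}, {\mathbf C}\rangle / \langle {\mathbf B}, {\mathbf C}\rangle$, each step fixes a threshold $\gamma$, builds the plug-in classifier for the noise-corrected linear loss $({\mathbf T}^\top)^{-1}({\mathbf A} - \gamma{\mathbf B})$, and checks whether the corrected confusion matrix gives loss at most $\gamma$. The Micro $F_1$ form (class 0 treated as a default class, following our reading of Narasimhan et al. (2015)) and all function names are assumptions.

```python
import numpy as np

def confusion(p, eta, h, n):
    """Population confusion matrix C[i, j] = sum_x p(x) * eta_i(x) * 1[h(x) = j]."""
    return (p[:, None] * eta).T @ np.eye(n)[h]

def ncbs_sketch(p, eta_noisy, T, A, B, iters=30):
    """Bisection on gamma for the loss <A, C> / <B, C>, assumed to lie in [0, 1] with
    <B, C> > 0, using only noisy class probabilities and a known noise matrix T."""
    n = T.shape[0]
    lo, hi, best_h = 0.0, 1.0, None
    for _ in range(iters):
        gamma = (lo + hi) / 2
        L_corr = np.linalg.solve(T.T, A - gamma * B)           # L' = (T^T)^{-1} (A - gamma B)
        h = np.argmin(eta_noisy @ L_corr, axis=1)              # plug-in classifier on noisy data
        C = np.linalg.solve(T, confusion(p, eta_noisy, h, n))  # corrected confusion matrix
        if np.sum(A * C) <= gamma * np.sum(B * C):             # loss of h is at most gamma
            hi, best_h = gamma, h
        else:
            lo = gamma
    return best_h, hi

# Toy problem on a finite instance space of N points (all values are made up).
rng = np.random.default_rng(1)
N, n = 6, 3
p = rng.dirichlet(np.ones(N))                  # marginal distribution over the N points
eta = rng.dirichlet(np.ones(n), size=N)        # clean class probabilities, shape (N, n)
T = np.array([[0.8, 0.1, 0.2],
              [0.1, 0.7, 0.1],
              [0.1, 0.2, 0.7]])                 # T[i, j] = P(noisy = i | clean = j)
eta_noisy = eta @ T.T                          # row x equals T @ eta(x)

# 1 - Micro F1 with class 0 as the default class, written as <A, C> / <B, C>.
is0 = (np.arange(n) == 0).astype(float)
B = 2.0 - is0[:, None] - is0[None, :]
A = B - 2.0 * np.diag(1.0 - is0)

h, loss = ncbs_sketch(p, eta_noisy, T, A, B)
print(h, loss)
```

Each subproblem is a linear (${\mathbf L}$-performance) problem, so a deterministic plug-in classifier suffices at every step. The interval halves at each iteration, so 30 iterations locate the optimal loss to within about $10^{-9}$ in this exact setting.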

Theorems & Definitions (37)

Definition 1: Class-conditional noise matrix
Definition 2: Confusion matrix
Definition 3: Performance measure
Example 1: ${\mathbf L}$-performance measures
Definition 4: Feasible confusion matrices
Definition 5: Bayes optimal $\psi$-performance
Definition 6: Monotonic convex performance measures
Example 2: H-mean, Q-mean and G-mean, all in loss forms
Definition 7: Class probability function, class probability for short
Proposition 8
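
For reference, the measures named in the abstract can be computed from a confusion matrix ${\mathbf C}$ with ${\mathbf C}_{ij} = P(y = i, h(x) = j)$ (entries summing to one). In the NumPy sketch that follows, the H-mean, Q-mean and G-mean are written in loss form via the per-class true positive rates ${\mathbf C}_{ii}/\sum_j {\mathbf C}_{ij}$. Micro $F_1$ uses one common multiclass form that treats class 0 as a default class. These exact forms and the function names are our assumptions and may differ in normalisation from the paper's definitions.

```python
import numpy as np

def tprs(C):
    """Per-class true positive rates C[i, i] / sum_j C[i, j] (row i = true class i)."""
    return np.diag(C) / C.sum(axis=1)

def hmean_loss(C):
    r = tprs(C)
    return 1.0 - len(r) / np.sum(1.0 / r)        # 1 - harmonic mean of the TPRs

def qmean_loss(C):
    r = tprs(C)
    return np.sqrt(np.mean((1.0 - r) ** 2))      # quadratic mean of the per-class miss rates

def gmean_loss(C):
    r = tprs(C)
    return 1.0 - np.prod(r) ** (1.0 / len(r))    # 1 - geometric mean of the TPRs

def micro_f1(C):
    """Micro F1 with class 0 as a default class; assumes the entries of C sum to 1."""
    tp = np.trace(C) - C[0, 0]
    return 2.0 * tp / (2.0 - C[0, :].sum() - C[:, 0].sum())

# A made-up confusion matrix whose entries sum to 1.
C = np.array([[0.30, 0.05, 0.05],
              [0.05, 0.20, 0.05],
              [0.00, 0.10, 0.20]])
print(hmean_loss(C), qmean_loss(C), gmean_loss(C), micro_f1(C))
```

For this matrix the four values are approximately 0.3077, 0.3081, 0.3066 and 0.64. The first three are the monotonic convex examples, and Micro $F_1$ (as one minus its value) is a ratio of two linear functions of ${\mathbf C}$.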
