Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels

Zhilu Zhang, Mert R. Sabuncu

TL;DR

This work addresses the vulnerability of deep classifiers to noisy labels by introducing a parametric loss, $\mathcal{L}_q(f(x), e_j) = \frac{1 - f_j(x)^q}{q}$ with $q \in (0, 1]$, which smoothly bridges categorical cross-entropy (the limit $q \to 0$) and mean absolute error ($q = 1$), together with a truncated variant, $\mathcal{L}_{trunc}$, that further enhances robustness. The authors provide theoretical bounds under label noise and show that the gradient of $\mathcal{L}_q$ weights samples by $f_{y}^{q-1}$, so that $q$ tunes the balance between robustness and learnability. They also propose a practical optimization scheme for $\mathcal{L}_{trunc}$ based on alternative convex search (ACS) with pruning. Empirically, on CIFAR-10/100 and FASHION-MNIST with closed-set, open-set, and class-dependent noise, $\mathcal{L}_q$ and especially $\mathcal{L}_{trunc}$ consistently outperform the standard losses (CCE, MAE), with truncation offering tighter bounds and improved stability. Both losses are easy to implement as drop-in replacements and yield significant improvements in noisy-label regimes, suggesting broad utility for real-world, large-scale datasets where annotation quality varies.
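To make the formula above concrete, here is a minimal sketch of the $\mathcal{L}_q$ loss in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`lq_loss`, `cce_loss`, `gradient_weight`), the default `q=0.7` (the value used in Figure 1), and the toy probabilities are all chosen here for the example. Inputs are assumed to be softmax outputs, one row per sample, with integer class labels.

```python
import math


def lq_loss(probs, labels, q=0.7):
    """Mean L_q loss, (1 - f_y(x)^q) / q, over a batch.

    probs  : list of per-sample probability vectors (assumed softmax outputs)
    labels : list of integer class indices (possibly noisy)
    q      : exponent in (0, 1]; q -> 0 approaches CCE, q = 1 gives 1 - f_y
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    losses = [(1.0 - p[y] ** q) / q for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


def cce_loss(probs, labels):
    """Mean categorical cross-entropy, -log f_y(x), for comparison."""
    losses = [-math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


def gradient_weight(f_y, q):
    """Per-sample factor f_y^(q-1) that scales the gradient of f_y."""
    return f_y ** (q - 1.0)


# Hypothetical toy batch: two samples, three classes.
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.1, 0.8]]
labels = [0, 2]

print("L_q (q=0.7):", lq_loss(probs, labels, q=0.7))
print("L_q (q=1.0):", lq_loss(probs, labels, q=1.0))
print("CCE        :", cce_loss(probs, labels))
```

With `q = 1` the weight `gradient_weight(f_y, 1.0)` is 1 for every sample, so low-confidence (possibly mislabeled) samples receive no extra emphasis. As `q` shrinks, the weight approaches the $1/f_y$ factor of CCE, which emphasizes exactly those samples.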

Abstract

Deep neural networks (DNNs) have achieved tremendous success in a variety of applications across many disciplines. Yet, their superior performance comes with the expensive cost of requiring correctly annotated large-scale datasets. Moreover, due to DNNs' rich capacity, errors in training labels can hamper performance. To combat this problem, mean absolute error (MAE) has recently been proposed as a noise-robust alternative to the commonly-used categorical cross entropy (CCE) loss. However, as we show in this paper, MAE can perform poorly with DNNs and challenging datasets. Here, we present a theoretically grounded set of noise-robust loss functions that can be seen as a generalization of MAE and CCE. Proposed loss functions can be readily applied with any existing DNN architecture and algorithm, while yielding good performance in a wide range of noisy label scenarios. We report results from experiments conducted with CIFAR-10, CIFAR-100 and FASHION-MNIST datasets and synthetically generated noisy labels.

Paper Structure

This paper contains 11 sections, 5 theorems, 37 equations, 2 figures, 2 tables, and 1 algorithm.

Key Result

Lemma 1

$\lim_{q \to 0} \mathcal{L}_{q}(f(\boldsymbol{x}), \boldsymbol{e}_j) = \mathcal{L}_{C}(f(\boldsymbol{x}), \boldsymbol{e}_j)$, where $\mathcal{L}_{q}$ represents the $\mathcal{L}_q$ loss, and $\mathcal{L}_{C}$ represents the categorical cross entropy loss.
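The limit in Lemma 1 can be checked in one step. The following is a sketch of the standard argument, not a reproduction of the paper's proof. It assumes $f_j(\boldsymbol{x}) \in (0, 1]$ and applies L'Hôpital's rule in $q$, since numerator and denominator both vanish as $q \to 0$:

$$
\lim_{q \to 0} \frac{1 - f_j(\boldsymbol{x})^q}{q}
= \lim_{q \to 0} \frac{-f_j(\boldsymbol{x})^q \, \log f_j(\boldsymbol{x})}{1}
= -\log f_j(\boldsymbol{x})
= \mathcal{L}_{C}(f(\boldsymbol{x}), \boldsymbol{e}_j).
$$

At the other endpoint, $q = 1$ gives $\mathcal{L}_{1}(f(\boldsymbol{x}), \boldsymbol{e}_j) = 1 - f_j(\boldsymbol{x})$. For a softmax output, whose entries sum to one, the mean absolute error satisfies

$$
\lVert \boldsymbol{e}_j - f(\boldsymbol{x}) \rVert_1
= \bigl(1 - f_j(\boldsymbol{x})\bigr) + \sum_{k \neq j} f_k(\boldsymbol{x})
= 2 \bigl(1 - f_j(\boldsymbol{x})\bigr),
$$

so $\mathcal{L}_{1}$ equals MAE up to a constant factor of 2.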

Figures (2)

  • Figure 1: (a), (b) Test accuracy against number of epochs for training with CCE (orange) and MAE (blue) loss on clean data with (a) CIFAR-10 and (b) CIFAR-100 datasets. (c) Average softmax prediction for correctly (solid) and wrongly (dashed) labeled training samples, for CCE (orange) and $\mathcal{L}_q$ ($q=0.7$, blue) loss on CIFAR-10 with uniform noise ($\eta = 0.4$).
  • Figure 2: The test accuracy and validation loss against number of epochs for training with $\mathcal{L}_q$ loss at different values of $q$. (a) and (d): $\eta = 0.0$; (b) and (e): $\eta = 0.2$; (c) and (f): $\eta = 0.6$.

Theorems & Definitions (12)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 1
  • proof
  • Remark
  • Theorem 2
  • proof
  • Lemma 3
  • ...and 2 more