Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels
Zhilu Zhang, Mert R. Sabuncu
TL;DR
This work addresses the vulnerability of deep classifiers to noisy labels by introducing a parametric loss, $\mathcal{L}_q(f(x), e_j) = \frac{1 - f_j(x)^q}{q}$ with $q \in (0, 1]$, which smoothly bridges categorical cross-entropy (CCE, the limit $q \to 0$) and mean absolute error (MAE, $q = 1$), together with a truncated variant, $\mathcal{L}_{trunc}$, that further enhances robustness. The authors provide theoretical bounds under label noise and show that the gradient of $\mathcal{L}_q$ weights each sample by $f_{y}(x)^{q-1}$, so $q$ tunes the balance between robustness and learnability; they also propose a practical optimization scheme with pruning for $\mathcal{L}_{trunc}$, based on alternative convex search (ACS). Empirically, on CIFAR-10, CIFAR-100, and FASHION-MNIST with closed-set, open-set, and class-dependent noise, $\mathcal{L}_q$ and especially $\mathcal{L}_{trunc}$ consistently outperform the standard losses (CCE and MAE), with truncation offering tighter bounds and improved stability. Both losses are easy to implement as drop-in replacements and yield significant improvements in noisy-label regimes, which suggests broad utility for real-world, large-scale datasets where annotation quality varies.
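To make the formula concrete, here is a minimal sketch of $\mathcal{L}_q$ and of the gradient weight $f_{y}(x)^{q-1}$ for a single sample, in plain Python. The function names `lq_loss` and `lq_gradient_weight`, the example probability vectors, and the choice of $q$ values are illustrative assumptions, not taken from the authors' code. The sketch also assumes `probs` is a softmax output, so every entry is strictly positive.

```python
import math


def lq_loss(probs, label, q):
    """L_q(f(x), e_j) = (1 - f_j(x)^q) / q for one sample.

    probs: predicted class probabilities (assumed strictly positive).
    label: index j of the (possibly noisy) training label.
    q:     loss parameter in (0, 1].
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return (1.0 - probs[label] ** q) / q


def lq_gradient_weight(probs, label, q):
    """Factor f_y(x)^(q - 1) that scales this sample's gradient."""
    return probs[label] ** (q - 1.0)


# Hypothetical predictions for a 3-class problem, both labeled as class 0.
confident = [0.9, 0.05, 0.05]  # prediction agrees with the label
uncertain = [0.1, 0.6, 0.3]    # prediction disagrees (possibly a noisy label)

for q in (0.01, 0.7, 1.0):
    print(
        f"q={q}: loss(confident)={lq_loss(confident, 0, q):.4f}, "
        f"weight(uncertain)={lq_gradient_weight(uncertain, 0, q):.4f}"
    )
```

For small $q$ the weight on the low-confidence sample approaches $1/f_{y}(x)$, as in CCE, so samples that disagree with their labels dominate the gradient. At $q = 1$ every sample receives weight 1, as in MAE. Intermediate values of $q$ sit between these two behaviors.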
Abstract
Deep neural networks (DNNs) have achieved tremendous success in a variety of applications across many disciplines. Yet, their superior performance comes with the expensive cost of requiring correctly annotated large-scale datasets. Moreover, due to DNNs' rich capacity, errors in training labels can hamper performance. To combat this problem, mean absolute error (MAE) has recently been proposed as a noise-robust alternative to the commonly-used categorical cross entropy (CCE) loss. However, as we show in this paper, MAE can perform poorly with DNNs and challenging datasets. Here, we present a theoretically grounded set of noise-robust loss functions that can be seen as a generalization of MAE and CCE. Proposed loss functions can be readily applied with any existing DNN architecture and algorithm, while yielding good performance in a wide range of noisy label scenarios. We report results from experiments conducted with CIFAR-10, CIFAR-100 and FASHION-MNIST datasets and synthetically generated noisy labels.
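The claim that the proposed losses generalize MAE and CCE can be checked directly from the definition $\mathcal{L}_q(f(x), e_j) = \frac{1 - f_j(x)^q}{q}$. The short derivation below is a sketch that assumes $f(x)$ is a probability vector (non-negative entries summing to 1) with $f_j(x) > 0$. Applying L'Hôpital's rule in $q$ gives the CCE limit:

$$\lim_{q \to 0} \frac{1 - f_j(x)^q}{q} = \lim_{q \to 0} \left( -f_j(x)^q \log f_j(x) \right) = -\log f_j(x).$$

Setting $q = 1$ gives MAE up to a constant factor:

$$\mathcal{L}_1(f(x), e_j) = 1 - f_j(x) = \frac{1}{2} \lVert e_j - f(x) \rVert_1,$$

since $\lVert e_j - f(x) \rVert_1 = (1 - f_j(x)) + \sum_{k \neq j} f_k(x) = 2\,(1 - f_j(x))$.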
