Enhancing Sample Selection Against Label Noise by Cutting Mislabeled Easy Examples

Suqin Yuan, Lei Feng, Bo Han, Tongliang Liu

TL;DR

The paper addresses learning with noisy labels by showing that mislabeled examples learned early in training can disproportionately harm generalization. It defines Mislabeled Easy Examples (MEEs) as mislabeled samples that the model predicts correctly early in training and that distort the learning of simple patterns. To mitigate MEEs, it proposes Early Cutting, a recalibration step that uses a later-stage model to re-select the confident subset identified early in training. Samples in that subset that show high loss $L_i$, high confidence $c_i$, and low input-gradient norm $g_i$ under the later model are treated as suspected MEEs and removed, using thresholds on the three quantities and an Early Cutting rate $\gamma$. Empirically, on CIFAR-10/100, WebVision, and full ImageNet-1k with various noise types, Early Cutting improves on state-of-the-art methods at modest overhead and transfers to other pipelines, including MixMatch-based semi-supervised setups. Limitations include the restriction to vision tasks and the need for theoretical grounding.

Abstract

Sample selection is a prevalent approach in learning with noisy labels, aiming to identify confident samples for training. Although existing sample selection methods have achieved decent results by reducing the noise rate of the selected subset, they often overlook that not all mislabeled examples harm the model's performance equally. In this paper, we demonstrate that mislabeled examples correctly predicted by the model early in the training process are particularly harmful to model performance. We refer to these examples as Mislabeled Easy Examples (MEEs). To address this, we propose Early Cutting, which introduces a recalibration step that employs the model's later training state to re-select the confident subset identified early in training, thereby avoiding misleading confidence from early learning and effectively filtering out MEEs. Experiments on the CIFAR, WebVision, and full ImageNet-1k datasets demonstrate that our method effectively improves sample selection and model performance by reducing MEEs.
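The recalibration step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the fixed scalar thresholds, and the reading of confidence as the later model's maximum softmax probability are all hypothetical, and the Early Cutting rate $\gamma$ is left out because its exact role is not specified here. The per-sample loss, confidence, and input-gradient norm are assumed to have been computed already with the later-stage model.

```python
import numpy as np


def flag_suspected_mees(loss, confidence, grad_norm, loss_thr, conf_thr, grad_thr):
    """Boolean mask: True where a sample meets all three criteria.

    Criteria (all measured with the later-stage model):
      high loss on the given label, high confidence, low input-gradient norm.
    The thresholds are hypothetical inputs, not values from the paper.
    """
    loss = np.asarray(loss, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    grad_norm = np.asarray(grad_norm, dtype=float)
    return (loss > loss_thr) & (confidence > conf_thr) & (grad_norm < grad_thr)


def recalibrate_subset(selected_idx, loss, confidence, grad_norm,
                       loss_thr, conf_thr, grad_thr):
    """Drop suspected MEEs from the subset that was selected early in training."""
    selected_idx = np.asarray(selected_idx, dtype=int)
    flagged = flag_suspected_mees(
        np.asarray(loss)[selected_idx],
        np.asarray(confidence)[selected_idx],
        np.asarray(grad_norm)[selected_idx],
        loss_thr, conf_thr, grad_thr,
    )
    return selected_idx[~flagged]


# Toy statistics for four samples; sample 1 meets all three criteria.
loss = np.array([0.1, 3.0, 2.5, 0.2])
confidence = np.array([0.90, 0.95, 0.50, 0.99])
grad_norm = np.array([0.50, 0.01, 0.01, 0.02])

kept = recalibrate_subset([0, 1, 3], loss, confidence, grad_norm,
                          loss_thr=1.0, conf_thr=0.9, grad_thr=0.1)
print(kept.tolist())
```

High loss and high confidence can hold together because the loss is taken against the given (possibly wrong) label, while the confidence reflects how strongly the later model commits to its own prediction.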

Paper Structure

This paper contains 24 sections, 14 equations, 8 figures, 10 tables, 1 algorithm.

Figures (8)

  • Figure 1: (a) Test accuracy curves when the originally clean training subset is augmented with 4000 Mislabeled Easy Examples versus 4000 Mislabeled Hard Examples (see Section 2.1 for setup). Adding Mislabeled Easy Examples leads to a larger decrease in the model’s generalization performance. (b) Histogram illustrating the distribution of ImageNet-1k examples with 40% symmetric label noise, showing the epoch at which each example is first correctly predicted by the model during training. The horizontal axis represents the epoch when examples are first correctly predicted, and the vertical axis represents the number of examples predicted correctly at each epoch.
  • Figure 2: Impact of mislabeled samples learned at different stages on model generalization performance. Subfigure (a) shows the CIFAR-10 setting, with 20,000 mislabeled samples (40% instance-dependent label noise) and 30,000 clean samples. We divided the 20,000 mislabeled samples into five groups of 4,000 by the order in which an initial model learned them, from earliest $(0:4000]$ to latest $(16000:20000]$. Each group was combined with the 30,000 clean samples, creating datasets with approximately 12% label noise ($4,000/34,000$). New models were then trained on these datasets. Models trained on datasets containing earlier-learned mislabeled samples (e.g., "Clean $+ (0:4000]$ Mislabeled") reached lower test accuracy, i.e., worse generalization. Subfigure (b) shows similar findings on CIFAR-100.
  • Figure 3: Comparison of how pretrained models learn mislabeled examples from different learning stages. Subfigure (a) shows results on CIFAR-10 with 40% noise. We divided the mislabeled examples into five groups based on the order in which the initial model learned them, mixing each group with 30,000 clean examples to form datasets with approximately $12\%$ label noise ($4000/34000$). A model was pretrained on the 30,000 clean examples and then trained on these new noisy datasets. Reference lines indicate the number of epochs required for the pretrained model to learn different sets of 2,000 mislabeled examples. The results reveal that mislabeled examples learned earlier by the initial model are also learned more quickly by the robust pretrained model. Subfigure (b) shows similar findings on CIFAR-100.
  • Figure 4: (a) Visualization of Mislabeled Easy Examples (MEEs) in the feature space. Top row: t-SNE embeddings of CIFAR-10 training samples (20% instance-dependent label noise), colored by their given labels (left) and their true labels (middle). Bottom left: a closer look at MEEs (red points) connected to their mislabeled class centers (black stars), demonstrating how these examples cluster in ambiguous regions that overlap with the mislabeled class. Bottom middle and right: comparisons of the distance ratio $r=d_{\text{mislabeled}}/d_{\text{true}}$ for MEEs and other mislabeled samples, confirming that MEEs are indeed closer to their mislabeled classes than to their true classes in the learned feature space. (b) Representative MEEs. Each image is shown with its true label (blue) and the mislabeled label (red).
  • Figure 5: Sensitivity analysis of hyperparameters on CIFAR-10 and CIFAR-100 with 40% symmetric label noise. In each subfigure, the left plot shows test accuracy versus thresholds for the large loss, high confidence, and low gradient norm criteria, scaled by factors of $\frac{1}{4}$, $\frac{1}{2}$, $1$, $2$, and $4$. The right plot shows test accuracy versus Early Cutting rate $\gamma$ set to $n^{1/3}$, $n^{1/2}$, $n$, $n^{2}$, and $n^{3}$, where $\gamma \geq 1$.
  • ...and 3 more figures
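
The grouping by learning order used in Figures 1 to 3 can be illustrated with a short sketch. It is a hedged example under assumptions rather than the paper's procedure: the function names are hypothetical, a sample is taken to be "learned" at the first epoch where the prediction equals its given label, and ties in learning epoch are broken by sample index.

```python
import numpy as np


def first_correct_epoch(pred_history, given_labels):
    """First epoch (0-based) at which each sample's prediction equals its given label.

    pred_history: int array of shape (num_epochs, num_samples).
    Samples that never match get the value num_epochs.
    """
    pred_history = np.asarray(pred_history)
    given_labels = np.asarray(given_labels)
    match = pred_history == given_labels[None, :]
    first = match.argmax(axis=0)          # index of the first True per column
    first[~match.any(axis=0)] = pred_history.shape[0]
    return first


def split_by_learning_order(sample_idx, learn_epoch, num_groups):
    """Split samples into groups, from earliest-learned to latest-learned."""
    sample_idx = np.asarray(sample_idx, dtype=int)
    learn_epoch = np.asarray(learn_epoch)
    order = np.argsort(learn_epoch[sample_idx], kind="stable")
    return np.array_split(sample_idx[order], num_groups)


# Toy run: 3 epochs, 4 samples.
pred_history = np.array([[0, 1, 2, 0],
                         [1, 1, 0, 0],
                         [1, 0, 0, 3]])
given_labels = np.array([1, 1, 0, 2])

learn_epoch = first_correct_epoch(pred_history, given_labels)
groups = split_by_learning_order(np.arange(4), learn_epoch, num_groups=2)
print(learn_epoch.tolist(), [g.tolist() for g in groups])
```

With 20,000 mislabeled samples and `num_groups=5`, each group holds 4,000 samples. Adding one group to the 30,000 clean samples gives a noise rate of 4,000/34,000 ≈ 11.8%, which is the "approximately 12%" quoted in the captions.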