
Enhanced Sample Selection with Confidence Tracking: Identifying Correctly Labeled yet Hard-to-Learn Samples in Noisy Data

Weiran Pan, Wei Wei, Feida Zhu, Yong Deng

TL;DR

This work tackles learning with noisy labels by addressing the shortcomings of small-loss sample selection. It introduces Confidence Tracking (CT), which tracks the temporal trends of prediction-confidence gaps between the annotated label and every other class and uses the Mann-Kendall Trend Test to identify potentially correctly labeled samples, including those with high losses. CT acts as a plug-in that augments existing sample-selection methods, improving both precision and recall by exploiting training-dynamics signals that distinguish hard-to-learn clean samples from mislabeled ones. Empirical results on CIFAR-10/100, WebVision, and Food-101N show that CT consistently improves performance when combined with state-of-the-art learning-with-noisy-labels (LNL) methods, with particular strength under asymmetric and real-world noise. The approach offers a practical, scalable enhancement for robust learning with noisy labels and points to future work on a deeper theoretical understanding of early learning dynamics.
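To make the plug-in role and the precision/recall trade-off concrete, the sketch below pairs a small-loss base selector with a CT mask and scores each selection against ground-truth clean flags (available only in controlled experiments). It assumes the combination is a simple union (keep everything the base selector keeps, plus every sample CT flags), which is consistent with Figure 3 but is our assumption rather than a detail given here. The function names, thresholds, and toy data are hypothetical.

```python
import numpy as np


def small_loss_mask(losses, threshold):
    """Hypothetical base selector: samples with loss below `threshold` count as clean."""
    return np.asarray(losses, dtype=float) < threshold


def combine_with_ct(base_mask, ct_mask):
    """Assumed plug-in rule: union of the base selection and CT's flags."""
    return np.logical_or(base_mask, ct_mask)


def precision_recall(selected, is_clean):
    """Precision and recall of a selection mask w.r.t. ground-truth clean flags."""
    selected = np.asarray(selected, dtype=bool)
    is_clean = np.asarray(is_clean, dtype=bool)
    true_pos = np.sum(selected & is_clean)
    precision = true_pos / max(int(selected.sum()), 1)
    recall = true_pos / max(int(is_clean.sum()), 1)
    return float(precision), float(recall)


# Toy data: sample 2 is clean but hard to learn (high loss); samples 3 and 4 are mislabeled.
losses = [0.1, 0.3, 1.5, 1.8, 2.0, 0.2]
is_clean = [True, True, True, False, False, True]
ct_mask = [False, True, True, False, False, False]  # CT flags samples 1 and 2

low = precision_recall(small_loss_mask(losses, 0.5), is_clean)   # (1.0, 0.75)
high = precision_recall(small_loss_mask(losses, 1.9), is_clean)  # (0.8, 1.0)
with_ct = precision_recall(
    combine_with_ct(small_loss_mask(losses, 0.5), ct_mask), is_clean
)  # (1.0, 1.0)
```

Raising the loss threshold from 0.5 to 1.9 buys recall at the cost of precision; adding CT's flags to the strict selection recovers the hard-to-learn clean sample without admitting the mislabeled ones, provided CT's flags are accurate.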

Abstract

We propose a novel sample selection method for image classification in the presence of noisy labels. Existing methods typically consider small-loss samples as correctly labeled. However, some correctly labeled samples are inherently difficult for the model to learn and can exhibit high loss similar to mislabeled samples in the early stages of training. Consequently, setting a threshold on per-sample loss to select correct labels results in a trade-off between precision and recall in sample selection: a lower threshold may miss many correctly labeled hard-to-learn samples (low recall), while a higher threshold may include many mislabeled samples (low precision). To address this issue, our goal is to accurately distinguish correctly labeled yet hard-to-learn samples from mislabeled ones, thus alleviating the trade-off dilemma. We achieve this by considering the trends in model prediction confidence rather than relying solely on loss values. Empirical observations show that only for correctly labeled samples does the model's prediction confidence for the annotated label typically increase faster than its confidence for any other class. Based on this insight, we propose tracking the confidence gaps between the annotated label and the other classes during training and evaluating their trends using the Mann-Kendall Test. A sample is considered potentially correctly labeled if all its confidence gaps tend to increase. Our method functions as a plug-and-play component that can be seamlessly integrated into existing sample selection techniques. Experiments on several standard benchmarks and real-world datasets demonstrate that our method enhances the performance of existing methods for learning with noisy labels.
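The selection rule described in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the helper names `mann_kendall_z` and `is_potentially_clean`, the one-sided significance level `alpha=0.05`, the omission of tie correction, and recording one softmax vector per epoch are all hypothetical choices. A sample is flagged when the Mann-Kendall test finds a significant increasing trend in every gap between the confidence for its annotated label and the confidence for each other class.

```python
import math
from statistics import NormalDist

import numpy as np


def mann_kendall_z(series):
    """Mann-Kendall Z statistic for a 1-D series (no correction for ties).

    A large positive Z indicates a monotonically increasing trend.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    # S = sum over all pairs i < j of sign(x[j] - x[i]).
    s = sum(np.sign(x[i + 1:] - x[i]).sum() for i in range(n - 1))
    if s == 0:
        return 0.0
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity correction: move S one unit toward zero.
    return (s - np.sign(s)) / math.sqrt(var_s)


def is_potentially_clean(probs, label, alpha=0.05):
    """Hypothetical CT check for one sample.

    probs: array of shape (T, C), softmax outputs recorded at T points in training.
    label: index of the annotated class.
    """
    probs = np.asarray(probs, dtype=float)
    # Confidence gaps p_label(t) - p_k(t) for every class k != label: shape (T, C-1).
    gaps = probs[:, [label]] - np.delete(probs, label, axis=1)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided test for an upward trend
    # Require every gap to show a significant increasing trend.
    return all(mann_kendall_z(gaps[:, k]) > z_crit for k in range(gaps.shape[1]))


# Toy trajectories over 10 epochs, 3 classes, annotated label 0 (rows sum to 1).
t = np.linspace(0.0, 1.0, 10)
easy_clean = np.stack([0.2 + 0.7 * t, 0.4 - 0.35 * t, 0.4 - 0.35 * t], axis=1)
# Clean but hard to learn: label confidence stays low (loss above about 1.2),
# yet it grows faster than the confidence for either other class.
hard_clean = np.stack([0.1 + 0.2 * t, 0.5 - 0.1 * t, 0.4 - 0.1 * t], axis=1)
# Mislabeled: another class (1) overtakes the annotated label.
mislabeled = np.stack([0.4 - 0.3 * t, 0.3 + 0.5 * t, 0.3 - 0.2 * t], axis=1)

flags = [is_potentially_clean(p, label=0) for p in (easy_clean, hard_clean, mislabeled)]
```

The `hard_clean` trajectory illustrates the case the method targets: its loss stays high throughout, so a small-loss selector would reject it, yet every confidence gap grows and the trend test flags it, while the `mislabeled` trajectory is rejected because its gaps shrink.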

Paper Structure

This paper contains 34 sections, 16 equations, 6 figures, 14 tables, 1 algorithm.

Figures (6)

  • Figure 1: Illustration of Confidence Tracking. We train a PreActResNet-18 model using cross-entropy loss and an SGD optimizer on CIFAR-10N-Worst (the CIFAR-10 dataset with human-annotated real-world noisy labels; noise rate 40.21%). The left graph shows the distribution of per-sample loss averaged over the first 30 epochs. We regard samples with an average loss greater than 1.2 (dotted vertical line) as hard to learn; the middle graph shows the model's prediction-confidence trajectories on hard-to-learn images of dogs, and the right graph shows them on mislabeled images of cats.
  • Figure 2: Left: Transition matrix of the CIFAR-10N-Worst noisy labels. Right: Gradient alignment (as defined in the paper's gradient-alignment equation) for different types of samples when training a PreActResNet-18 using cross-entropy loss and an SGD optimizer on CIFAR-10N-Worst.
  • Figure 3: Sample selection results on the CIFAR-100N-noisy dataset. The left graph shows samples selected by both GMM and GMM+CT; the right graph shows samples selected by GMM+CT but rejected by GMM. The samples shown were chosen at random, not cherry-picked.
  • Figure 4: The left graph shows the number of test samples on which DivideMix's predictions agree with the annotations while DivideMix+CT predicts a different class. The right graph shows example cases.
  • Figure 5: Gradient alignment when training a PreActResNet-18 on CIFAR-10N-Worst noisy labels with GMM+CT as the sample selector.
  • ...and 1 more figure
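The hard-to-learn criterion used in Figure 1 (average per-sample loss over the first 30 epochs greater than 1.2) amounts to a simple mask over a recorded loss history. The sketch below assumes per-sample cross-entropy losses are logged once per epoch into an array of shape (epochs, samples); the function name `hard_to_learn_mask` and the toy history are hypothetical.

```python
import numpy as np


def hard_to_learn_mask(loss_history, num_epochs=30, threshold=1.2):
    """Flag samples whose mean loss over the first `num_epochs` epochs exceeds `threshold`.

    loss_history: array-like of shape (E, N), per-sample cross-entropy loss per epoch.
    If fewer than `num_epochs` epochs were logged, all available epochs are used.
    """
    early = np.asarray(loss_history, dtype=float)[:num_epochs]
    return early.mean(axis=0) > threshold


# Toy history: 3 epochs (rows) x 3 samples (columns).
history = np.array([
    [0.9, 2.3, 1.6],
    [0.5, 2.1, 1.2],
    [0.2, 1.9, 0.9],
])
mask = hard_to_learn_mask(history)  # per-sample means: about 0.53, 2.1, 1.23
```

As Figure 1 suggests, such a mask mixes hard-to-learn clean samples with mislabeled ones; loss alone cannot separate them, which is why CT looks at confidence trends instead.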