Enhanced Sample Selection with Confidence Tracking: Identifying Correctly Labeled yet Hard-to-Learn Samples in Noisy Data

Weiran Pan; Wei Wei; Feida Zhu; Yong Deng

TL;DR

This work addresses learning with noisy labels (LNL) by targeting a weakness of small-loss sample selection: correctly labeled but hard-to-learn samples can have losses as high as mislabeled ones. It introduces Confidence Tracking (CT), which tracks how the prediction-confidence gaps between the annotated label and every other class evolve during training, and applies the Mann-Kendall trend test to those gaps to identify potentially correctly labeled samples, including those with high losses. CT acts as a plug-in for existing sample-selection methods and improves their precision and recall by using training-dynamics signals that separate hard-to-learn clean samples from mislabeled ones. Experiments on CIFAR-10/100, WebVision, and Food-101N show that CT consistently improves performance when combined with state-of-the-art LNL methods, with particular strength under asymmetric and real-world noise. The approach is a practical, scalable enhancement for robust learning under label noise, and it points to future work on a deeper theoretical understanding of early learning dynamics.

Abstract

We propose a novel sample selection method for image classification in the presence of noisy labels. Existing methods typically treat small-loss samples as correctly labeled. However, some correctly labeled samples are inherently difficult for the model to learn and can exhibit losses as high as those of mislabeled samples in the early stages of training. Consequently, thresholding the per-sample loss to select correct labels forces a trade-off between precision and recall: a lower threshold may miss many correctly labeled hard-to-learn samples (low recall), while a higher threshold may admit many mislabeled samples (low precision). To address this issue, we aim to accurately distinguish correctly labeled yet hard-to-learn samples from mislabeled ones, thereby alleviating the trade-off. We achieve this by considering the trends in model prediction confidence rather than relying solely on loss values. Empirical observations show that only for correctly labeled samples does the model's prediction confidence for the annotated label typically increase faster than for any other class. Based on this insight, we propose tracking the confidence gaps between the annotated label and each of the other classes during training and evaluating their trends using the Mann-Kendall trend test. A sample is considered potentially correctly labeled if all of its confidence gaps tend to increase. Our method functions as a plug-and-play component that can be integrated into existing sample selection techniques. Experiments on several standard benchmarks and real-world datasets demonstrate that our method enhances the performance of existing methods for learning with noisy labels.
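
The selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mann_kendall_z` and `is_potentially_clean`, the one-sided test at a critical value of 1.645 (roughly a 5% significance level), and the variance formula without a tie correction are all assumptions made for illustration. The input is assumed to be the per-epoch softmax outputs recorded for a single training sample.

```python
import math
import numpy as np


def mann_kendall_z(series):
    """Mann-Kendall Z statistic for a 1-D series (no tie correction, an assumption)."""
    x = np.asarray(series, dtype=float)
    n = x.size
    if n < 3:
        return 0.0  # too few points to say anything about a trend
    # diffs[i, j] = x[j] - x[i]; keep only pairs with i < j
    diffs = x[None, :] - x[:, None]
    s = np.sign(diffs[np.triu_indices(n, k=1)]).sum()
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def is_potentially_clean(probs, label, z_crit=1.645):
    """Return True if every confidence gap shows a significant increasing trend.

    probs: array of shape (T, C), softmax outputs of one sample over T epochs.
    label: index of the annotated class.
    z_crit: hypothetical one-sided critical value (assumption, not from the paper).
    """
    probs = np.asarray(probs, dtype=float)
    # gap to every other class: p[label] - p[k] for k != label, shape (T, C - 1)
    gaps = probs[:, [label]] - np.delete(probs, label, axis=1)
    return all(mann_kendall_z(gaps[:, k]) > z_crit for k in range(gaps.shape[1]))


# Toy trajectories over 10 epochs and 3 classes (synthetic, for illustration only).
rising = np.linspace(0.2, 0.8, 10)
rest = 1.0 - rising
# Annotated class 0 gains confidence faster than the others.
clean_like = np.stack([rising, 0.6 * rest, 0.4 * rest], axis=1)
# Annotated class 0 loses ground to class 1.
noisy_like = np.stack([0.6 * rest, rising, 0.4 * rest], axis=1)

clean_flag = is_potentially_clean(clean_like, label=0)
noisy_flag = is_potentially_clean(noisy_like, label=0)
```

In this toy setup the first trajectory passes the check and the second does not, even though both could have a high loss in early epochs. In practice the flagged samples would be added to whatever set an existing small-loss selector already picks, which is the plug-in role the abstract describes.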
