Early Stopping Against Label Noise Without Validation Data

Suqin Yuan, Lei Feng, Tongliang Liu

TL;DR

This work tackles the problem of early stopping under label noise without relying on validation data. It proposes Label Wave, a method that tracks prediction changes (PC) on the training set, smooths them with a moving average, and takes the first local minimum of the smoothed curve as the early stopping point, so that training halts before the model overfits mislabeled samples. The authors formalize stability and variability metrics, identify a transitional phase they call "learning confusing patterns", and validate the approach across diverse datasets, architectures, and noise types, reporting improvements over traditional hold-out validation and gains when Label Wave is combined with existing noisy-label methods. The method offers a practical, data-efficient alternative to hold-out validation and motivates further study of learning dynamics in noisy settings.

Abstract

Early stopping methods in deep learning face the challenge of balancing the volume of training and validation data, especially in the presence of label noise. Concretely, sparing more data for validation from training data would limit the performance of the learned model, yet insufficient validation data could result in a sub-optimal selection of the desired model. In this paper, we propose a novel early stopping method called Label Wave, which does not require validation data for selecting the desired model in the presence of label noise. It works by tracking the changes in the model's predictions on the training set during the training process, aiming to halt training before the model unduly fits mislabeled data. This method is empirically supported by our observation that minimum fluctuations in predictions typically occur at the training epoch before the model excessively fits mislabeled data. Through extensive experiments, we show both the effectiveness of the Label Wave method across various settings and its capability to enhance the performance of existing methods for learning with noisy labels.
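The selection rule summarized above (count prediction changes on the training set, smooth them, stop at the first local minimum) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the trailing moving-average window, and the `patience` rule used to confirm a local minimum are all assumptions, since this excerpt does not give the paper's exact definitions or hyperparameters.

```python
from typing import List, Optional, Sequence


def prediction_changes(prev_preds: Sequence[int], curr_preds: Sequence[int]) -> int:
    """Count training examples whose predicted label differs between two
    consecutive epochs (assumed definition of the PC value for one epoch)."""
    if len(prev_preds) != len(curr_preds):
        raise ValueError("prediction lists must cover the same examples")
    return sum(p != c for p, c in zip(prev_preds, curr_preds))


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; early entries average over the values seen so far."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def first_local_minimum(smoothed: Sequence[float], patience: int) -> Optional[int]:
    """Return the index of the first value that is not undercut during the
    next `patience` steps, or None if no such point has been confirmed yet.
    The patience-based confirmation is an illustrative assumption."""
    best_idx = 0
    for i in range(1, len(smoothed)):
        if smoothed[i] < smoothed[best_idx]:
            best_idx = i
        elif i - best_idx >= patience:
            return best_idx
    return None


# Hypothetical per-epoch PC values: they fall, bottom out, then rise again.
pc_per_epoch = [50, 40, 30, 22, 18, 20, 25, 31, 28, 35]
stop_epoch = first_local_minimum(moving_average(pc_per_epoch, window=3), patience=3)
```

In a training loop, one would store the model's predicted labels on the training set after each epoch, append `prediction_changes(previous, current)` to the running list, and keep the checkpoint at the index returned by `first_local_minimum`. No held-out data is involved at any step.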

Paper Structure

This paper contains 24 sections, 5 equations, 5 figures, 8 tables, 1 algorithm.

Figures (5)

  • Figure 1: We examine how the model's fitting and generalization performance evolves during training with noisy labels. Using the k-epoch learning metric (yuan2023latestopping), we measure the number of training examples that are consistently classified according to their provided labels, which captures fluctuations in the model's fitting performance. We divide the training process into three stages according to the stability of the fitting performance (panel d). This division is informed by a joint analysis of generalization performance, derived from test error (panel a), and fitting performance, derived from training error (panel b) and the 8-epoch learning metric (panel c). Based on this, we design the prediction changes metric, which measures shifts in the model's predictions on the training set, to pinpoint the early stopping point. For details of the experimental settings, see Appendix A and Section 3.
  • Figure 2: Tracking test error and training error (mislabeled examples) during the training process.
  • Figure 3: Using stability and variability metrics to track fluctuations in predictions.
  • Figure 4: We compare the test accuracy of models selected at the early stopping point by the Label Wave method with that of models selected by hold-out validation. (a) We selected a subset of the training data, with sizes ranging from 250 to 16,000. This subset was used both to compute the prediction changes in the Label Wave method and as the hold-out set in hold-out validation. We computed the Kendall $\tau$ correlation and the test accuracy of the models selected by the two methods. (b) We further evaluated the difference in test accuracy between models selected by the proposed Label Wave method and those selected by hold-out validation, with noise rates ranging from 20% to 60%.
  • Figure 5: Based on the multiple metrics we track for the model's generalization and fitting performance between Point 1 and Point 2 (as shown in panel a), we propose a new transitional stage of learning with noisy labels, termed "learning confusing patterns" (shown in panel b).
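
The Figure 4 caption mentions a Kendall $\tau$ correlation. As a rough illustration of what that statistic measures, the sketch below computes the tie-free variant (tau-a) in plain Python. The function name and the example series are hypothetical, and the caption does not state which two per-epoch series the authors correlate, so the pairing of a selection signal with test accuracy is an assumption.

```python
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall tau-a: (concordant - discordant) / total pairs.
    Tied pairs count as neither, so this matches the usual
    definition only when neither series contains ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            direction = (x[i] - x[j]) * (y[i] - y[j])
            if direction > 0:
                concordant += 1
            elif direction < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical per-epoch values: a selection signal and test accuracy.
signal = [0.61, 0.66, 0.70, 0.69, 0.64]
test_acc = [0.58, 0.65, 0.71, 0.68, 0.66]
tau = kendall_tau_a(signal, test_acc)
```

A value near 1 means the signal ranks epochs in nearly the same order as test accuracy, and a value near -1 means it ranks them in nearly the reverse order. Either extreme indicates that the signal can be used to rank checkpoints.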