Instance-dependent Early Stopping

Suqin Yuan, Runqi Lin, Lei Feng, Bo Han, Tongliang Liu

TL;DR

The paper addresses inefficiency in traditional early stopping by introducing Instance-dependent Early Stopping (IES), which stops training on a per-instance basis once an instance is mastered. Mastery is detected via the second-order loss difference $|\Delta^2 L_i(w^{(t)})|$ (with a global threshold $\delta$), enabling dynamic pruning of mastered samples from backpropagation. Across CIFAR-10/100 and ImageNet-1k, IES achieves 10%-50% reduction in backpropagation while maintaining or slightly improving test accuracy and enabling better transfer learning, with reported wall-time speedups of roughly 1.3×–1.4× and notable improvements in gradient norms and loss landscape sharpness. The approach is robust to threshold variations, outperforms several data-efficiency baselines, and extends to high-level vision tasks, though theoretical guarantees and fairness implications require further study.

Abstract

In machine learning practice, early stopping has been widely used to regularize models and can save computational costs by halting the training process when the model's performance on a validation set stops improving. However, conventional early stopping applies the same stopping criterion to all instances without considering their individual learning statuses, which leads to redundant computations on instances that are already well-learned. To further improve efficiency, we propose an Instance-dependent Early Stopping (IES) method that adapts the early stopping mechanism from the entire training set to the instance level, based on the core principle that once the model has mastered an instance, training on it should stop. IES considers an instance mastered if the second-order differences of its loss value remain within a small range around zero. This offers a more consistent measure of an instance's learning status than the loss value itself, and thus allows a unified threshold to determine when an instance can be excluded from further backpropagation. We show that excluding mastered instances from backpropagation can increase the gradient norms, thereby accelerating the decrease of the training loss and speeding up the training process. Extensive experiments on benchmarks demonstrate that the IES method can reduce backpropagation instances by 10%-50% while maintaining or even slightly improving the test accuracy and transfer learning performance of a model.
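
The mastery criterion described above can be illustrated with a minimal NumPy sketch. It assumes the second-order difference is the standard backward difference over the three most recent per-instance loss values, L(t) - 2 L(t-1) + L(t-2), and that an instance counts as mastered when the absolute value falls below the threshold. The function names, the array layout, and the threshold value are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def second_order_difference(loss_history):
    """Backward second-order difference of per-instance losses.

    loss_history: array of shape (T, n) holding the loss of each of the
    n instances over the last T >= 3 epochs (assumed layout).
    """
    l_t, l_prev, l_prev2 = loss_history[-1], loss_history[-2], loss_history[-3]
    return l_t - 2.0 * l_prev + l_prev2


def mastered_mask(loss_history, delta):
    """True where |second-order difference| < delta (assumed criterion)."""
    return np.abs(second_order_difference(loss_history)) < delta


# Toy loss history: 3 epochs (rows) x 3 instances (columns).
loss_history = np.array([
    [2.0, 0.5, 1.6],
    [1.0, 0.5, 1.0],
    [0.5, 0.5, 0.7],
])

# delta is an illustrative value, not a setting taken from the paper.
mask = mastered_mask(loss_history, delta=1e-3)

# Indices of instances that would still take part in backpropagation.
active = np.flatnonzero(~mask)
```

In this toy example only the second instance, whose loss has stopped changing, is flagged as mastered; the other two would remain in the backward pass. In a training loop, the indices in `active` would select which per-sample losses contribute to the gradient, while the forward pass would still be needed to keep the loss history up to date.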

Paper Structure

This paper contains 34 sections, 4 equations, 8 figures, 14 tables, 1 algorithm.

Figures (8)

  • Figure 1: Effectiveness of Instance-dependent Early Stopping (IES) on ImageNet-1k and CIFAR-10 datasets. Top row: Test accuracy over the course of training, showing that IES (Ours) achieves comparable accuracy to the baseline (No Removal) despite training on fewer samples. Bottom row: Number of training samples excluded from backpropagation by IES over the course of training. As the model masters more and more samples during the training process, IES allows an increasing number of these mastered samples to be excluded from backpropagation, significantly reducing computation while still maintaining the same performance as the baseline method.
  • Figure 2: The curves show the number of instances that meet the corresponding mastery criteria (N = {0, 1, 2}, $\delta = 10^{-4}$) as training progresses, under two scenarios: excluding the mastered instances from backpropagation, and allowing the mastered instances to keep participating in backpropagation. The proximity of the curves suggests that the model retains its mastery of these instances without repeated training on them.
  • Figure 3: Coefficient of variation (CV) of different orders of loss differences during training.
  • Figure 4: Comparison of model performance metrics between the IES method and the baseline method over the same number of backpropagation training instances. The metrics include test error, gradient norm, training loss, sharpness-aware minimization (SAM) value, and the maximum eigenvalue of the Hessian matrix. IES consistently outperforms the baseline in test error and reduces training loss, SAM value, and the maximum eigenvalue more effectively, indicating a faster progression in model training. We use ResNet-18 on the CIFAR-10 dataset in this experiment. Further detailed experimental settings can be found in Appendix B and Section 4.
  • Figure 5: Comparison of the proposed IES method under different IES criteria (loss, 1st, 2nd, and 3rd order differences) with other sample selection methods at different Total Excluded Samples values on both CIFAR datasets. The lower subfigure illustrates the effect of varying the $\delta$ value used in IES on training time reduction, sample removal, and model performance (3 runs, mean±std).
  • ...and 3 more figures