Retraining with Predicted Hard Labels Provably Increases Model Accuracy

Rudrajit Das; Inderjit S. Dhillon; Alessandro Epasto; Adel Javanmard; Jieming Mao; Vahab Mirrokni; Sujay Sanghavi; Peilin Zhong

TL;DR

This work analyzes retraining a classifier on its own predicted hard labels when the training labels are noisy. In a linearly separable binary setting, it proves that full retraining increases population accuracy when the label-noise rate is not too small and the dataset size lies in a regime that depends on the dimension and the noise level; the authors describe this as the first theoretical result of its kind. The paper also introduces consensus-based retraining, which retrains only on samples where the model's prediction agrees with the given noisy label, as a simple enhancement for label differential privacy (label-DP) training that incurs no additional privacy cost. The authors derive rigorous population-error bounds for both initial training and retraining, characterize when retraining outperforms the baseline, and report substantial empirical gains on CIFAR-10/100, DomainNet, and AG News Subset under various DP budgets. The results suggest that consensus-based retraining can meaningfully improve label-DP-trained models and may apply beyond DP settings. Directions for future work include extending the theory to consensus-based retraining and to non-uniform noise models.

Abstract

The performance of a model trained with noisy labels is often improved by simply retraining the model with its own predicted hard labels (i.e., 1/0 labels). Yet, a detailed theoretical characterization of this phenomenon is lacking. In this paper, we theoretically analyze retraining in a linearly separable binary classification setting with randomly corrupted labels given to us and prove that retraining can improve the population accuracy obtained by initially training with the given (noisy) labels. To the best of our knowledge, this is the first such theoretical result. Retraining finds application in improving training with local label differential privacy (DP) which involves training with noisy labels. We empirically show that retraining selectively on the samples for which the predicted label matches the given label significantly improves label DP training at no extra privacy cost; we call this consensus-based retraining. As an example, when training ResNet-18 on CIFAR-100 with $\epsilon=3$ label DP, we obtain more than 6% improvement in accuracy with consensus-based retraining.
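The consensus-based retraining step described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid classifier is a toy stand-in for the real model (the paper trains networks such as ResNet-18), K-ary randomized response is assumed as the label-DP noise mechanism, and all function names are hypothetical.

```python
import numpy as np

def randomized_response(labels, eps, num_classes, rng):
    """Assumed noise mechanism: keep each label w.p. e^eps / (e^eps + K - 1),
    otherwise replace it with a uniformly random *different* label."""
    keep_prob = np.exp(eps) / (np.exp(eps) + num_classes - 1)
    keep = rng.random(labels.shape[0]) < keep_prob
    offsets = rng.integers(1, num_classes, size=labels.shape[0])  # values in 1..K-1
    flipped = (labels + offsets) % num_classes
    return np.where(keep, labels, flipped)

def fit_centroids(x, y, num_classes):
    """Toy stand-in for model training: one mean vector per class.
    Assumes every class has at least one sample."""
    return np.stack([x[y == k].mean(axis=0) for k in range(num_classes)])

def predict(centroids, x):
    """Predicted hard labels: index of the nearest class centroid."""
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def consensus_retrain(x, noisy_y, num_classes):
    """Train on the noisy labels, then retrain only on the consensus set,
    i.e. the samples whose predicted label matches the given noisy label."""
    model = fit_centroids(x, noisy_y, num_classes)
    agree = predict(model, x) == noisy_y
    retrained = fit_centroids(x[agree], noisy_y[agree], num_classes)
    return model, retrained, agree
```

Because the second stage reuses only the already-privatized labels and the model trained on them, it touches no new private information, which is consistent with the abstract's statement that the gain comes at no extra privacy cost.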

Paper Structure (25 sections, 12 theorems, 111 equations, 1 figure, 15 tables)

Introduction
Related Work
Preliminaries
Full Retraining in the Presence of Label Noise: Theoretical Analysis
Initial Training
Retraining
Improving Label DP Training with Retraining (RT)
Conclusion
Problem Setting of Figure \ref{fig-int}
Proof of Theorem \ref{thm1-apr10}
Lower bound TEXT
Upper bound TEXT
Proof of Theorem \ref{thm:acc_vanilla}
Proof of Theorem \ref{thm:lower-bound}
Comparison with Other Lower Bounds for Learning with Noisy Labels
...and 10 more sections

Key Result

Theorem 4.1

Consider $\bm{x}\notin \mathcal{T}$ and let $y$ be its true label. We have

Figures (1)

Figure 1: Retraining Intuition. Samples to the right (respectively, left) of the separator (black vertical line in the middle) and colored blue (respectively, red) have actual label $+1$ (respectively, $-1$). For both classes, the incorrectly labeled samples are marked by crosses ($\times$), whereas the correctly labeled samples are marked by dots ($\circ$) of the appropriate color. The amount of label noise and the number of training samples are the same in \ref{fig-gamma1} and \ref{fig-gamma2}. The top plots show the training samples with the (noisy) labels given to us, and the bottom plots show the same samples with the labels predicted by the model after training on the given labels. Notice that in \ref{fig-gamma1}, the model correctly predicts the labels of several samples that were given to it with the wrong label, especially those that are far from the separator. This is not quite the case in \ref{fig-gamma2}. This difference is reflected in the performance on the test set after retraining. Specifically, in \ref{fig-gamma1}, retraining increases the test accuracy from $89\%$ to $97.67\%$, whereas retraining yields no improvement in \ref{fig-gamma2}. So the success of retraining depends on the inter-class separation; in particular, retraining is beneficial when the classes are well-separated.
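The dependence on inter-class separation described in the caption can be explored with a small simulation. The sketch below is an illustration under assumptions that are not taken from this page: data are drawn from two Gaussian classes with means $\pm\gamma e_1$, labels are flipped independently with a fixed probability, and the model is a simple averaging classifier. The function names and default parameters are hypothetical, and the code does not attempt to reproduce the figure's exact numbers.

```python
import numpy as np

def sample(n, d, gamma, rng):
    """Two classes with means +/- gamma * e_1 plus unit-variance Gaussian noise."""
    y = rng.choice([-1, 1], size=n)
    x = rng.standard_normal((n, d))
    x[:, 0] += gamma * y
    return x, y

def fit_average(x, labels):
    """Averaging classifier (an assumed toy model): theta = (1/n) sum_i labels_i * x_i."""
    return (labels[:, None] * x).mean(axis=0)

def predict(theta, x):
    """Hard +/-1 labels from the sign of the linear score."""
    return np.where(x @ theta >= 0, 1, -1)

def run(gamma, n=2000, d=520, flip_prob=0.4, n_test=5000, seed=0):
    """Return (test accuracy after initial training, test accuracy after full retraining)."""
    rng = np.random.default_rng(seed)
    x, y = sample(n, d, gamma, rng)
    flips = rng.random(n) < flip_prob
    noisy_y = np.where(flips, -y, y)              # randomly corrupted labels
    theta0 = fit_average(x, noisy_y)              # initial training on noisy labels
    theta1 = fit_average(x, predict(theta0, x))   # full retraining on predicted hard labels
    x_test, y_test = sample(n_test, d, gamma, rng)
    acc0 = (predict(theta0, x_test) == y_test).mean()
    acc1 = (predict(theta1, x_test) == y_test).mean()
    return acc0, acc1

if __name__ == "__main__":
    for gamma in (2.0, 0.3):  # well-separated vs. weakly separated classes
        acc0, acc1 = run(gamma)
        print(f"gamma={gamma}: initial={acc0:.3f}, retrained={acc1:.3f}")
```

Comparing the two values of `gamma` gives a rough feel for the caption's point: with large separation, many mislabeled training points receive the correct predicted label, so the retraining labels are cleaner than the given ones.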

Theorems & Definitions (18)

Definition 3.1: Label differential privacy (DP)
Theorem 4.1: Initial training
Theorem 4.2: Initial training's population error
Remark 4.3: Tightness of error bounds
Corollary 4.4: Initial training's sample complexity
Remark 4.5: Effect of degree of separation
Theorem 4.6: Information-theoretic lower bound on sample complexity
Remark 4.7: Minimax optimality of sample complexity
Theorem 4.8: Retraining
Theorem 4.9: Retraining's population error
...and 8 more
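
Definition 3.1 in the list above concerns label DP, but its statement is not reproduced on this page. For orientation, the standard formulation from the label-DP literature is given below, together with the K-ary randomized response mechanism that is commonly used to satisfy it. Both are stated as assumptions about the usual form; the paper's exact wording and notation may differ.

```latex
% Standard definition of label DP (assumed form, not copied from the paper).
A randomized algorithm $\mathcal{A}$ is $\epsilon$-label-DP if, for any two
datasets $D$ and $D'$ that differ only in the label of a single example, and
for any set $S$ of outputs,
\[
  \Pr[\mathcal{A}(D) \in S] \;\le\; e^{\epsilon}\, \Pr[\mathcal{A}(D') \in S].
\]

% K-ary randomized response: a common mechanism satisfying the definition.
% Given a true label $y \in \{1,\dots,K\}$, it releases $\hat{y}$ with
\[
  \Pr[\hat{y} = k \mid y] =
  \begin{cases}
    \dfrac{e^{\epsilon}}{e^{\epsilon} + K - 1}, & k = y, \\[2ex]
    \dfrac{1}{e^{\epsilon} + K - 1}, & k \neq y.
  \end{cases}
\]
```

The ratio of the two probabilities is exactly $e^{\epsilon}$, which is why the released labels are noisy and why a smaller $\epsilon$ means a higher label-noise rate.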
