Pre-train to Gain: Robust Learning Without Clean Labels

David Szczecina, Nicholas Pellegrino, Paul Fieguth

TL;DR

This paper addresses the challenge of training deep networks with noisy labels without relying on a clean data subset. It proposes a two-stage approach where a feature extractor is pre-trained with self-supervised learning on the noisy target dataset (using SimCLR or Barlow Twins) and then fine-tuned with supervised learning on the same noisy data. Across CIFAR-10/100 with synthetic and real-world noise, SSL pre-training improves both conventional accuracy and label-error detection (F1 and Balanced Accuracy), with larger benefits as noise increases; it also remains competitive with ImageNet pre-training at low noise and outperforms it at high noise. The findings suggest that domain-aligned SSL pre-training yields robust representations that mitigate memorization of corrupted labels and persist under extended supervised training, offering a simple, scalable method for robust learning in noisy-label settings.

Abstract

Training deep networks with noisy labels leads to poor generalization and degraded accuracy due to overfitting to label noise. Existing approaches for learning with noisy labels often rely on the availability of a clean subset of data. By pre-training a feature extractor backbone without labels using self-supervised learning (SSL), followed by standard supervised training on the noisy dataset, we can train a more noise-robust model without requiring a subset with clean labels. We evaluate the use of SimCLR and Barlow Twins as SSL methods on CIFAR-10 and CIFAR-100 under synthetic and real-world noise. Across all noise rates, self-supervised pre-training consistently improves classification accuracy and enhances downstream label-error detection (F1 and Balanced Accuracy). The performance gap widens as the noise rate increases, demonstrating improved robustness. Notably, our approach achieves comparable results to ImageNet pre-trained models at low noise levels, while substantially outperforming them under high noise conditions.
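
The two label-error detection metrics named above can be illustrated with a small sketch. It treats detection as a binary task: a sample is a true error if its given label differs from the clean label. The flagging rule used here (flag a sample when the model's prediction disagrees with its given label) is an assumption for illustration, since the abstract does not specify the detection rule, and `detection_metrics` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def detection_metrics(is_error: Sequence[bool],
                      flagged: Sequence[bool]) -> Tuple[float, float]:
    """Return (F1, balanced accuracy) for a binary label-error detector.

    is_error[i] is True if sample i really carries a wrong label.
    flagged[i]  is True if the detector marked sample i as mislabeled.
    """
    if len(is_error) != len(flagged):
        raise ValueError("inputs must have the same length")

    # Confusion-matrix counts for the "is a label error" class.
    tp = sum(1 for e, f in zip(is_error, flagged) if e and f)
    fp = sum(1 for e, f in zip(is_error, flagged) if not e and f)
    fn = sum(1 for e, f in zip(is_error, flagged) if e and not f)
    tn = sum(1 for e, f in zip(is_error, flagged) if not e and not f)

    # F1 is the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0

    # Balanced accuracy averages recall on errors and recall on clean samples.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return f1, (tpr + tnr) / 2


# Toy example with made-up labels (not data from the paper).
clean = [0, 1, 1, 1, 2, 2]   # ground-truth labels
given = [0, 1, 2, 1, 0, 2]   # noisy labels seen during training
preds = [0, 1, 1, 0, 2, 2]   # model predictions

is_error = [g != c for g, c in zip(given, clean)]
flagged = [p != g for p, g in zip(preds, given)]  # assumed flagging rule
f1, bal_acc = detection_metrics(is_error, flagged)
print(f1, bal_acc)
```

Balanced accuracy is useful alongside F1 because the share of mislabeled samples changes with the noise rate, and plain accuracy would be dominated by whichever class (clean or corrupted) is larger.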

Paper Structure

This paper contains 19 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Classification accuracy across varying label noise rates, for increasing durations of self-supervised pre-training on CIFAR-10 using SimCLR. Mean over 5 seeds with standard error is graphed. As the number of SSL pre-training epochs increases, downstream accuracy consistently improves across all corruption levels.
  • Figure 2: Effect of SimCLR SSL pre-training duration on label error detection (F1-score) on CIFAR-10. Mean over 5 seeds is graphed. Performance improves rapidly with additional pre-training, with gains plateauing after approximately 50 epochs.
  • Figure 3: Comparison of Baseline and SSL-pretrained models over 100 supervised epochs on CIFAR-100 with $\eta=0.6$. SSL pre-training (using SimCLR for 25 epochs) yields higher accuracy, slower overfitting, and reduced loss escalation under label noise.
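
To make the noise rate $\eta$ in the captions above concrete, here is a minimal sketch of synthetic label corruption. It assumes symmetric noise, where each label is replaced with probability $\eta$ by a different class chosen uniformly at random. The captions do not state the exact noise model, so this is an illustrative assumption, and `corrupt_labels` and its parameters are hypothetical names.

```python
import random
from typing import List, Sequence


def corrupt_labels(labels: Sequence[int], num_classes: int,
                   eta: float, seed: int = 0) -> List[int]:
    """Flip each label to a different class with probability eta.

    Assumes symmetric (uniform) noise; this is an illustrative choice,
    not necessarily the noise model used in the paper.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if num_classes < 2:
        raise ValueError("need at least two classes to flip labels")

    rng = random.Random(seed)  # local RNG so results are reproducible
    noisy = []
    for y in labels:
        if rng.random() < eta:
            # Draw from the num_classes - 1 other classes, skipping y.
            other = rng.randrange(num_classes - 1)
            noisy.append(other if other < y else other + 1)
        else:
            noisy.append(y)
    return noisy


# Example: 100 classes (as in CIFAR-100) at eta = 0.6, on placeholder labels.
labels = [i % 100 for i in range(10_000)]
noisy = corrupt_labels(labels, num_classes=100, eta=0.6, seed=0)
observed = sum(a != b for a, b in zip(labels, noisy)) / len(labels)
print(f"observed noise rate: {observed:.3f}")
```

Because a flipped label is always drawn from the other classes, the observed fraction of corrupted labels matches $\eta$ in expectation, with no flips silently landing back on the original class.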