Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning

Eric Arazo, Diego Ortego, Paul Albert, Noel E. O'Connor, Kevin McGuinness

TL;DR

The paper addresses semi-supervised image classification by exploiting pseudo-labels generated from network predictions, focusing on soft labels to avoid hard labeling pitfalls. It identifies confirmation bias as a key challenge and tackles it with mixup data augmentation and a minimum number of labeled samples per mini-batch, along with dropout and data augmentation. Across CIFAR-10/100, SVHN, and Mini-ImageNet, the approach achieves state-of-the-art results with a simpler pipeline than many consistency-regularization methods. The findings suggest that carefully regularized pseudo-labeling can outperform more complex SSL strategies and motivate further study of pseudo-labeling in diverse or large-scale settings.

Abstract

Semi-supervised learning, i.e. jointly learning from labeled and unlabeled samples, is an active research topic due to its key role in relaxing human supervision. In the context of image classification, recent advances in learning from unlabeled samples focus mainly on consistency regularization methods that encourage invariant predictions for different perturbations of unlabeled samples. We, conversely, propose to learn from unlabeled data by generating soft pseudo-labels using the network predictions. We show that naive pseudo-labeling overfits to incorrect pseudo-labels due to the so-called confirmation bias and demonstrate that mixup augmentation and setting a minimum number of labeled samples per mini-batch are effective regularization techniques for reducing it. The proposed approach achieves state-of-the-art results on CIFAR-10/100, SVHN, and Mini-ImageNet despite being much simpler than other methods. These results demonstrate that pseudo-labeling alone can outperform consistency regularization methods, contrary to what previous work assumed. Source code is available at https://git.io/fjQsC.
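
To make the abstract's two ingredients more concrete, the following is a minimal sketch of how soft pseudo-labels and mixup can fit together. It is not the authors' implementation (see the linked source code for that). The function names `softmax`, `mixup_pair`, and `soft_cross_entropy`, the default `alpha=1.0`, and the toy inputs are all assumptions chosen for illustration. The sketch assumes the standard mixup formulation, where a coefficient drawn from a Beta distribution forms a convex combination of two inputs and of their (soft) label vectors.

```python
import math
import random


def softmax(logits):
    """Turn raw network outputs into a probability vector (a soft pseudo-label)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixup_pair(x_a, y_a, x_b, y_b, alpha=1.0, rng=random):
    """Mix two samples and their soft labels with lam ~ Beta(alpha, alpha).

    alpha=1.0 is an illustrative default, not a value taken from the paper.
    """
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


def soft_cross_entropy(probs, target, eps=1e-12):
    """Cross-entropy between predicted probabilities and a soft target."""
    return -sum(t * math.log(p + eps) for p, t in zip(probs, target))


# Hypothetical toy data: one labeled sample (one-hot target) and one
# unlabeled sample whose target is the softmax of made-up network logits.
rng = random.Random(0)
y_labeled = [0.0, 1.0, 0.0]
y_pseudo = softmax([2.0, 0.5, -1.0])
x_mix, y_mix, lam = mixup_pair([1.0, 2.0], y_labeled, [3.0, 4.0], y_pseudo, rng=rng)
```

Because the mixed target is a convex combination of two probability vectors, it remains a valid probability vector, so the same soft cross-entropy can be applied to labeled and pseudo-labeled samples alike.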

Paper Structure

This paper contains 16 sections, 8 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Pseudo-labeling in the "two moons" data (4 labels/class) for 1000 samples. From left to right: no mixup, mixup, and mixup with a minimum number of labeled samples per mini-batch. We use an NN classifier with one hidden layer with 50 hidden units as in 2018_TPAMI_VAT. Best viewed in color.
  • Figure 2: Example of certainty of incorrect predictions $r_{t}$ during training when using 500 (left) and 4000 (right) labeled images in CIFAR-10. Moving from cross-entropy (C) to mixup (M) reduces $r_{t}$. Adding a minimum number of labeled samples per mini-batch (*) helps further with 500 labels, where M* (with slightly lower $r_{t}$ than M) is the only configuration that converges, as shown in Table \ref{tab:Mixup-and-mini-batch} (top). Best viewed in color.
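
Both captions refer to a minimum number of labeled samples per mini-batch. One simple way to guarantee such a minimum is sketched below: reserve a fixed number of slots for labeled indices, drawn with replacement so that a small labeled set is oversampled, and fill the rest from the unlabeled pool. This is an assumption about how the constraint could be enforced, not a description of the authors' sampler. The function name `sample_batch` and all sizes in the usage lines are hypothetical.

```python
import random


def sample_batch(labeled_idx, unlabeled_idx, batch_size, min_labeled, rng=random):
    """Return a shuffled list of dataset indices with a fixed labeled quota.

    Labeled indices are drawn with replacement, so the quota can be met
    even when there are fewer labeled samples than min_labeled.
    """
    if not 0 <= min_labeled <= batch_size:
        raise ValueError("min_labeled must lie between 0 and batch_size")
    labeled_part = rng.choices(labeled_idx, k=min_labeled)
    unlabeled_part = rng.choices(unlabeled_idx, k=batch_size - min_labeled)
    batch = labeled_part + unlabeled_part
    rng.shuffle(batch)  # avoid a fixed labeled/unlabeled ordering inside the batch
    return batch


# Hypothetical sizes: 500 labeled indices, 49,500 unlabeled indices,
# batches of 100 with 16 slots reserved for labeled samples.
labeled = list(range(500))
unlabeled = list(range(500, 50000))
batch = sample_batch(labeled, unlabeled, batch_size=100, min_labeled=16,
                     rng=random.Random(0))
```

With very few labels, purely uniform sampling over the whole dataset would leave most mini-batches dominated by pseudo-labeled samples. Reserving a quota keeps some ground-truth signal in every update, which is the effect the (*) configurations in Figure 2 are probing.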