Smooth Pseudo-Labeling

Nikolaos Karaliolios; Hervé Le Borgne; Florian Chabot

TL;DR

A Smooth Pseudo-Labeling (SPL) loss function is introduced. It multiplies the pseudo-labeling loss by a factor that smooths out the discontinuities in the derivative caused by thresholding. This significantly improves performance when labels are scarce, without adding any modules, hyperparameters, or computational overhead.

Abstract

Semi-Supervised Learning (SSL) seeks to leverage large amounts of non-annotated data together with the smallest possible amount of annotated data, in order to reach the same level of performance as if all data were annotated. A fruitful method in SSL is Pseudo-Labeling (PL). It suffers, however, from an important drawback: the associated loss function has discontinuities in its derivatives, which cause instabilities in performance when labels are very scarce. In the present work, we address this drawback by introducing a Smooth Pseudo-Labeling (SPL) loss function, which multiplies the loss by a factor that smooths out the discontinuities in the derivative caused by thresholding. In our experiments, we test this modification on FixMatch and show that it significantly improves performance in the regime of scarce labels, without adding any modules, hyperparameters, or computational overhead. In the more stable regime of abundant labels, performance remains at the same level. Robustness to variation of hyperparameters and training parameters is also significantly improved. Moreover, we introduce a new benchmark in which labeled images are selected randomly from the whole dataset, without requiring each class to be represented in proportion to its frequency in the dataset. On this benchmark, the smooth version of FixMatch appears to perform better than the original, non-smooth implementation. More importantly, however, neither implementation necessarily improves when labeled images are added. This is an important issue in the design of SSL algorithms, and it should be addressed so that Active Learning algorithms become more reliable and explainable.
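The abstract describes the smoothing factor only in words, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes a linear factor that is 0 at the threshold tau and 1 at full confidence (linear is one of the shapes named in the caption of Figure 2), and it assumes for simplicity that the loss is evaluated on the same probability that produced the pseudo-label. The function names pl_weight, spl_weight and weighted_ce are hypothetical, and tau = 0.75 is taken from the caption of Figure 1.

```python
import math


def pl_weight(confidence: float, tau: float) -> float:
    """Hard-threshold weight of standard pseudo-labeling: either 0 or 1."""
    return 1.0 if confidence > tau else 0.0


def spl_weight(confidence: float, tau: float) -> float:
    """Assumed linear smooth weight: 0 at the threshold, 1 at confidence 1."""
    if confidence <= tau:
        return 0.0
    return (confidence - tau) / (1.0 - tau)


def weighted_ce(weight: float, prob_of_pseudo_label: float) -> float:
    """Cross-entropy toward the pseudo-label, scaled by a confidence weight."""
    return weight * -math.log(prob_of_pseudo_label)


if __name__ == "__main__":
    tau = 0.75
    # Compare both losses just below, just above, and well above the threshold.
    for c in (0.74, 0.76, 0.90):
        hard = weighted_ce(pl_weight(c, tau), c)
        smooth = weighted_ce(spl_weight(c, tau), c)
        print(f"confidence={c:.2f}  PL={hard:.4f}  SPL={smooth:.4f}")
```

With the hard weight, the contribution of a sample switches on abruptly as its confidence crosses tau. With the assumed linear weight, it grows from zero, which is the smoothing effect the abstract refers to.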

Paper Structure (42 sections, 27 equations, 9 figures, 15 tables)

Introduction
Related work
Method
Notation
Pseudo-Labeling
A remark on the dependence of $L_{PL}$ on $\theta$
Smooth Pseudo-Labeling
The shape of the factor
Smooth FixMatch
Learning the smoothness factor
Experiments
CIFAR-$10$-$40$
Strong supervision regime
Ablation studies
Shape of continuity factor
...and 27 more sections

Figures (9)

Figure 1: Loss functions minimized by PL (left) and SPL (right) in blue, and their derivatives in red, for $\tau = 0.75$.
Figure 2: Error rate on the $1$st fold of CIFAR-$10$-$40$ when the threshold varies. The smooth versions of FixMatch with all three shape factors (linear, quadratic, square root) reliably outperform the baseline.
Figure 3: Error rate on the $4$th fold of CIFAR-$10$-$40$ when SGD momentum varies. The window allowing good performance for FixMatch is very narrow, but becomes significantly wider upon introduction of the smooth loss function.
Figure 4: Standard deviation of the class frequencies from the uniform class distribution of CIFAR-$10$ under random sampling, normalized by the uniform frequency $1/10$, over $6$ folds. The graph shows that with truly random sampling the class distribution of the labeled dataset remains quite far from the true (uniform) class distribution of the whole dataset. We also include the evolution of the standard deviation for $3$ folds on which the model had difficulty converging. In particular, the difficulty of fold $5$ shows that standard deviation is a crude measure, and that representativeness of images does play a role in convergence.
Figure 5: Evolution of the error rate on the $5$th fold of CIFAR-$10$ with random sampling. Both models can regress to higher error rates when the number of labeled images is increased, but this occurs at a greater number of labels for FixMatch than for Smooth FixMatch.
...and 4 more figures
