An Overview of Deep Semi-Supervised Learning

Yassine Ouali, Céline Hudelot, Myriam Tami

TL;DR

This paper surveys deep semi-supervised learning (SSL), which addresses the difficulty of obtaining large labeled datasets by leveraging unlabeled data. It groups the dominant deep SSL approaches into consistency regularization, proxy-label methods, generative models, graph-based SSL, and self-supervision, and details representative techniques such as MixMatch, FixMatch, VAT, Ladder Networks, and various GAN- and VAE-based methods. The work highlights the key assumptions behind SSL (smoothness, cluster, manifold) and practical evaluation guidelines, including fair baselines and dataset considerations. By connecting these theoretical premises to a wide array of algorithms, the paper provides a consolidated reference for researchers and practitioners aiming to deploy data-efficient deep learning systems in vision and beyond.

Abstract

Deep neural networks have demonstrated their ability to provide remarkable performance on a wide range of supervised learning tasks (e.g., image classification) when trained on extensive collections of labeled data (e.g., ImageNet). However, creating such large datasets requires a considerable amount of resources, time, and effort. Such resources may not be available in many practical cases, limiting the adoption and the application of many deep learning methods. In a search for more data-efficient deep learning methods to overcome the need for large annotated datasets, there is a rising research interest in semi-supervised learning and its applications to deep neural networks to reduce the amount of labeled data required, by either developing novel methods or adopting existing semi-supervised learning frameworks for a deep learning setting. In this paper, we provide a comprehensive overview of deep semi-supervised learning, starting with an introduction to the field, followed by a summary of the dominant semi-supervised approaches in deep learning.

Paper Structure

This paper contains 60 sections, 44 equations, and 18 figures.

Figures (18)

  • Figure 1: SSL toy example. The decision boundaries obtained on the two-moons dataset with a supervised approach and with different SSL approaches, using 6 labeled examples (3 per class) and the rest of the points as unlabeled data.
  • Figure 2: Ladder Networks. An illustration of one forward pass of Ladder Networks. The objective is to reconstruct the clean activations of the encoder using a denoising decoder that takes as input the corrupted activations of the noisy encoder.
  • Figure 3: Loss computation for $\Pi$-Model. The MSE between the two outputs is computed for the unsupervised loss, and if the input is a labeled example, we add the supervised loss to the weighted unsupervised loss.
  • Figure 4: Loss computation for Temporal Ensembling. The MSE between the current prediction and the aggregated target is computed for the unsupervised loss, and if the input is a labeled example, we add the supervised loss to the weighted unsupervised loss.
  • Figure 5: Mean Teacher. The teacher model, which is an exponential moving average (EMA) of the student model, generates the targets for consistency training. The student model is then trained to minimize the supervised loss over labeled examples and the consistency loss over unlabeled examples. At each training iteration, both models are evaluated with injected noise ($\eta$, $\eta^\prime$), and the weights of the teacher model are updated using the current student model to incorporate the learned information at a faster pace.
  • ...and 13 more figures
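
The loss computations described in the captions of Figures 3 to 5 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`pi_model_loss`, `ema_update`), the default `weight` and `alpha` values, and the use of softmax outputs inside the MSE are all hypothetical choices made here. The captions say only that Temporal Ensembling compares against an "aggregated target"; using the same EMA helper to build that target is an assumption of this sketch.

```python
import math

def softmax(logits):
    """Convert a list of logits into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mse(p, q):
    """Mean squared error between two equally sized vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

def cross_entropy(probs, label):
    """Supervised loss for one example with an integer class label."""
    return -math.log(probs[label])

def pi_model_loss(logits_a, logits_b, label=None, weight=1.0):
    """Pi-Model-style loss for one example (hypothetical helper).

    logits_a and logits_b come from two stochastic forward passes of the
    same input. The unsupervised term is the MSE between the two outputs;
    the supervised term is added only when a label is available.
    """
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    loss = weight * mse(p_a, p_b)
    if label is not None:
        loss += cross_entropy(p_a, label)
    return loss

def ema_update(average, current, alpha=0.99):
    """Exponential moving average: alpha * average + (1 - alpha) * current.

    Applied to weights, this is the teacher update in Mean Teacher.
    Applied to predictions, it is one possible way (assumed here) to
    build the aggregated target used by Temporal Ensembling.
    """
    return [alpha * a + (1.0 - alpha) * c for a, c in zip(average, current)]

# Unlabeled example: only the consistency term contributes.
unlabeled = pi_model_loss([2.0, 0.5], [1.5, 0.7], label=None, weight=10.0)

# Labeled example: supervised loss plus weighted consistency loss.
labeled = pi_model_loss([2.0, 0.5], [1.5, 0.7], label=0, weight=10.0)

# Mean Teacher: move the teacher weights toward the student weights.
teacher = ema_update([1.0, 0.0], [0.0, 1.0], alpha=0.9)
```

In an actual training loop the two sets of logits would come from differently perturbed forward passes (for example, different dropout masks or augmentations), and `weight` would typically follow a ramp-up schedule; both are left out here to keep the sketch self-contained.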