Meta-Semi: A Meta-learning Approach for Semi-supervised Learning

Yulin Wang, Jiayi Guo, Shiji Song, Gao Huang

TL;DR

A novel meta-learning based SSL algorithm (Meta-Semi) that requires tuning only one additional hyper-parameter, compared with a standard supervised deep learning algorithm, to achieve competitive performance under various conditions of SSL.

Abstract

Deep learning based semi-supervised learning (SSL) algorithms have led to promising results in recent years. However, they tend to introduce multiple tunable hyper-parameters, making them less practical in real SSL scenarios where the labeled data is scarce for extensive hyper-parameter search. In this paper, we propose a novel meta-learning based SSL algorithm (Meta-Semi) that requires tuning only one additional hyper-parameter, compared with a standard supervised deep learning algorithm, to achieve competitive performance under various conditions of SSL. We start by defining a meta optimization problem that minimizes the loss on labeled data through dynamically reweighting the loss on unlabeled samples, which are associated with soft pseudo labels during training. As the meta problem is computationally intensive to solve directly, we propose an efficient algorithm to dynamically obtain the approximate solutions. We show theoretically that Meta-Semi converges to the stationary point of the loss function on labeled data under mild conditions. Empirically, Meta-Semi outperforms state-of-the-art SSL algorithms significantly on the challenging semi-supervised CIFAR-100 and STL-10 tasks, and achieves competitive performance on CIFAR-10 and SVHN.
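
To make the reweighting idea in the abstract concrete, here is a minimal sketch of one generic way to weight pseudo-labeled samples by their effect on the labeled loss. It is not the authors' algorithm: it assumes a binary logistic-regression model, a one-step first-order approximation, and binary weights, and every function name (`sample_grad`, `mean_grad`, `reweight_unlabeled`) is hypothetical. An unlabeled sample gets weight 1 only if a gradient step on its pseudo-labeled loss would, to first order, also decrease the loss on the labeled batch, which holds when the two gradients have a positive inner product.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_grad(theta, x, target):
    """Gradient of binary cross-entropy w.r.t. theta for one sample.

    `target` may be a hard label (0 or 1) or a soft pseudo label in [0, 1].
    """
    err = sigmoid(dot(theta, x)) - target
    return [err * xi for xi in x]


def mean_grad(theta, xs, targets):
    """Average per-sample gradient over a batch."""
    grads = [sample_grad(theta, x, t) for x, t in zip(xs, targets)]
    return [sum(g[j] for g in grads) / len(grads) for j in range(len(theta))]


def reweight_unlabeled(theta, labeled_x, labeled_y, unlabeled_x, pseudo):
    """Return a 0/1 weight for each unlabeled sample (assumption of this sketch).

    A step theta - lr * g_i changes the labeled loss by about
    -lr * <g_labeled, g_i>, so a positive inner product means the
    pseudo-labeled sample is expected to help on the labeled data.
    """
    g_labeled = mean_grad(theta, labeled_x, labeled_y)
    weights = []
    for x, p in zip(unlabeled_x, pseudo):
        g_i = sample_grad(theta, x, p)
        weights.append(1.0 if dot(g_labeled, g_i) > 0.0 else 0.0)
    return weights


# Toy usage: two labeled points and two unlabeled points with soft pseudo labels.
theta = [0.0, 0.0]
labeled_x, labeled_y = [[1.0, 0.0], [-1.0, 0.0]], [1.0, 0.0]
unlabeled_x, pseudo = [[2.0, 0.0], [2.0, 0.0]], [0.9, 0.1]
weights = reweight_unlabeled(theta, labeled_x, labeled_y, unlabeled_x, pseudo)
```

In this toy setup the first unlabeled point has a pseudo label that agrees with the labeled data and is kept, while the second has a conflicting pseudo label and is dropped. A practical implementation would compute these quantities on mini-batches of a deep network rather than with explicit Python loops.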

Paper Structure

This paper contains 17 sections, 3 theorems, 58 equations, 3 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

Suppose that $\overline{\bm{\theta}}^t_M$ is given by $M$ steps of gradient descent starting from $\overline{\bm{\theta}}^t_0=\bm{\theta}^{t}$. Then we have

Figures (3)

  • Figure 1: The Meta-Semi Algorithm.
  • Figure 2: The empirical validation of Assumption \ref{assum:bound}. The value of $\frac{\mathbb{E}_{\tilde{\mathcal{X}}, \tilde{\mathcal{U}}} \lVert \nabla_{\!\!\bm{\theta}^t}\!\mathcal{L}_{meta} \rVert^2}{\lVert \nabla_{\!\!\bm{\theta}^t}\!\mathbb{E}_{\tilde{\mathcal{X}}} G(\tilde{\mathcal{X}}, \bm{\theta}^t) \rVert^2}$ is estimated at each training epoch using Monte-Carlo sampling with a sample size of $500$. Results on CIFAR-10 (C10) and CIFAR-100 (C100) with varying numbers of labeled samples are presented. It can be observed that the ratio generally increases before the $500^{\text{th}}$ epoch, but gradually becomes stable or even decreases in the last part of the training process when the learning rate approaches 0. Therefore, it is empirically reasonable to assume that Assumption \ref{assum:bound} holds.
  • Figure 3: Test errors on SVHN with varying amounts of labeled data. We report the average results and the standard deviations of 5 independent experiments. All results are based on CNN-13. The best results are bold-faced.

Theorems & Definitions (8)

  • Proposition 1
  • proof
  • Definition 1
  • Proposition 2
  • proof
  • proof
  • Lemma 1
  • proof