Semi-Supervised Learning with Ladder Networks

Antti Rasmus, Harri Valpola, Mikko Honkala, Mathias Berglund, Tapani Raiko

TL;DR

The paper tackles label scarcity by unifying supervised learning with layer-wise unsupervised denoising in Ladder networks. It extends the Ladder architecture to include supervision, enabling end-to-end training with a sum of supervised and denoising costs and leveraging skip connections between encoder and decoder. Empirically, the approach delivers state-of-the-art semi-supervised performance on MNIST and CIFAR-10, and shows strong results even with very few labeled examples, while remaining compatible with both MLPs and CNNs. The work offers a simple, scalable framework for semi-supervised learning that can be integrated into existing feedforward architectures and extended to larger temporal problems.

Abstract

We combine supervised learning with unsupervised learning in deep neural networks. The proposed model is trained to simultaneously minimize the sum of supervised and unsupervised cost functions by backpropagation, avoiding the need for layer-wise pre-training. Our work builds on the Ladder network proposed by Valpola (2015), which we extend by combining the model with supervision. We show that the resulting model reaches state-of-the-art performance in semi-supervised MNIST and CIFAR-10 classification, in addition to permutation-invariant MNIST classification with all labels.

Paper Structure

This paper contains 21 sections, 18 equations, 2 figures, 5 tables, and 1 algorithm.

Figures (2)

  • Figure 1: A depiction of an optimal denoising function for a bimodal distribution. The input for the function is the corrupted value (x axis) and the target is the clean value (y axis). The denoising function moves values towards higher probabilities, as shown by the green arrows.
  • Figure 2: A conceptual illustration of the Ladder network when $L=2$. The feedforward path ($\mathbf{x} \to \mathbf{z}^{(1)} \to \mathbf{z}^{(2)} \to \mathbf{y}$) shares the mappings $f^{(l)}$ with the corrupted feedforward path, or encoder ($\mathbf{x} \to \tilde{\mathbf{z}}^{(1)} \to \tilde{\mathbf{z}}^{(2)} \to \tilde{\mathbf{y}}$). The decoder ($\tilde{\mathbf{z}}^{(l)} \to \hat{\mathbf{z}}^{(l)} \to \hat{\mathbf{x}}$) consists of the denoising functions $g^{(l)}$ and has cost functions $C^{(l)}_d$ on each layer trying to minimize the difference between $\hat{\mathbf{z}}^{(l)}$ and $\mathbf{z}^{(l)}$. The output $\tilde{\mathbf{y}}$ of the encoder can also be trained to match available labels $t(n)$.
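
The structure described in the Figure 2 caption can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: the names `encoder`, `decoder`, and `ladder_cost`, the ReLU activation, the layer sizes, the noise level, and the weights `lambdas` are all hypothetical, and the denoising function $g^{(l)}$ is replaced by a simple fixed linear blend of the corrupted activation and the top-down signal. Batch normalization and any learned parameters inside $g^{(l)}$ are omitted. The sketch only shows how a clean path, a corrupted path sharing the same mappings, and a top-down decoder combine into a sum of a supervised cost and per-layer denoising costs.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, weights, noise_std=0.0):
    """Apply the shared mappings f^(l); noise_std > 0 gives the corrupted path."""
    h = x + noise_std * rng.standard_normal(x.shape)
    zs = [h]                                  # index 0 holds the input layer
    for W in weights:
        z = h @ W + noise_std * rng.standard_normal((x.shape[0], W.shape[1]))
        zs.append(z)
        h = np.maximum(z, 0.0)                # ReLU (illustrative choice)
    return zs

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def decoder(zs_tilde, dec_weights, blend=0.5):
    """Top-down pass: z_hat^(l) = g(z_tilde^(l), u^(l)) with a toy linear g."""
    L = len(zs_tilde) - 1
    z_hats = [None] * (L + 1)
    u = zs_tilde[L]                           # assumed top-level signal
    for l in range(L, -1, -1):
        z_hats[l] = blend * zs_tilde[l] + (1.0 - blend) * u
        if l > 0:
            u = z_hats[l] @ dec_weights[l - 1]  # project down to layer l-1
    return z_hats

def ladder_cost(x, t, labeled, weights, dec_weights, lambdas,
                noise_std=0.3, blend=0.5):
    """Return (total, supervised, denoising) costs for one batch."""
    zs_clean = encoder(x, weights)              # clean path: denoising targets
    zs_tilde = encoder(x, weights, noise_std)   # corrupted path
    y_tilde = softmax(zs_tilde[-1])
    idx = np.flatnonzero(labeled)               # rows that have labels
    c_sup = -np.mean(np.log(y_tilde[idx, t[idx]] + 1e-12))
    z_hats = decoder(zs_tilde, dec_weights, blend)
    c_den = sum(lam * np.mean(np.sum((zh - zc) ** 2, axis=1))
                for lam, zh, zc in zip(lambdas, z_hats, zs_clean))
    return c_sup + c_den, c_sup, c_den

# Hypothetical L = 2 network: 4 inputs, 3 hidden units, 2 classes.
sizes = [4, 3, 2]
weights = [0.5 * rng.standard_normal((a, b))
           for a, b in zip(sizes[:-1], sizes[1:])]
dec_weights = [0.5 * rng.standard_normal((b, a))
               for a, b in zip(sizes[:-1], sizes[1:])]
x = rng.standard_normal((6, 4))
t = np.array([0, 1, 0, 1, 0, 1])
labeled = np.array([True, True, False, False, False, False])  # 2 of 6 labeled
total, c_sup, c_den = ladder_cost(x, t, labeled, weights, dec_weights,
                                  lambdas=[1.0, 0.1, 0.1])
```

Only the labeled rows contribute to the supervised term, while every row contributes to the denoising terms, which is how unlabeled data enters training in this sketch. In a real model the weights and the denoising functions would be learned by backpropagating the total cost.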