Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks

Alexey Dosovitskiy, Philipp Fischer, Jost Tobias Springenberg, Martin Riedmiller, Thomas Brox

TL;DR

The paper introduces Exemplar-CNN, a discriminative unsupervised feature learning method that creates surrogate classes by applying random transformations to seed image patches. By training a CNN to discriminate among these surrogate classes, the method learns generic, transformation-invariant features that transfer well to object classification and descriptor matching tasks, achieving state-of-the-art results among unsupervised methods on several datasets. The work provides formal analysis of the learning objective, extensive ablations on surrogate class count, sample size, transformations, and network architecture, and demonstrates that augmenting transformations (including blur) can enhance descriptor matching. Overall, the approach emphasizes the power of discriminative objectives and diverse data augmentation for unsupervised representation learning with practical impact for both recognition and matching tasks.

Abstract

Deep convolutional networks have proven to be very successful in learning task-specific features that allow for unprecedented performance on various computer vision tasks. Training of such networks mostly follows the supervised learning paradigm, where sufficiently many input-output pairs are required for training. Acquisition of large training sets is one of the key challenges when approaching a new task. In this paper, we aim for generic feature learning and present an approach for training a convolutional network using only unlabeled data. To this end, we train the network to discriminate between a set of surrogate classes. Each surrogate class is formed by applying a variety of transformations to a randomly sampled 'seed' image patch. In contrast to supervised network training, the resulting feature representation is not class-specific. Rather, it provides robustness to the transformations that have been applied during training. This generic feature representation allows for classification results that outperform the state of the art for unsupervised learning on several popular datasets (STL-10, CIFAR-10, Caltech-101, Caltech-256). While such generic features cannot compete with class-specific features from supervised training on a classification task, we show that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.
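
The surrogate-class construction described in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `make_surrogate_dataset`, the patch size, and the transformation ranges are hypothetical, images are assumed to be float arrays in [0, 1], and only translation, contrast, and brightness changes are applied (other transformations such as rotation and scaling are left out for brevity).

```python
import numpy as np

def make_surrogate_dataset(images, num_classes, samples_per_class,
                           patch=32, max_shift=4, seed=0):
    """Return (patches, labels) with one surrogate class per random seed patch.

    `images` is assumed to have shape (N, H, W, C) with values in [0, 1].
    All parameter names and ranges here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, h, w, _ = images.shape
    patches, labels = [], []
    for label in range(num_classes):
        img = images[rng.integers(n)]
        # Pick the seed patch location, leaving room for shifted crops.
        top = rng.integers(max_shift, h - patch - max_shift + 1)
        left = rng.integers(max_shift, w - patch - max_shift + 1)
        for _ in range(samples_per_class):
            # Random translation of the crop window around the seed location.
            dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
            crop = img[top + dy: top + dy + patch,
                       left + dx: left + dx + patch].astype(np.float32)
            # Random contrast change around the patch mean.
            contrast = rng.uniform(0.7, 1.3)
            crop = (crop - crop.mean()) * contrast + crop.mean()
            # Random brightness offset, then clip back to the valid range.
            crop = np.clip(crop + rng.uniform(-0.1, 0.1), 0.0, 1.0)
            patches.append(crop)
            labels.append(label)  # every variant of one seed shares a label
    return np.stack(patches), np.array(labels)

# Usage with random stand-in images (real unlabeled data would go here).
images = np.random.default_rng(1).random((5, 96, 96, 3))
X, y = make_surrogate_dataset(images, num_classes=10, samples_per_class=8)
print(X.shape, y.shape)
```

The resulting `(X, y)` pairs can be fed to any standard classifier training loop; the labels carry no semantic meaning and only identify which seed patch a sample came from.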

Paper Structure

This paper contains 29 sections, 2 theorems, 9 equations, 11 figures, and 5 tables.

Key Result

Proposition 1

The function $Z$ is convex. Moreover, for any $\mathbf{x} \in \mathbb{R}^n$, the kernel of its Hessian matrix $\nabla^2 Z(\mathbf{x})$ is given by $\operatorname{span}(\mathbf{1})$.
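
The proposition can be checked numerically at a single point. The sketch below assumes that $Z$ is the log-sum-exp function, $Z(\mathbf{x}) = \log \sum_i \exp(x_i)$ (the softmax log-normalizer); this excerpt does not define $Z$, so that choice is an assumption. Under it, the Hessian has the closed form $\operatorname{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^\top$ with $\mathbf{p} = \operatorname{softmax}(\mathbf{x})$. The helper name `logsumexp_hessian` is hypothetical.

```python
import numpy as np

def logsumexp_hessian(x):
    """Hessian of Z(x) = log(sum(exp(x))): diag(p) - p p^T, with p = softmax(x)."""
    p = np.exp(x - x.max())  # subtract the max for numerical stability
    p /= p.sum()
    return np.diag(p) - np.outer(p, p)

x = np.array([0.5, -1.0, 2.0, 0.0])
H = logsumexp_hessian(x)

# Convexity at x: all eigenvalues are non-negative (up to round-off).
print("min eigenvalue:", np.linalg.eigvalsh(H).min())

# The all-ones vector lies in the kernel: H @ 1 = p - p * sum(p) = 0.
print("H @ 1 close to 0:", np.allclose(H @ np.ones_like(x), 0.0))

# The kernel is one-dimensional, so the rank is n - 1.
print("rank:", np.linalg.matrix_rank(H), "of", len(x))
```

A check at one point is not a proof, but it makes the statement concrete: adding the same constant to every entry of $\mathbf{x}$ shifts $Z$ linearly and leaves its curvature unchanged, which is why $\mathbf{1}$ spans the kernel.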

Figures (11)

  • Figure 1: Exemplary patches sampled from the STL unlabeled dataset which are later augmented by various transformations to obtain surrogate data for the CNN training.
  • Figure 2: Several random transformations applied to one of the patches extracted from the STL unlabeled dataset. The original ('seed') patch is in the top left corner.
  • Figure 3: Influence of the number of surrogate training classes. The validation error on the surrogate data is shown in red. Note the different y-axes for the two curves.
  • Figure 4: Classification performance on STL for different numbers of samples per class. Random filters can be seen as '0 samples per class'.
  • Figure 5: Influence of removing groups of transformations during generation of the surrogate training data. Baseline ('$0$' value) is applying all transformations. Each group of three bars corresponds to removing some of the transformations.
  • ...and 6 more figures

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2