Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks
Alexey Dosovitskiy, Philipp Fischer, Jost Tobias Springenberg, Martin Riedmiller, Thomas Brox
TL;DR
The paper introduces Exemplar-CNN, a discriminative unsupervised feature learning method that creates surrogate classes by applying random transformations to seed image patches. Training a CNN to discriminate among these surrogate classes yields generic, transformation-invariant features that transfer well to object classification and descriptor matching, with state-of-the-art results among unsupervised methods on several datasets. The work provides a formal analysis of the learning objective and extensive ablations over the number of surrogate classes, the number of samples per class, the set of transformations, and the network architecture. It also shows that adding further transformations (such as blur) can improve descriptor matching. Overall, the results highlight the value of discriminative objectives and diverse data augmentation for unsupervised representation learning, in both recognition and matching tasks.
Abstract
Deep convolutional networks have proven to be very successful in learning task-specific features that allow for unprecedented performance on various computer vision tasks. Training of such networks mostly follows the supervised learning paradigm, where sufficiently many input-output pairs are required for training. Acquisition of large training sets is one of the key challenges when approaching a new task. In this paper, we aim for generic feature learning and present an approach for training a convolutional network using only unlabeled data. To this end, we train the network to discriminate between a set of surrogate classes. Each surrogate class is formed by applying a variety of transformations to a randomly sampled 'seed' image patch. In contrast to supervised network training, the resulting feature representation is not class-specific. Rather, it provides robustness to the transformations that have been applied during training. This generic feature representation allows for classification results that outperform the state of the art for unsupervised learning on several popular datasets (STL-10, CIFAR-10, Caltech-101, Caltech-256). While such generic features cannot compete with class-specific features from supervised training on a classification task, we show that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.
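To make the surrogate-class construction concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It samples one seed patch per surrogate class from unlabeled images and produces several randomly transformed copies, each labeled with the index of its seed. The function name `make_surrogate_dataset`, its parameters, the patch size, and the two transformations used (a small random translation and a random contrast change) are assumptions chosen for illustration; the abstract only says that "a variety of transformations" is applied.

```python
import numpy as np


def make_surrogate_dataset(images, num_classes, samples_per_class,
                           patch=32, max_shift=4, rng=None):
    """Build (patches, labels) where each label indexes one seed patch.

    images: float array of shape (n, h, w, c) with values in [0, 1].
    Requires h and w to be at least patch + 2 * max_shift.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w, _ = images.shape
    xs, ys = [], []
    for label in range(num_classes):
        # Pick a random seed location in a random unlabeled image,
        # leaving a margin so that shifted crops stay inside the image.
        img = images[rng.integers(n)]
        top = rng.integers(max_shift, h - patch - max_shift + 1)
        left = rng.integers(max_shift, w - patch - max_shift + 1)
        for _ in range(samples_per_class):
            # Illustrative transformation 1: small random translation.
            dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
            crop = img[top + dy:top + dy + patch,
                       left + dx:left + dx + patch]
            # Illustrative transformation 2: random contrast change
            # around the patch mean, clipped back to [0, 1].
            gain = rng.uniform(0.7, 1.3)
            mean = crop.mean()
            crop = np.clip((crop - mean) * gain + mean, 0.0, 1.0)
            xs.append(crop)
            ys.append(label)  # the surrogate class is the seed index
    return np.stack(xs), np.array(ys)


# Usage with random arrays standing in for unlabeled images.
images = np.random.default_rng(0).random((5, 48, 48, 3))
X, y = make_surrogate_dataset(images, num_classes=10, samples_per_class=8,
                              rng=np.random.default_rng(1))
print(X.shape, y.shape)
```

The resulting pairs `(X, y)` could then be used to train any classifier with one output per surrogate class. No semantic labels are involved, because each label only records which seed patch a sample was derived from.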
