Deep Learning using Linear Support Vector Machines

Yichuan Tang

TL;DR

The paper questions the default use of softmax with cross-entropy loss in deep classifiers and proposes replacing it with a linear L2-SVM top layer that is trained end-to-end via backpropagation. On MNIST, CIFAR-10, and a facial expression recognition task, the resulting model (DLSVM, a deep network with the SVM top layer) yields consistent accuracy gains over softmax: 0.87% vs 0.99% error on MNIST and 11.9% vs 14.0% error on CIFAR-10, along with competitive facial expression recognition scores. The authors also examine whether the gains come from regularization or from optimization, and attribute them mainly to the margin-based SVM objective acting as a regularizer rather than to optimization tweaks alone. The takeaway is that switching to an SVM top layer is a simple, effective alternative for discriminative deep models.
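For reference, the objective behind such a top layer can be sketched as below. The notation is an assumption introduced here, not taken from the summary: $h_n$ is the penultimate-layer activation for example $n$, $t_n \in \{-1, +1\}$ is its target for one binary (one-vs-rest) SVM, $w$ is that SVM's weight vector, and $C$ is the penalty constant. The second expression is the gradient with respect to $h_n$, which is what backpropagation passes to the lower layers.

```latex
\ell(w) = \frac{1}{2} w^\top w + C \sum_{n=1}^{N} \max\left(1 - w^\top h_n t_n,\; 0\right)^2
\qquad
\frac{\partial \ell(w)}{\partial h_n} = -2 C\, t_n\, w\, \max\left(1 - w^\top h_n t_n,\; 0\right)
```

Because the hinge term is squared, the loss is differentiable everywhere, including at the margin boundary, so it fits directly into gradient-based training.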

Abstract

Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and bioinformatics. For classification tasks, most of these "deep learning" models employ the softmax activation function for prediction and minimize cross-entropy loss. In this paper, we demonstrate a small but consistent advantage of replacing the softmax layer with a linear support vector machine. Learning minimizes a margin-based loss instead of the cross-entropy loss. While there have been various combinations of neural nets and SVMs in prior art, our results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on popular deep learning datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.
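To make the margin-based loss concrete, here is a minimal NumPy sketch of a one-vs-rest squared hinge (L2-SVM) loss computed on top-layer scores, together with its gradient with respect to those scores. It is an illustration under stated assumptions, not the author's implementation: the function name `l2svm_loss_and_grad`, the `(N, K)` score layout, and the default `C=1.0` are hypothetical choices, and the weight penalty term is left out for brevity.

```python
import numpy as np


def l2svm_loss_and_grad(scores, labels, C=1.0):
    """One-vs-rest squared hinge loss on top-layer scores.

    scores: float array of shape (N, K), one linear score per class.
    labels: int array of shape (N,), the true class index per example.
    Returns (loss, grad) where grad has the same shape as scores.
    The 0.5 * ||w||^2 weight penalty is intentionally omitted here.
    """
    n, k = scores.shape
    # Targets are +1 for the true class and -1 for every other class.
    targets = -np.ones((n, k))
    targets[np.arange(n), labels] = 1.0
    # Margin violation per (example, class); zero when the margin is met.
    margins = np.maximum(0.0, 1.0 - targets * scores)
    loss = C * np.sum(margins ** 2)
    # Derivative of C * max(0, 1 - t*s)^2 with respect to s.
    grad = -2.0 * C * targets * margins
    return loss, grad


if __name__ == "__main__":
    scores = np.array([[2.0, -2.0], [0.0, 0.0]])
    labels = np.array([0, 1])
    loss, grad = l2svm_loss_and_grad(scores, labels)
    print(loss)
    print(grad)
    # Prediction uses the largest score, as with softmax.
    print(np.argmax(scores, axis=1))
```

In a full network, `grad` would be propagated back through the final linear layer in place of the softmax cross-entropy gradient; the rest of training stays unchanged.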
