Large-Margin Softmax Loss for Convolutional Neural Networks

Weiyang Liu, Yandong Wen, Zhiding Yu, Meng Yang

TL;DR

The paper presents Large-Margin Softmax (L-Softmax), a margin-based extension of the softmax loss that enforces larger angular separation between class decision boundaries. By replacing the ground-truth cosine term with a margin-controlled function $\psi(\theta)$ parameterized by an integer m, L-Softmax yields increased intra-class compactness and inter-class separability. The authors provide a geometric interpretation, an SGD-friendly optimization framework, and extensive experiments on MNIST, CIFAR-10/100, and LFW demonstrating improved discriminative features and verification performance. As a drop-in replacement for softmax, L-Softmax offers adjustable angular margins to enhance CNN-based recognition and verification tasks without substantial overhead.
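The margin mechanism described above can be sketched in a few lines of NumPy. This summary does not spell out the margin function, so the sketch assumes the piecewise definition given in the published paper, (-1)^k cos(m·theta) - 2k for theta in [k·pi/m, (k+1)·pi/m] with k = 0, ..., m-1. The function names `psi` and `l_softmax_loss` are hypothetical, the bias term is omitted (as the Figure 2 caption also does), and only a single sample is handled. It is an illustration, not the authors' implementation.

```python
import numpy as np

def psi(theta, m):
    """Assumed margin function: (-1)^k * cos(m*theta) - 2k on [k*pi/m, (k+1)*pi/m]."""
    theta = np.asarray(theta, dtype=float)
    # k is the index of the interval containing theta; clipping maps theta == pi to k = m - 1.
    k = np.clip(np.floor(theta * m / np.pi), 0, m - 1)
    sign = np.where(k % 2 == 0, 1.0, -1.0)
    return sign * np.cos(m * theta) - 2.0 * k

def l_softmax_loss(x, W, y, m):
    """Loss for one feature vector x (d,), class weights W (d, C), integer label y."""
    logits = W.T @ x                      # logit j equals ||W_j|| * ||x|| * cos(theta_j)
    w_norm = np.linalg.norm(W[:, y])
    x_norm = np.linalg.norm(x)
    cos_t = np.clip(logits[y] / (w_norm * x_norm), -1.0, 1.0)
    theta = np.arccos(cos_t)              # angle between x and the ground-truth class weight
    logits = logits.copy()
    logits[y] = w_norm * x_norm * psi(theta, m)   # only the ground-truth logit is changed
    z = logits - logits.max()             # shift for a numerically stable log-softmax
    return -(z[y] - np.log(np.exp(z).sum()))

# Usage with random data: m = 1 recovers the ordinary softmax cross-entropy.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
W = rng.normal(size=(8, 5))
loss_softmax = l_softmax_loss(x, W, y=2, m=1)
loss_margin = l_softmax_loss(x, W, y=2, m=4)
```

Because the assumed margin function never exceeds cos(theta) on [0, pi], a larger m can only lower the ground-truth logit, so the loss for m > 1 is at least the plain softmax loss for the same sample. That is what pushes features toward their class direction during training.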

Abstract

Cross-entropy loss together with softmax is arguably one of the most commonly used supervision components in convolutional neural networks (CNNs). Despite its simplicity, popularity and excellent performance, the component does not explicitly encourage discriminative learning of features. In this paper, we propose a generalized large-margin softmax (L-Softmax) loss which explicitly encourages intra-class compactness and inter-class separability between learned features. Moreover, L-Softmax can not only adjust the desired margin but also avoid overfitting. We also show that the L-Softmax loss can be optimized by typical stochastic gradient descent. Extensive experiments on four benchmark datasets demonstrate that the deeply-learned features with L-Softmax loss become more discriminative, hence significantly boosting the performance on a variety of visual classification and verification tasks.

Paper Structure

This paper contains 13 sections, 13 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Standard CNNs can be viewed as convolutional feature learning machines that are supervised by the softmax loss.
  • Figure 2: Visualization of CNN-learned features (softmax loss (m=1) vs. L-Softmax loss (m=2,3,4)) on the MNIST dataset. Specifically, we set the feature dimension (the input of the L-Softmax loss) to 2 and then plot the features by class. We omit the constant term in the fully connected layer, since it only complicates our analysis and has almost no effect on performance. Note that the testing accuracy is not as good as in the MNIST results table because we use only 2D features to classify the digits here.
  • Figure 3: $\psi(\theta)$ for softmax loss and L-Softmax loss.
  • Figure 4: Examples of Geometric Interpretation.
  • Figure 5: Confusion matrix on CIFAR10, CIFAR10+ and CIFAR100.
  • ...and 2 more figures