Large-Margin Softmax Loss for Convolutional Neural Networks

Weiyang Liu; Yandong Wen; Zhiding Yu; Meng Yang

TL;DR

The paper presents Large-Margin Softmax (L-Softmax), a margin-based extension of the softmax loss that enforces larger angular separation between class decision boundaries. By replacing the ground-truth cosine term with a margin-controlled function psi(theta) parameterized by an integer m, L-Softmax yields increased intra-class compactness and inter-class separability. The authors provide a geometric interpretation, an SGD-friendly optimization framework, and extensive experiments on MNIST, CIFAR-10/100, and LFW demonstrating improved discriminative features and verification performance. As a drop-in replacement for softmax, L-Softmax offers adjustable angular margins to enhance CNN-based recognition and verification tasks without substantial overhead.
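To make the margin mechanism concrete, the following is a minimal sketch of psi(theta) and the resulting per-sample loss in plain Python. It assumes the piecewise definition psi(theta) = (-1)^k cos(m theta) - 2k for theta in [k pi/m, (k+1) pi/m], with bias terms omitted. The function names `psi` and `l_softmax_loss` are illustrative, and computing the angle with `acos` is a simplification chosen for readability, not necessarily how the authors implement it.

```python
import math

def psi(theta, m):
    """Margin function: (-1)^k * cos(m*theta) - 2k on [k*pi/m, (k+1)*pi/m]."""
    if not 0.0 <= theta <= math.pi:
        raise ValueError("theta must lie in [0, pi]")
    # Index of the interval containing theta, capped so theta = pi maps to k = m - 1.
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k

def l_softmax_loss(x, weights, label, m):
    """Loss for one nonzero feature vector x and nonzero class weight vectors (no bias)."""
    norm_x = math.sqrt(sum(v * v for v in x))
    logits = []
    for j, w in enumerate(weights):
        norm_w = math.sqrt(sum(v * v for v in w))
        cos_t = sum(a * b for a, b in zip(w, x)) / (norm_w * norm_x)
        cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
        if j == label:
            # Ground-truth class: cos(theta) is replaced by psi(theta).
            logits.append(norm_w * norm_x * psi(math.acos(cos_t), m))
        else:
            logits.append(norm_w * norm_x * cos_t)
    # Numerically stable log-sum-exp minus the ground-truth logit.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_z - logits[label]
```

With m = 1 the sketch reduces to the ordinary softmax cross-entropy. Larger m lowers the ground-truth logit for the same angle, so a sample must sit closer to its class weight vector to reach the same loss.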

Abstract

Cross-entropy loss together with softmax is arguably one of the most commonly used supervision components in convolutional neural networks (CNNs). Despite its simplicity, popularity and excellent performance, the component does not explicitly encourage discriminative learning of features. In this paper, we propose a generalized large-margin softmax (L-Softmax) loss which explicitly encourages intra-class compactness and inter-class separability between learned features. Moreover, L-Softmax can not only adjust the desired margin but also avoid overfitting. We also show that the L-Softmax loss can be optimized by typical stochastic gradient descent. Extensive experiments on four benchmark datasets demonstrate that the deeply-learned features with L-Softmax loss become more discriminative, hence significantly boosting the performance on a variety of visual classification and verification tasks.
