L2-constrained Softmax Loss for Discriminative Face Verification
Rajeev Ranjan, Carlos D. Castillo, Rama Chellappa
TL;DR
This paper addresses unconstrained face verification. It observes that the softmax loss does not explicitly maximize the margin between positive and negative pairs, and that feature norms correlate with image quality. It introduces L2-softmax, which constrains feature descriptors to lie on a hypersphere of fixed radius $\alpha$ using an $L_{2}$-normalize layer followed by a scale layer, while preserving end-to-end training. The approach yields consistent improvements across LFW, YouTube Faces (YTF), and IJB-A: it sets the state of the art on IJB-A with a TAR of $0.909$ at FAR $0.0001$, and achieves $99.78\%$ on LFW and $96.08\%$ on YTF with RX101. These gains are complementary to metric-learning methods and auxiliary losses, and the method is easy to integrate with different DCNN architectures, which makes it practical to adopt for discriminative face verification.
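As a rough sketch, the constrained objective described above can be written as follows. The notation is assumed here for illustration and does not appear in this summary: $M$ is the mini-batch size, $C$ the number of training subjects, $f(x_i)$ the feature descriptor of image $x_i$ with label $y_i$, and $W_j$, $b_j$ the classifier weights and bias for class $j$.

$$
\min \; -\frac{1}{M} \sum_{i=1}^{M} \log \frac{e^{W_{y_i}^{T} f(x_i) + b_{y_i}}}{\sum_{j=1}^{C} e^{W_{j}^{T} f(x_i) + b_{j}}}
\quad \text{subject to} \quad \lVert f(x_i) \rVert_{2} = \alpha, \;\; i = 1, \dots, M
$$

One way to satisfy the constraint exactly is to feed the classifier $\alpha \, f(x_i) / \lVert f(x_i) \rVert_{2}$, which is what the $L_{2}$-normalize layer followed by the scale layer computes.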
Abstract
In recent years, the performance of face verification systems has significantly improved using deep convolutional neural networks (DCNNs). A typical pipeline for face verification includes training a deep network for subject classification with softmax loss, using the penultimate layer output as the feature descriptor, and generating a cosine similarity score given a pair of face images. The softmax loss function does not optimize the features to have higher similarity score for positive pairs and lower similarity score for negative pairs, which leads to a performance gap. In this paper, we add an L2-constraint to the feature descriptors which restricts them to lie on a hypersphere of a fixed radius. This module can be easily implemented using existing deep learning frameworks. We show that integrating this simple step in the training pipeline significantly boosts the performance of face verification. Specifically, we achieve state-of-the-art results on the challenging IJB-A dataset, achieving True Accept Rate of 0.909 at False Accept Rate 0.0001 on the face verification protocol. Additionally, we achieve state-of-the-art performance on LFW dataset with an accuracy of 99.78%, and competing performance on YTF dataset with accuracy of 96.08%.
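The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `eps` guard, and the array shapes are assumptions made for this example, and only the forward pass is shown (a deep learning framework would supply the gradients during training).

```python
import numpy as np


def l2_normalize_and_scale(features, alpha, eps=1e-12):
    """Project each row of `features` onto a hypersphere of radius `alpha`.

    `eps` is an assumed guard against division by zero for all-zero rows.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return alpha * features / np.maximum(norms, eps)


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def l2_softmax_loss(features, weights, bias, labels, alpha):
    """Softmax loss applied to norm-constrained features.

    features: (batch, dim) penultimate-layer outputs
    weights:  (dim, num_classes) classifier weights
    bias:     (num_classes,) classifier bias
    labels:   (batch,) integer subject labels
    """
    constrained = l2_normalize_and_scale(features, alpha)
    logits = constrained @ weights + bias
    return softmax_cross_entropy(logits, labels)


def cosine_similarity(a, b, eps=1e-12):
    """Verification score for a pair of feature descriptors."""
    denom = max(np.linalg.norm(a) * np.linalg.norm(b), eps)
    return float(a @ b / denom)
```

At test time only `cosine_similarity` is needed: the score depends on the direction of the descriptors, not their norm, which is consistent with training on features of fixed norm.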
