Alpha Divergence Losses for Biometric Verification
Dimitrios Koutsianos, Ladislav Mosner, Yannis Panagakis, Themos Stafylakis
TL;DR
This work reframes margin-based biometric verification through the lens of α-divergence losses, which yield sparse posteriors when $α>1$, and introduces two practical strategies for integrating angular margins: Q-Margin, which encodes the margin in the reference measure, and Alpha-Additive-Angular Margin (A3M), which applies the margin to the logits. To address the optimization instability caused by sparsity, the authors propose A3M-I, a mid-training prototype re-initialization that realigns identity prototypes with embeddings. Across face and speaker verification benchmarks, including IJB-B, IJB-C, and VoxCeleb, the proposed methods yield substantial gains at low false-acceptance rates while preserving extreme posterior sparsity, which points to potential memory-efficient training. The results demonstrate that combining probabilistic margins with geometric margins can improve verification performance in large-scale, real-world scenarios with millions of identities.
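The summary does not spell out how A3M-I re-initializes the prototypes, so the following is only a hypothetical sketch of one plausible reading of "realigning identity prototypes with embeddings": at some mid-training point, each class prototype is overwritten by the L2-normalized mean of that class's L2-normalized embeddings. The function name `reinit_prototypes`, the centroid rule, and the choice to leave unseen classes untouched are all assumptions, not the authors' procedure.

```python
import numpy as np


def reinit_prototypes(embeddings, labels, prototypes):
    """Hypothetical A3M-I-style re-initialization (an assumption, not the
    authors' code): overwrite each class prototype with the L2-normalized
    mean of that class's L2-normalized embeddings.

    embeddings: (n, d) array of embeddings from the current model
    labels:     (n,) integer identity labels
    prototypes: (num_classes, d) current prototype matrix
    Classes absent from `labels` keep their existing prototype.
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    new = prototypes.astype(float)  # astype returns a copy; the input is untouched
    for c in np.unique(labels):
        centroid = emb[labels == c].mean(axis=0)
        new[c] = centroid / np.linalg.norm(centroid)
    return new


# Toy usage: two tight identity clusters, three prototypes (class 2 unseen).
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 8))
labels = np.array([0, 0, 0, 1, 1, 1])
embeddings = np.repeat(centers, 3, axis=0) + 0.01 * rng.normal(size=(6, 8))
prototypes = rng.normal(size=(3, 8))
new_prototypes = reinit_prototypes(embeddings, labels, prototypes)
```

In a real training run the embeddings would more likely be gathered over a pass through the training set rather than a single array; the toy data here only checks that re-initialized prototypes land on their class directions.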
Abstract
Performance in face and speaker verification is largely driven by margin-based softmax losses such as CosFace and ArcFace. Recently introduced $α$-divergence loss functions offer a compelling alternative, particularly because they can induce sparse solutions (when $α>1$). However, integrating an angular margin, which is crucial for verification tasks, is not straightforward. We find that this integration can be achieved in at least two distinct ways: via the reference measure (prior probabilities) or via the logits (unnormalized log-likelihoods). In this paper, we explore both pathways, deriving two novel margin-based $α$-divergence losses: Q-Margin (margin in the reference measure) and A3M (margin in the logits). We identify a training instability in A3M, caused by sparsity, and address it with a simple yet effective prototype re-initialization strategy. Our methods achieve significant performance gains on the challenging IJB-B and IJB-C face verification benchmarks, and we demonstrate similarly strong performance in speaker verification on VoxCeleb. Crucially, our models significantly outperform strong baselines at low false acceptance rates (FAR). This capability is critical for practical high-security applications, such as banking authentication, where minimizing false authentications is paramount. Finally, the sparsity of $α$-divergence-based posteriors enables memory-efficient training, which is crucial for datasets with millions of identities.
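To make the sparsity claim and the "margin in the logits" idea concrete, here is a minimal NumPy sketch under stated assumptions. It substitutes the closely related α-entmax Fenchel-Young loss (Tsallis α-entropy, uniform reference measure, solved by bisection on the threshold) for the paper's α-divergence loss, and applies an ArcFace-style additive angular margin to the target logit before that loss, in the spirit of A3M. The function names, the scale `s=30.0`, the margin `m=0.3`, and the clamp of $\theta_y + m$ at $\pi$ are illustrative assumptions, not the authors' implementation; Q-Margin, which moves the margin into the reference measure, is not shown.

```python
import numpy as np


def entmax_bisect(z, alpha=1.5, n_iter=60):
    """alpha-entmax of a 1-D score vector z via bisection on the threshold tau.

    p_i = [(alpha - 1) * z_i - tau]_+ ** (1 / (alpha - 1)), with tau chosen so
    that sum(p) = 1. For alpha > 1, scores below the threshold get exactly 0.
    """
    z = np.asarray(z, dtype=float)
    zs = (alpha - 1.0) * z
    d = zs.shape[0]
    # sum(p) >= 1 at tau_lo and <= 1 at tau_hi; sum(p) decreases in tau.
    tau_lo = zs.max() - 1.0
    tau_hi = zs.max() - (1.0 / d) ** (alpha - 1.0)
    for _ in range(n_iter):
        tau = 0.5 * (tau_lo + tau_hi)
        p = np.clip(zs - tau, 0.0, None) ** (1.0 / (alpha - 1.0))
        if p.sum() >= 1.0:
            tau_lo = tau
        else:
            tau_hi = tau
    p = np.clip(zs - tau_lo, 0.0, None) ** (1.0 / (alpha - 1.0))
    return p / p.sum()  # remove the small residual bisection error


def entmax_loss(z, y, alpha=1.5):
    """Fenchel-Young loss with Tsallis alpha-entropy: <p, z> + H_alpha(p) - z_y.
    Non-negative, and zero when p puts all mass on class y."""
    z = np.asarray(z, dtype=float)
    p = entmax_bisect(z, alpha)
    tsallis = (1.0 - np.sum(p ** alpha)) / (alpha * (alpha - 1.0))
    return float(p @ z + tsallis - z[y])


def margin_logits(embedding, prototypes, y, s=30.0, m=0.3):
    """ArcFace-style additive angular margin on the target logit (A3M-like
    sketch): logits = s * cos(theta_j), except the target uses cos(theta_y + m).
    theta_y + m is clamped at pi so the margin never raises the target logit."""
    e = embedding / np.linalg.norm(embedding)
    w = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    cos_theta = np.clip(w @ e, -1.0, 1.0)
    logits = s * cos_theta
    theta_y = np.arccos(cos_theta[y])
    logits[y] = s * np.cos(min(theta_y + m, np.pi))
    return logits


# Sparsity: alpha = 1.5 assigns exactly zero probability to low scores.
p = entmax_bisect([2.0, 1.0, 0.0, -1.0], alpha=1.5)  # approx [0.831, 0.169, 0.0, 0.0]

# Margin: the same embedding incurs a loss at least as large once m > 0.
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(10, 16))
embedding = prototypes[3] + 0.5 * rng.normal(size=16)
loss_plain = entmax_loss(margin_logits(embedding, prototypes, 3, m=0.0), 3)
loss_margin = entmax_loss(margin_logits(embedding, prototypes, 3, m=0.3), 3)
```

Because the gradient of this loss with respect to the target logit is $p_y - 1 \le 0$, lowering that logit through the angular margin can only increase the loss, which is what forces embeddings closer to their own prototype than an unmargined loss would.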
