PD-Loss: Proxy-Decidability for Efficient Metric Learning
Pedro Silva, Guilherme A. L. Silva, Pablo Coelho, Vander Freitas, Gladston Moreira, David Menotti, Eduardo Luz
TL;DR
The paper tackles distribution-aware deep metric learning by addressing the batch-size sensitivity of the Decidability-based Loss (D-Loss) with Proxy-Decidability Loss (PD-Loss). PD-Loss uses learnable class proxies to estimate the genuine and impostor similarity distributions and optimizes a log-transformed, temperature-scaled version of the decidability index $d'$ without relying on exhaustive pairwise mining. Empirical results on CUB-200-2011, CARS196, and LFW show that PD-Loss achieves competitive or state-of-the-art performance while improving batch-size robustness and training efficiency. The approach combines the scalability of proxy-based losses with a principled distribution-separability objective, suggesting applicability to embedding optimization tasks beyond standard metric learning benchmarks.
Abstract
Deep Metric Learning (DML) aims to learn embedding functions that map semantically similar inputs to proximate points in a metric space while separating dissimilar ones. Existing methods, such as pairwise losses, are hindered by complex sampling requirements and slow convergence. In contrast, proxy-based losses, despite their improved scalability, often fail to optimize global distribution properties. The Decidability-based Loss (D-Loss) addresses this by targeting the decidability index (d') to enhance distribution separability, but its reliance on large mini-batches imposes significant computational constraints. We introduce Proxy-Decidability Loss (PD-Loss), a novel objective that integrates learnable proxies with the statistical framework of d' to optimize embedding spaces efficiently. By estimating genuine and impostor distributions through proxies, PD-Loss combines the computational efficiency of proxy-based methods with the principled separability of D-Loss, offering a scalable approach to distribution-aware DML. Experiments across various tasks, including fine-grained classification and face verification, demonstrate that PD-Loss achieves performance comparable to that of state-of-the-art methods while introducing a new perspective on embedding optimization, with potential for broader applications.
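To make the quantity behind the abstract concrete, the following is a minimal NumPy sketch of the decidability index computed over embedding-to-proxy similarities. It rests on assumptions not stated in the text above: it uses the standard definition $d' = |\mu_g - \mu_i| / \sqrt{(\sigma_g^2 + \sigma_i^2)/2}$, cosine similarity as the score, and treats the similarity of each embedding to its own class proxy as "genuine" and to every other proxy as "impostor". The function names `decidability_index` and `proxy_decidability` are hypothetical, and the log transform and temperature scaling mentioned in the TL;DR are deliberately left out because their exact form is not given here. This is an illustration of the statistic, not the authors' loss.

```python
import numpy as np


def decidability_index(genuine, impostor):
    """Standard decidability index d' between two samples of scores."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    mu_g, mu_i = genuine.mean(), impostor.mean()
    var_g, var_i = genuine.var(), impostor.var()
    # Distance between the means, normalized by the pooled standard deviation.
    return abs(mu_g - mu_i) / np.sqrt(0.5 * (var_g + var_i))


def proxy_decidability(embeddings, labels, proxies):
    """d' over embedding-to-proxy cosine similarities (illustrative assumption)."""
    # L2-normalize so the dot product is a cosine similarity.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    sims = e @ p.T  # shape: (batch_size, num_classes)

    # Mark each sample's own-class proxy as the genuine comparison.
    own = np.zeros(sims.shape, dtype=bool)
    own[np.arange(len(labels)), labels] = True

    # Every remaining (sample, proxy) pair is an impostor comparison.
    return decidability_index(sims[own], sims[~own])
```

Because each sample is compared against one proxy per class rather than against every other sample in the batch, the number of scores grows with batch size times number of classes, which is the efficiency argument the abstract makes. A trainable version would need an automatic-differentiation framework, since the proxies are learned jointly with the embedding network.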
