Threshold-Consistent Margin Loss for Open-World Deep Metric Learning

Qin Zhang; Linghan Xu; Qingming Tang; Jun Fang; Ying Nian Wu; Joe Tighe; Yifan Xing

TL;DR

The paper tackles threshold inconsistency in open-world image retrieval by introducing OPIS, a variance-based metric that quantifies how operating characteristics vary across classes within a calibration range. It reveals a Pareto frontier where improving accuracy can degrade threshold consistency and presents the Threshold-Consistent Margin (TCM) loss, a two-cosine-margin regularizer that targets hard sample pairs to align representations with a universal threshold. Empirically, TCM improves threshold consistency substantially across four benchmarks while preserving or boosting recall@1, demonstrating practical gains for threshold-based DML in open-world deployments. Overall, OPIS plus TCM provide a calibration-friendly framework that simplifies threshold selection and enhances robustness in real-world image retrieval systems.
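
To make the "two-cosine-margin regularizer" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the hinge-style penalty, the averaging over hard pairs, the function names, and the margin and weight values (`m_pos`, `m_neg`, `w_pos`, `w_neg`) are all hypothetical choices made for this example. The only parts taken from the summary above are that there are two cosine margins and that only hard pairs are penalized.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def tcm_regularizer_sketch(embeddings, labels,
                           m_pos=0.9, m_neg=0.5,   # assumed margin values
                           w_pos=1.0, w_neg=1.0):  # assumed weights
    """Hinge-style penalty on hard pairs only (illustrative form).

    Hard positives: same-class pairs with cosine similarity <= m_pos.
    Hard negatives: different-class pairs with cosine similarity >= m_neg.
    All other pairs contribute nothing.
    """
    pos_penalties, neg_penalties = [], []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                if s <= m_pos:
                    pos_penalties.append(m_pos - s)
            elif s >= m_neg:
                neg_penalties.append(s - m_neg)
    # Average over the hard pairs actually found; 0.0 if there are none.
    pos_term = sum(pos_penalties) / len(pos_penalties) if pos_penalties else 0.0
    neg_term = sum(neg_penalties) / len(neg_penalties) if neg_penalties else 0.0
    return w_pos * pos_term + w_neg * neg_term
```

In a training setup of the kind the summary describes, a term like this would be added to a base DML loss. Because both margins are fixed cosine values shared by all classes, the penalty pushes every class toward the same similarity scale, which is the intuition behind aligning representations with a universal threshold.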

Abstract

Existing losses used in deep metric learning (DML) for image retrieval often lead to highly non-uniform intra-class and inter-class representation structures across test classes and data distributions. When combined with the common practice of using a fixed threshold to declare a match, this gives rise to significant performance variations in terms of false accept rate (FAR) and false reject rate (FRR) across test classes and data distributions. We define this issue in DML as threshold inconsistency. In real-world applications, such inconsistency often complicates the threshold selection process when deploying commercial image retrieval systems. To measure this inconsistency, we propose a novel variance-based metric called Operating-Point-Inconsistency-Score (OPIS) that quantifies the variance in the operating characteristics across classes. Using the OPIS metric, we find that achieving high accuracy levels in a DML model does not automatically guarantee threshold consistency. In fact, our investigation reveals a Pareto frontier in the high-accuracy regime, where existing methods to improve accuracy often lead to degradation in threshold consistency. To address this trade-off, we introduce the Threshold-Consistent Margin (TCM) loss, a simple yet effective regularization technique that promotes uniformity in representation structures across classes by selectively penalizing hard sample pairs. Extensive experiments demonstrate TCM's effectiveness in enhancing threshold consistency while preserving accuracy, simplifying the threshold selection process in practical DML settings.
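
The idea of a variance-based consistency score can be sketched as follows. This is a simplified illustration, not the paper's exact OPIS definition: the utility function (a harmonic mean of specificity and sensitivity), the discrete threshold grid standing in for the calibration range, and all function names are assumptions introduced for this example. What it keeps from the abstract is the core notion: compute per-class FAR and FRR at a shared threshold, and measure how much the resulting operating characteristics vary across classes.

```python
def far_frr(pos_dists, neg_dists, d):
    """FAR and FRR for one class when a pair matches if distance <= d."""
    frr = sum(x > d for x in pos_dists) / len(pos_dists)   # rejected true matches
    far = sum(x <= d for x in neg_dists) / len(neg_dists)  # accepted non-matches
    return far, frr


def utility(far, frr):
    """Assumed utility: harmonic mean of specificity and sensitivity."""
    spec, sens = 1.0 - far, 1.0 - frr
    if spec + sens == 0.0:
        return 0.0
    return 2.0 * spec * sens / (spec + sens)


def opis_sketch(per_class_pairs, thresholds):
    """Mean across thresholds of the across-class variance in utility.

    per_class_pairs: list of (pos_dists, neg_dists) tuples, one per class.
    thresholds: distance thresholds sampled from the calibration range.
    """
    curves = [
        [utility(*far_frr(pos, neg, d)) for d in thresholds]
        for pos, neg in per_class_pairs
    ]
    total = 0.0
    for k in range(len(thresholds)):
        column = [curve[k] for curve in curves]
        mean = sum(column) / len(column)
        total += sum((u - mean) ** 2 for u in column) / len(column)
    return total / len(thresholds)
```

A score of zero means every class has the same utility at every sampled threshold, so a single threshold serves all classes equally well. Larger values indicate that a threshold tuned for one class gives a noticeably different FAR/FRR balance on another.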

Paper Structure (25 sections, 20 equations, 10 figures, 11 tables, 2 algorithms)

Introduction
Related Works
Threshold Inconsistency in Deep Metric Learning
Towards Threshold-Consistent Deep Metric Learning
Experiments
Datasets and Implementation Details
Ablation and Complexity Analysis
Image Retrieval Experiment
Conclusion
Appendix
Operating-Point-Inconsistency Score
Utility Score Analysis: A Gaussian Model
Lower and Upper Bounds of OPIS
Sensitivity of OPIS to Calibration Ranges
Additional Ablation Studies for TCM
...and 10 more sections

Figures (10)

Figure 1: Here we show that (a) without threshold-consistent representation, selecting the right threshold for a commercial image retrieval system that serves a diverse range of test classes and distributions is challenging. It requires careful manual tuning of retrieval thresholds to strike a balance across multiple datasets. However, (b) with threshold-consistent representation, different test distributions yield similar distance thresholds at the performance target, effectively simplifying the otherwise complicated manual threshold tuning process. In the plots, $d^*$ denotes the distance threshold selected to align the False Positive (FP) rate with a pre-defined target.
Figure 2: Utility vs. distance-threshold curves for test classes in the iNaturalist-2018 and Cars-196 datasets. Each curve represents one class in its respective dataset. The red lines mark the calibration range. As defined in Eq. \ref{eq:opis}, the calibration range is based on a pre-defined FAR or FRR target, so changing the loss can shift the calibration range slightly. With TCM regularization, the utility curves of different classes align much more closely and threshold consistency improves, as indicated by a reduction in OPIS of up to 55%.
Figure 3: The plot shows the relationship between recognition error (measured by $100-\text{recall@1}$, lower is better) and threshold inconsistency (measured by OPIS, lower is better) across low- and high-accuracy regimes. Each marker represents a trained DML model, with its size indicating the batch size used during training. In the low-accuracy regime, on the right side of the plot, threshold consistency and accuracy improve together, as highlighted by $\swarrow$. Beyond a certain point, however, a Pareto frontier emerges (indicated by $\nwarrow$), where improving accuracy comes at the expense of threshold consistency. Notably, adding our proposed TCM regularization (marked in red) reduces OPIS substantially, to well below the marked Pareto frontier. Best viewed in color.
Figure 4: (a) An overview of the threshold-consistent DML training framework. Here, the base loss and TCM regularization are combined additively to reduce the trade-off between accuracy and threshold consistency. (b) How TCM differs from margin-based softmax losses such as ArcFace (Deng et al., 2019). See the section "TCM vs Margin-based Softmax loss" for a detailed explanation of the differences. In this illustration, $\theta_1$ and $\theta_2$ represent the intra-class arc lengths for the blue and red classes, respectively. $x_1$ and $x_3$ are both instances of the blue class with class centroid $W_1$, whereas $x_2$ belongs to the red class with centroid $W_2$. In this case, $x_1$ and $x_2$ form a hard negative pair, and $x_1$ and $x_3$ form a hard positive pair. Best viewed in color.
Figure 5: We present visualizations of 2D embedding distributions for the MNIST dataset, both with and without TCM regularization. In the figure, each arrow's direction corresponds to a class centroid and is labeled with its respective digit in white. The width of the line perpendicular to each arrow reflects the intra-class representation compactness, with narrower lines indicating more compact embeddings. The color intensity conveys the probability density distribution of embeddings within each class, with higher density depicted in red.
...and 5 more figures
