Realigned Softmax Warping for Deep Metric Learning

Michael G. DeMoor; John J. Prevost

TL;DR

This work tackles the coupling of push/pull forces in deep metric learning induced by softmax-based losses. It introduces Realigned Softmax Warping in an unbounded Euclidean embedding space, using a two-function warp to place a unique minimum at a controllable attraction point along the line between class proxies, thereby simultaneously enhancing separability and preserving compactness. The approach yields competitive, often superior results on standard metric-learning benchmarks and demonstrates robustness via ablations, hyperparameter analyses, and a face-recognition evaluation. The findings suggest that carefully designed warp functions in Euclidean space can offer a powerful alternative to traditional cosine-based losses, with broad applicability to retrieval and verification tasks.

Abstract

Deep Metric Learning (DML) loss functions traditionally aim to control the forces of separability and compactness within an embedding space so that data points of the same class are pulled together and those of different classes are pushed apart. Within the context of DML, a softmax operation will typically normalize distances into a probability for optimization, thus coupling all the push/pull forces together. This paper proposes a potential new class of loss functions that operate within a Euclidean domain and aim to take full advantage of the coupled forces governing embedding space formation under a softmax. These forces of compactness and separability can be boosted or mitigated at will within controlled locations by using a warping function. In this work, we provide a simple example of a warping function and use it to achieve competitive, state-of-the-art results on various metric learning benchmarks.
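
To make the coupling described in the abstract concrete, the following is a minimal sketch, assuming the loss is a cross-entropy over a softmax of negative warped Euclidean distances to class proxies. The name `warped_softmax_loss`, the `warp` argument, and the toy proxies are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def warped_softmax_loss(embedding, proxies, label, warp=lambda t: t):
    """Cross-entropy over a softmax of negative warped Euclidean distances.

    Assumed form (an illustration, not taken from the paper):
        loss = -log( exp(-f(d_y)) / sum_j exp(-f(d_j)) )
    where d_j = ||embedding - proxies[j]|| and f is `warp`.
    """
    dists = np.linalg.norm(proxies - embedding, axis=1)  # distance to every proxy
    logits = -warp(dists)                                # closer proxy -> larger logit
    logits = logits - logits.max()                       # stabilise the exponentials
    log_probs = logits - np.log(np.exp(logits).sum())    # log-softmax
    return -log_probs[label]

# Hypothetical toy setup: three 2-D proxies and one embedding near proxy 0.
proxies = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
x = np.array([0.5, 0.2])

loss_true = warped_softmax_loss(x, proxies, label=0)
loss_wrong = warped_softmax_loss(x, proxies, label=1)

# Coupling: moving a *negative* proxy changes the loss for the true class,
# even though x and its own proxy stay where they are.
proxies_moved = proxies.copy()
proxies_moved[1] = [4.0, 0.0]
loss_before = loss_true
loss_after = warped_softmax_loss(x, proxies_moved, label=0)
```

In this sketch, pushing proxy 1 farther away lowers the loss for class 0 because every distance enters the same normalizing sum, which is the sense in which the softmax ties the push and pull forces together.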

Paper Structure (39 sections, 2 theorems, 16 equations, 12 figures, 8 tables)

Introduction
Preliminaries
Loss Types
Softmax Embeddings
Cosine vs. Euclidean
Definitions
Realigned Softmax Warping
Embedding Formation
Function Warping
Multi-Class Setting
Interpretation of Proposed Loss
Experiments
Datasets
Implementation
Results Comparison and Discussion
...and 24 more sections

Key Result

Lemma 3.1

Given both $p_c$ and $p_{c^\prime}$, $f$ is monotone iff for any $r \in \mathbb{R}$, $e_*$ and $e^*$ are respectively the only minimum and maximum of Eq. \ref{eq:Function-Softmax} within $D_r(p_{c^{\prime}})$.
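
The lemma can be probed numerically with a small sketch under several stated assumptions, none of which is confirmed by this page: $D_r(p_{c^\prime})$ is read as the circle of radius $r$ centred on $p_{c^\prime}$, $e_*$ and $e^*$ are read as the points on that circle nearest to and farthest from $p_c$, and the binary loss is taken to be $\log(1 + \exp(f(d_c) - f(d_{c^\prime})))$. The helper `binary_loss` and the chosen proxies are hypothetical.

```python
import numpy as np

def binary_loss(points, p_c, p_other, warp):
    """Assumed binary form: log(1 + exp(f(d_c) - f(d_other)))."""
    d_c = np.linalg.norm(points - p_c, axis=1)
    d_other = np.linalg.norm(points - p_other, axis=1)
    return np.logaddexp(0.0, warp(d_c) - warp(d_other))

p_c = np.array([0.0, 0.0])      # ground-truth proxy (illustrative)
p_other = np.array([2.0, 0.0])  # competing proxy (illustrative)
r = 1.0

# Sample the circle of radius r around the competing proxy.
theta = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
circle = p_other + r * np.stack([np.cos(theta), np.sin(theta)], axis=1)

# A monotone warp versus a deliberately non-monotone one.
monotone = binary_loss(circle, p_c, p_other, warp=lambda t: t)
non_monotone = binary_loss(circle, p_c, p_other, warp=lambda t: (t - 2.0) ** 2)

# Indices of the circle points nearest to and farthest from p_c.
dist_to_pc = np.linalg.norm(circle - p_c, axis=1)
nearest = int(np.argmin(dist_to_pc))
farthest = int(np.argmax(dist_to_pc))
```

Under these assumptions, the monotone warp places its minimum and maximum on the circle at the nearest and farthest points respectively, while the non-monotone warp moves the minimum elsewhere on the circle.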

Figures (12)

Figure 1: Binary-class diagram comparing the forces of traditional proxy losses vs. our approach. Best viewed in color. Each node is an embedding vector. Different shapes are different classes. Red/blue/magenta lines indicate pull/push/coupled forces respectively. Black nodes are proxies. a) Traditional proxy losses are governed by interacting push/pull forces that could negatively interfere with each other. b) Instead of pulling embeddings toward their proxies and pushing them away from other proxies, our loss deals with the coupled minima of both forces directly and realigns them outward. This encourages embeddings to move away from both proxies (including their own) towards a single outward point, which boosts separability.
Figure 2: Loss comparison. Best viewed in color. Each node is an embedding vector. Different shapes are different classes. Red/blue/magenta lines indicate pull/push/coupled forces respectively. Black nodes are proxies. Small magenta nodes are coupled points of attraction. a) Triplet loss \cite{weinberger2005distance} pulls an anchor closer to a positive sample and pushes away a negative. b)-c) N-Pair \cite{sohn2016improved} and Lifted-Structure \cite{oh2016deep} generalize this further using multiple negatives. d) Proxy-NCA \cite{movshovitz2017no} compares a sample against a proxy for each class. e) Proxy-Anchor \cite{kim2020proxy} associates all data in a batch with each proxy. f) Instead of pulling embeddings toward their proxies, our loss realigns coupled push/pull forces toward outward points of attraction during training, thus boosting separability.
Figure 3: Softmax optimization landscapes for various functions. (a) $f = t$ (Laplacian Kernel): The default unwarped landscape in Eq. \ref{eq:Default-Softmax-Inverted}. (b) $f = t^2$ (Gaussian Kernel): This convex function prioritizes separability with no focus on compactness. (c) $f = \sqrt{t}$: This concave function warps space inwards towards the ground truth proxy, hurting separability. (d) An approximation of a slight warp using $f_1$ = Eq. \ref{eq:Warp-Function}, $\alpha = 3.0, k_1 = 0.65, k_2 = 1.5$, $f_2 = t$: Here the global minimum is slightly shifted outward away from $p_c$ (boosting separability), and space that is further away is warped inward to preserve compactness. Best viewed in color and zoom.
Figure 4: Intuitively speaking, as training progresses and embeddings group together, the coupled forces acting directly on them will begin to align with the forces acting on the proxies.
Figure 5: In the multi-class setting the binary class analysis can be repeated for each $(p_{y_i},p_j)$ proxy pair ($y_i$ is the ground-truth). The optimization landscapes can then be "combined" to get the multi-class landscape.
...and 7 more figures
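
The qualitative behaviour described in the Figure 3 caption for $f = t$, $f = t^2$ and $f = \sqrt{t}$ can be reproduced on a one-dimensional slice through two proxies. This is a minimal sketch, assuming a binary loss of the form $\log(1 + \exp(f(d_c) - f(d_{c^\prime})))$ with the ground-truth proxy at 0 and the competing proxy at 1. The function `line_loss`, the grid, and the proxy positions are illustrative choices, and the two-function warp of panel (d) is not reproduced because its equation is not given here.

```python
import numpy as np

def line_loss(x, warp, p_c=0.0, p_other=1.0):
    """Assumed binary loss along the line through both proxies."""
    d_c = np.abs(x - p_c)          # distance to the ground-truth proxy
    d_other = np.abs(x - p_other)  # distance to the competing proxy
    return np.logaddexp(0.0, warp(d_c) - warp(d_other))

xs = np.linspace(-4.0, 2.0, 601)   # grid covering both sides of p_c

warps = {
    "t": lambda t: t,              # unwarped distance
    "t^2": lambda t: t ** 2,       # convex warp
    "sqrt(t)": np.sqrt,            # concave warp
}

# Location of the lowest loss on the grid for each warp.
argmins = {name: float(xs[np.argmin(line_loss(xs, f))]) for name, f in warps.items()}

# For f = t the loss is flat on the far side of p_c (x < 0), so there is no
# single preferred location there; measure the spread to confirm.
outer = line_loss(xs[xs < 0.0], warps["t"])
flat_spread = float(outer.max() - outer.min())
```

On this slice, the convex warp keeps decreasing toward the outer edge of the grid (separability without compactness), the concave warp places its minimum at the ground-truth proxy itself, and the unwarped distance leaves a flat region behind the proxy.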

Theorems & Definitions (2)

Lemma 3.1
Proposition 3.2
