Potential Field Based Deep Metric Learning

Shubhang Bhatnagar, Narendra Ahuja

TL;DR

PFML introduces a novel potential-field framework for deep metric learning, where each sample acts as a charge generating an attraction and a repulsion field that decays with distance. By superposing fields from embeddings and learnable proxies, PFML models global, all-pair interactions while mitigating noise via distance decay, and optimizes by minimizing a total potential energy. Theoretical results (Proposition 1 and Corollary 1) and extensive experiments show improved robustness to label noise and closer proxy-data alignment compared to non-decaying proxy-based or tuple-based methods, yielding state-of-the-art image retrieval performance on Cars-196, CUB-200-2011, and SOP. These findings highlight a scalable, robust alternative to tuple mining and proxy-only losses with strong practical impact for fine-grained recognition and retrieval tasks.

Abstract

Deep metric learning (DML) involves training a network to learn a semantically meaningful representation space. Many current approaches mine n-tuples of examples and model interactions within each tuple. We present a novel, compositional DML model that, instead of using tuples, represents the influence of each example (embedding) by a continuous potential field, and superposes the fields to obtain their combined global potential field. We use attractive/repulsive potential fields to represent interactions among embeddings from images of the same/different classes. Contrary to typical learning methods, where mutual influence of samples is proportional to their distance, we enforce reduction in such influence with distance, leading to a decaying field. We show that such decay helps improve performance on real-world datasets with large intra-class variations and label noise. Like other proxy-based methods, we also use proxies to succinctly represent sub-populations of examples. We evaluate our method on three standard DML benchmarks (Cars-196, CUB-200-2011, and SOP), where it outperforms state-of-the-art baselines.
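
The contrast between a distance-proportional influence and a decaying one can be illustrated with a small sketch. The abstract does not give the functional form of the field, so the potential below, U(r) = -1 / max(r, delta)^alpha, is an assumption chosen only for illustration, as are the names `decaying_force` and `linear_force` and the values of `alpha` and `delta`. The quadratic pull stands in for a generic loss whose gradient grows with distance; it is not a specific baseline from the paper.

```python
def decaying_force(r, alpha=2.0, delta=0.1):
    """Magnitude of dU/dr for the assumed potential U(r) = -1 / max(r, delta)**alpha.

    Inside the radius delta the potential is flat, so the force is zero;
    outside it the force falls off as alpha / r**(alpha + 1).
    """
    if r <= delta:
        return 0.0
    return alpha / r ** (alpha + 1)


def linear_force(r):
    """Magnitude of dU/dr for a quadratic pull U(r) = 0.5 * r**2 (grows with distance)."""
    return r


distances = [0.5, 1.0, 2.0, 4.0]
decay = [decaying_force(r) for r in distances]   # [16.0, 2.0, 0.25, 0.03125]
linear = [linear_force(r) for r in distances]    # [0.5, 1.0, 2.0, 4.0]

for r, d, l in zip(distances, decay, linear):
    print(f"r={r:4.1f}  decaying={d:8.5f}  distance-proportional={l:4.1f}")
```

Under this assumed form, a same-class sample that sits far away (for example a mislabeled point or an unusual variant of the class) contributes almost nothing to the pull on an embedding, whereas under the quadratic pull it would dominate.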

Paper Structure (32 sections, 31 equations, 5 figures, 9 tables)

This paper contains 32 sections, 31 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Overview of our potential-field-based DML pipeline. The process includes (1) computing the attraction and repulsion fields generated by each embedding and proxy, (2) computing the class potential fields by superposition of the individual fields, (3) evaluating the total potential energy by summing the potentials of embeddings and proxies under the class potential fields, and (4) updating the locations of sample embeddings (through network parameters) and proxies to minimize the total potential energy through backpropagation.
  • Figure 2: An example of the class potential fields $\Psi_{1}$ and $\Psi_{2}$ (Sec. \ref{sec:potential_combined}), created by superposing the fields of individual embeddings (Sec. \ref{sec:attraction}) belonging to Classes 1 and 2. Arrows denote the gradient of the respective potentials, representing the net force on them. (a) $\Psi_{1}$ draws samples/proxies of class 1 towards nearby samples/proxies of class 1 while keeping them at least $\delta$ distance away from embeddings of class 2. Proxies at starred locations are drawn towards the nearest class 1 embeddings, which are potential minima, helping them better model the data distribution (Sec. 3.4). (b) $\Psi_{2}$ draws embeddings of class 2 towards other nearby embeddings of class 2 instead of distant class 2 embeddings, which might be significantly different variants of the class (or potentially mislabeled data points), helping feature learning (or improving noise robustness if mislabeled). $\Psi_{2}$ also keeps class 2 embeddings at least $\delta$ distance away from embeddings of class 1.
  • Figure 3: Variation in Recall@1 with $M$, $\delta$, $\alpha$ and $\delta_{rep} - \delta_{att}$ on the Cars-196 and CUB-200-2011 datasets. Error bars represent std. deviations over 5 runs.
  • Figure 4: Example images retrieved by our method for query images from the (a) Cars-196, (b) CUB-200-2011, and (c) SOP test datasets, in increasing order of distance from the query. Correct retrievals have a green border, while incorrect ones have a red one.
  • Figure 5: A t-SNE visualization of a semantic representation space learnt by our method on the CUB-200 dataset.
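
Steps (1) to (3) of the pipeline in the Figure 1 caption can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the field shapes in `attraction` and `repulsion`, the constants `ALPHA`, `DELTA_ATT` and `DELTA_REP`, and the function names are all hypothetical, and proxies are simply treated as additional charges carrying a class label. Step (4) would need an automatic-differentiation framework and is left out.

```python
import math

ALPHA = 1.0      # decay exponent (hypothetical value)
DELTA_ATT = 0.1  # attraction is flat inside this radius (hypothetical value)
DELTA_REP = 1.0  # repulsion vanishes beyond this radius (hypothetical value)


def attraction(r):
    """Step 1: assumed attraction potential of one same-class charge at distance r."""
    return -1.0 / max(r, DELTA_ATT) ** ALPHA


def repulsion(r):
    """Step 1: assumed repulsion potential, positive inside DELTA_REP and zero outside."""
    r = max(r, 1e-6)  # guard against division by zero for coincident points
    return max(0.0, 1.0 / r ** ALPHA - 1.0 / DELTA_REP ** ALPHA)


def class_potential(i, charges):
    """Step 2: superpose the fields of all other charges at the location of charge i."""
    point, label = charges[i]
    total = 0.0
    for j, (pos, lab) in enumerate(charges):
        if j == i:
            continue  # a charge does not act on itself
        r = math.dist(point, pos)
        total += attraction(r) if lab == label else repulsion(r)
    return total


def total_energy(charges):
    """Step 3: sum the class potential over every embedding and proxy."""
    return sum(class_potential(i, charges) for i in range(len(charges)))


# Each charge is (position, class label); the last entry plays the role of a proxy.
charges = [((0.0, 0.0), 1), ((1.0, 0.0), 1), ((3.0, 0.0), 2), ((0.5, 0.5), 1)]
print(total_energy(charges))
```

With these assumed fields, bringing two same-class charges closer lowers the total energy, while a different-class charge raises it only when it falls inside the `DELTA_REP` margin, which mirrors the behaviour described in the Figure 2 caption.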