Understanding Hyperbolic Metric Learning through Hard Negative Sampling

Yun Yue; Fangzhou Lin; Guanyi Mou; Ziming Zhang

TL;DR

The paper investigates why hyperbolic metric learning improves contrastive image representations and under what conditions it outperforms Euclidean embeddings. It analyzes how geometry shapes hard negative sampling by deriving geometry-dependent triplet weights $p(x^-)$ and demonstrates that hyperbolic distance modulates relative negatives, leading to complementary behavior across geometries. The authors propose a simple mixed-geometry fusion that combines Euclidean and hyperbolic information, showing improved Recall@K on CUB, Cars, and SOP with Vision Transformers. This work provides practical guidance for leveraging hyperbolic geometry in metric learning and highlights a viable ensemble approach to exploit diverse hard negatives, with code released for reproducibility.

Abstract

In recent years, there has been a growing trend of incorporating hyperbolic geometry methods into computer vision. While these methods have achieved state-of-the-art performance on various metric learning tasks using hyperbolic distance measurements, the underlying theoretical analysis supporting this superior performance remains under-exploited. In this study, we investigate the effects of integrating hyperbolic space into metric learning, particularly when training with contrastive loss. We identify a need for a comprehensive comparison between Euclidean and hyperbolic spaces regarding the temperature effect in the contrastive loss within the existing literature. To address this gap, we conduct an extensive investigation to benchmark the results of Vision Transformers (ViTs) using a hybrid objective function that combines loss from Euclidean and hyperbolic spaces. Additionally, we provide a theoretical analysis of the observed performance improvement. We also reveal that hyperbolic metric learning is highly related to hard negative sampling, providing insights for future work. This work will provide valuable data points and experience in understanding hyperbolic image embeddings. To shed more light on problem-solving and encourage further investigation into our approach, our code is available online (https://github.com/YunYunY/HypMix).
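The link between temperature, geometry, and hard negatives can be illustrated with a small sketch. It assumes that the weight $p(x^-)$ on a negative is the softmax of its negative distance to the anchor scaled by $1/\tau$, which is the usual gradient weighting in a softmax-based contrastive loss; the paper's exact definition may differ. The function names, the exponential map at the origin, and the random toy data are illustrative assumptions, not taken from the released code.

```python
import numpy as np

def cosine_distance(anchor, others):
    """1 - cosine similarity between one anchor (d,) and n other points (n, d)."""
    a = anchor / np.linalg.norm(anchor)
    o = others / np.linalg.norm(others, axis=1, keepdims=True)
    return 1.0 - o @ a

def expmap0(v, c):
    """Exponential map at the origin: Euclidean vectors -> Poincare ball of curvature -c."""
    sqrt_c = np.sqrt(c)
    norm = np.clip(np.linalg.norm(v, axis=-1, keepdims=True), 1e-12, None)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def mobius_add(x, y, c):
    """Mobius addition on the Poincare ball (broadcasts over the leading axis)."""
    xy = np.sum(x * y, axis=-1, keepdims=True)
    x2 = np.sum(x * x, axis=-1, keepdims=True)
    y2 = np.sum(y * y, axis=-1, keepdims=True)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c**2 * x2 * y2
    return num / den

def poincare_distance(anchor, others, c):
    """Geodesic distance on the Poincare ball between an anchor (d,) and points (n, d)."""
    sqrt_c = np.sqrt(c)
    diff = mobius_add(-anchor[None, :], others, c)
    arg = np.clip(sqrt_c * np.linalg.norm(diff, axis=-1), 0.0, 1 - 1e-7)
    return 2.0 / sqrt_c * np.arctanh(arg)

def negative_weights(distances, tau):
    """Assumed form of p(x^-): softmax over negatives of -distance / tau."""
    logits = -distances / tau
    logits = logits - logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Toy data: one anchor and five negatives (random, for illustration only).
rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
negatives = rng.normal(size=(5, 8))
tau, c = 0.05, 0.1  # values quoted in the figure captions

w_sph = negative_weights(cosine_distance(anchor, negatives), tau)
w_hyp = negative_weights(
    poincare_distance(expmap0(anchor, c), expmap0(negatives, c), c), tau
)
print("Sph weights:", np.round(w_sph, 3))
print("Hyp weights:", np.round(w_hyp, 3))
```

In both geometries a smaller $\tau$ concentrates the weight on the closest negatives, while the two distance functions can rank the same negatives differently. That difference in ranking is the kind of complementary behavior the summary describes.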

Paper Structure (11 sections, 11 equations, 4 figures, 2 tables)

Introduction
Related Work
Understanding Geometry Effect
Preliminaries
Geometries vs. Hard Negatives
Ensemble Learner with Mix Geometries
Experiments and Results
Datasets
Implementation Details
Results
Conclusion

Figures (4)

Figure 1: For a randomly chosen anchor, we arrange hard negatives selected by a well-trained model based on their distance to the anchor. The top row displays the anchor and its positive pair. We present sorted negatives from the Euclidean embedding model ("Sph-ViT") and the hyperbolic embedding model ("Hyp-ViT"), respectively. Negatives are ordered by increasing distance from left to right. "Sph-ViT" is trained with $\tau=0.05$, while "Hyp-ViT" is trained with $\tau=0.05$ and $c=0.1$. Red boxes highlight negatives that are absent from the top 6 hard negatives of the other model.
Figure 2: Experimentally, we find that Euclidean ("Sph") and hyperbolic ("Hyp") embeddings, with varying $\tau$ and $c$ values, exhibit complementary characteristics, which we attribute to differences in negative selection across the two geometries. To leverage this complementary information, we introduce embedding fusion.
Figure 3: $p(\mathbf x^-)$ for different ViT models. The x-axis is the index of the data points after sorting the distances from $\mathbf x^-$ to the anchor point $\mathbf x$ in ascending order (best viewed in color).
Figure 4: Recall of 1K metric comparison of models trained with different temperatures $\tau$ on the CUB-200-2011 dataset. The x-axis indicates different values of $\tau$. "Sph-" denotes versions with hypersphere embeddings optimized using $D_{cos}$, and "Hyp-" denotes versions with hyperbolic embeddings optimized using $D_{hyp}$. For "Hyp-" we fix the curvature parameter $c = 0.1$.
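The embedding fusion mentioned in Figure 2 and the Recall@K evaluation from the summary can be sketched as follows. This page does not say how the fusion is performed, so the concatenation of L2-normalized embeddings used here is an assumption made purely for illustration, and it ranks neighbours with Euclidean distance rather than a hyperbolic one. The helper names `fuse_embeddings` and `recall_at_k` and the toy data are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length."""
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)

def fuse_embeddings(emb_a, emb_b):
    """Hypothetical fusion: concatenate two L2-normalized embedding sets."""
    return np.concatenate([l2_normalize(emb_a), l2_normalize(emb_b)], axis=1)

def recall_at_k(embeddings, labels, k):
    """Fraction of queries with a same-label item among their k nearest neighbours."""
    sq = np.sum(embeddings**2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * embeddings @ embeddings.T
    np.fill_diagonal(dist, np.inf)  # a query must not retrieve itself
    nearest = np.argsort(dist, axis=1)[:, :k]
    hits = (labels[nearest] == labels[:, None]).any(axis=1)
    return hits.mean()

# Toy example: two classes, embeddings from two hypothetical models.
labels = np.array([0, 0, 1, 1])
emb_sph = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
emb_hyp = np.array([[0.8, 0.2], [1.0, 0.1], [0.2, 0.7], [0.0, 1.0]])

fused = fuse_embeddings(emb_sph, emb_hyp)
print("Recall@1 on fused embeddings:", recall_at_k(fused, labels, k=1))
```

Normalizing each half before concatenation keeps either model from dominating the fused distance through scale alone. Whether this matches the fusion used in the paper should be checked against the released code.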
