Locality-Sensitive Hashing for Efficient Hard Negative Sampling in Contrastive Learning

Fabian Deuser, Philipp Hausenblas, Hannah Schieber, Daniel Roth, Martin Werner, Norbert Oswald

TL;DR

The paper tackles the challenge of efficiently mining hard negatives in contrastive learning for large-scale, high-dimensional data. It introduces a GPU-friendly Locality-Sensitive Hashing scheme that binarizes embeddings and retrieves negative samples via Hamming distance, enabling scalable global negative mining without storing full embeddings. The authors provide a theoretical bound linking angular proximity to Hamming proximity and validate the approach across six diverse datasets, showing competitive or superior performance with substantial speed and memory savings. Practically, the method generalizes across vision and text domains and offers a viable alternative to expensive pre-epoch or within-batch mining, with potential implications for faster and scalable representation learning.
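For orientation, the link between angular and Hamming proximity can be illustrated with the classic property of random-hyperplane hashing. This is the textbook result, not the paper's specific bound, which this summary does not reproduce. It assumes each hash bit is $h_r(x) = \operatorname{sign}(r^\top x)$ with a normal vector $r$ drawn from an isotropic Gaussian; the symbols $r$, $b$, $\theta$, and $d_H$ are introduced here for illustration only.

$$
\Pr_{r \sim \mathcal{N}(0, I)}\big[h_r(x) \neq h_r(y)\big] = \frac{\theta(x, y)}{\pi},
\qquad
\mathbb{E}\big[d_H(x, y)\big] = b \cdot \frac{\theta(x, y)}{\pi},
$$

where $\theta(x, y) \in [0, \pi]$ is the angle between the two embeddings and $d_H$ is the Hamming distance between their $b$-bit codes. Under these assumptions, a smaller angle (higher cosine similarity) means fewer differing bits in expectation, which is why sorting by Hamming distance approximates sorting by cosine similarity.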

Abstract

Contrastive learning is a representation learning paradigm in which a neural network maps data elements to feature vectors. It improves the feature space by forming tuples of an anchor and examples that are either positive or negative based on class similarity. Hard negative examples, which are close to the anchor in the feature space but belong to a different class, improve learning performance. Finding high-quality hard negatives efficiently in large, high-dimensional datasets is computationally challenging. In this paper, we propose a GPU-friendly Locality-Sensitive Hashing (LSH) scheme that quantizes real-valued feature vectors into binary representations for approximate nearest neighbor search. We investigate its theoretical properties and evaluate it on several datasets from the textual and visual domains. Our approach achieves comparable or better performance while requiring significantly less computation than existing hard negative mining strategies.
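The idea of binarizing feature vectors and searching by Hamming distance can be sketched as follows. This is a minimal CPU illustration in pure Python, assuming sign-based random-hyperplane hashing; it is not the authors' GPU implementation. All function names, the toy embeddings, and the choice of 64 bits are hypothetical.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian normal vector per hash bit (assumed scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def binarize(vec, hyperplanes):
    """Pack the side of each hyperplane that vec falls on into one integer code."""
    code = 0
    for normal in hyperplanes:
        dot = sum(v * n for v, n in zip(vec, normal))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code

def hamming(a, b):
    """Count the bits in which two integer codes differ."""
    return bin(a ^ b).count("1")

def hard_negative_candidates(anchor_idx, codes, labels, k):
    """Return the k indices with a different label and the smallest Hamming distance."""
    anchor_code, anchor_label = codes[anchor_idx], labels[anchor_idx]
    negatives = [i for i in range(len(codes)) if labels[i] != anchor_label]
    negatives.sort(key=lambda i: hamming(anchor_code, codes[i]))
    return negatives[:k]

# Toy data (hypothetical): index 2 is a negative near the anchor,
# index 3 is the exact opposite direction of the anchor.
embeddings = [
    [1.0, 0.1, 0.0, 0.0],    # anchor, class 0
    [0.9, 0.2, 0.0, 0.1],    # positive, class 0
    [0.8, 0.3, 0.1, 0.0],    # near negative, class 1
    [-1.0, -0.1, 0.0, 0.0],  # antipodal negative, class 2
]
labels = [0, 0, 1, 2]

planes = make_hyperplanes(num_bits=64, dim=4)
codes = [binarize(e, planes) for e in embeddings]
print(hard_negative_candidates(0, codes, labels, k=1))
```

Only the integer codes are needed at search time, which reflects the memory argument made above: the real-valued embeddings do not have to be kept once they are hashed. A practical version would vectorize the projection and the XOR/popcount step on the GPU rather than loop in Python.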

Paper Structure

This paper contains 30 sections, 16 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Illustration of the anchor (A, blue), positive (P, red), and several negatives (N, black). Left: The raw cosine similarities between the anchor and the negatives, which are commonly used to identify hard negatives. Middle and Right: Two randomly sampled hyperplanes, $\frac{1}{\sqrt{2}}(1, 1)$ and $\frac{1}{\sqrt{5}}(2,-1)$, illustrate that hard negatives are likely to be mapped to the same side of the hyperplane as the anchor. The Hamming distance, defined as the number of hyperplanes separating two embeddings, decreases with higher cosine similarity, enabling effective identification of hard negatives.
  • Figure 2: Percentage of neighbors retrieved by LSH that fall within the theoretical similarity bounds $\varepsilon$ and $\varepsilon a$, compared to cosine similarity.
  • Figure 3: Comparison of LSH and pre-epoch hard negative sampling in terms of search time versus dataset size and model output size.
  • Figure 5: A comparison of LSH and random sampling on SOP and MS MARCO. We compare the overlap with Pre-Epoch Increment (hard negatives) and the mean positional distance.
  • Figure 6: Comparison of the similarity between the retrieved approximate hard negatives and the actual hard negatives retrieved by cosine similarity for SOP (top), VIGOR (middle), and MS MARCO (bottom).
  • ...and 6 more figures
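
The two hyperplanes named in the Figure 1 caption can be turned into a small worked example. The sketch below treats the two vectors as hyperplane normals, which is an assumption about the caption. The anchor and negative positions are hypothetical, since the caption gives no coordinates, and the helper names are illustrative only.

```python
import math

# Normal vectors from the Figure 1 caption (interpreted as hyperplane normals).
NORMALS = [
    (1 / math.sqrt(2), 1 / math.sqrt(2)),
    (2 / math.sqrt(5), -1 / math.sqrt(5)),
]

def unit(angle_deg):
    """Unit vector in the plane at the given angle (hypothetical sample points)."""
    rad = math.radians(angle_deg)
    return (math.cos(rad), math.sin(rad))

def sign_bits(vec):
    """One bit per hyperplane: 1 if vec lies on the positive side, else 0."""
    return tuple(int(vec[0] * n[0] + vec[1] * n[1] >= 0) for n in NORMALS)

def hamming(a, b):
    """Number of hyperplanes that separate the two bit patterns."""
    return sum(x != y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity of two unit vectors."""
    return a[0] * b[0] + a[1] * b[1]

anchor = unit(30)
negatives = {"N1": unit(45), "N2": unit(100), "N3": unit(210)}

for name, vec in negatives.items():
    dist = hamming(sign_bits(anchor), sign_bits(vec))
    print(name, round(cosine(anchor, vec), 3), dist)
```

For these sample points the Hamming distances are 0, 1, and 2 for N1, N2, and N3, in the same order as their decreasing cosine similarity to the anchor. With only two hyperplanes the ranking is coarse; more bits give a finer approximation.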