Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models

Ilyass Moummad, Kawtar Zaher, Lukas Rauch, Alexis Joly

TL;DR

Hashing-Baseline tackles scalable retrieval by using frozen pretrained embeddings and a training-free pipeline. It combines $k$-dimensional PCA-based reduction, a random orthogonal projection, and threshold binarization with asymmetric Hamming retrieval to generate $d$-bit codes from rich embeddings. The approach is theoretically motivated by the Johnson–Lindenstrauss lemma and hyperplane hashing, suggesting that the combination preserves geometric and angular structure without learning. The work also introduces a new audio hashing benchmark and demonstrates competitive image and audio retrieval without any additional training, highlighting the practicality and scalability of training-free hashing for multimodal retrieval.

Abstract

Information retrieval with compact binary embeddings, also known as hashing, is crucial for scalable, fast search, yet state-of-the-art hashing methods require expensive, scenario-specific training. In this work, we introduce Hashing-Baseline, a strong training-free hashing method that leverages the rich embeddings produced by powerful pretrained encoders. We revisit three classical, training-free hashing techniques (principal component analysis, random orthogonal projection, and threshold binarization) to build a strong baseline for hashing. Our approach combines these techniques with frozen embeddings from state-of-the-art vision and audio encoders to yield competitive retrieval performance without any additional learning or fine-tuning. To demonstrate the generality and effectiveness of this approach, we evaluate it on standard image retrieval benchmarks as well as on a newly introduced benchmark for audio hashing.
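
The three steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `fit_hasher` and `encode`, the use of random vectors in place of real encoder embeddings, and the choice of 0.5 as the post-sigmoid threshold (equivalent to testing the sign of the projected value) are all assumptions made for illustration.

```python
import numpy as np


def fit_hasher(train_emb: np.ndarray, n_bits: int, seed: int = 0):
    """Fit PCA on frozen embeddings and draw a random orthogonal rotation.

    Hypothetical helper; assumes n_bits <= min(n_samples, embedding_dim).
    """
    mean = train_emb.mean(axis=0)
    centered = train_emb - mean
    # PCA via SVD: rows of vt are principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_bits].T  # shape (embedding_dim, n_bits)
    # Random orthogonal matrix: Q factor of a Gaussian matrix.
    rng = np.random.default_rng(seed)
    rotation, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))
    return mean, components, rotation


def encode(emb: np.ndarray, mean, components, rotation) -> np.ndarray:
    """Map embeddings to n_bits-long binary codes (one 0/1 value per bit)."""
    projected = (emb - mean) @ components @ rotation
    # Assumption: sigmoid followed by a 0.5 threshold, i.e. projected > 0.
    return (projected > 0).astype(np.uint8)


# Stand-in for frozen encoder outputs: 200 random 64-dimensional vectors.
train_emb = np.random.default_rng(1).standard_normal((200, 64))
mean, components, rotation = fit_hasher(train_emb, n_bits=16)
codes = encode(train_emb, mean, components, rotation)
print(codes.shape)
```

Nothing here is learned by gradient descent: PCA is a closed-form fit on the embeddings, and the rotation is sampled once and then reused for both database items and queries.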

Paper Structure

This paper contains 12 sections, 7 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Overview of Hashing-Baseline. Features are extracted using a frozen pretrained model, then reduced to the target bit length via PCA. The reduced features are orthogonally projected and binarized using a sigmoid function followed by a threshold to generate compact binary codes.
  • Figure 2: Retrieval examples on Flickr25K, showing the nearest neighbors using SimDINO features and their 16-bit hashed codes.
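
To make the nearest-neighbor lookup over 16-bit codes concrete, the following sketch ranks a tiny hand-built database by Hamming distance to a query code. It shows plain symmetric Hamming ranking only and does not attempt to reproduce the asymmetric variant mentioned in the TL;DR; the function name `hamming_rank` and the toy codes are assumptions for illustration, not data from the paper.

```python
import numpy as np


def hamming_rank(query_code: np.ndarray, db_codes: np.ndarray, top_k: int = 3):
    """Return indices and distances of the top_k closest database codes.

    Codes are arrays of 0/1 values: (n_bits,) for the query and
    (n_items, n_bits) for the database.
    """
    # Hamming distance = number of bit positions that differ.
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")[:top_k]
    return order, dists[order]


# Toy database of four 16-bit codes with 0, 4, 8, and 16 bits set.
db_codes = np.zeros((4, 16), dtype=np.uint8)
db_codes[1, :4] = 1
db_codes[2, :8] = 1
db_codes[3, :] = 1

# Query: all zeros except the first bit.
query_code = np.zeros(16, dtype=np.uint8)
query_code[0] = 1

indices, distances = hamming_rank(query_code, db_codes)
print(indices.tolist(), distances.tolist())  # [0, 1, 2] [1, 3, 7]
```

In a larger system the codes would typically be packed into machine words and compared with XOR and popcount, which is what makes binary codes attractive for large-scale search.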