LeanVec: Searching vectors faster by making them fit

Mariano Tepper, Ishwar Singh Bhati, Cecilia Aguerrebere, Mark Hildebrand, Ted Willke

TL;DR

LeanVec tackles the memory bandwidth bottlenecks of graph-based similarity search for high-dimensional embeddings by marrying linear dimensionality reduction with Locally-adaptive Vector Quantization, producing compact primary vectors for fast graph traversal and LVQ-quantized secondary vectors for accurate re-ranking. It offers two variants: LeanVec-ID for in-distribution queries and LeanVec-OOD for out-of-distribution queries, with optimization via Frank-Wolfe on a convexified problem and an alternative eigenvector-based projection (LeanVec-Eig) that balances query and data statistics. The approach yields state-of-the-art results across ID and OOD benchmarks, achieving up to 3.7x throughput improvements and up to 4.9x faster index construction, along with substantial scalability to tens of millions of vectors. This work enables faster, cross-modal similarity search in practical systems, with open-source implementations and datasets to support reproducibility and integration into existing SVS pipelines.

Abstract

Modern deep learning models have the ability to generate high-dimensional vectors whose similarity reflects semantic resemblance. Thus, similarity search, i.e., the operation of retrieving those vectors in a large collection that are similar to a given query, has become a critical component of a wide range of applications that demand highly accurate and timely answers. In this setting, the high vector dimensionality puts similarity search systems under compute and memory pressure, leading to subpar performance. Additionally, cross-modal retrieval tasks have become increasingly common, e.g., where a user inputs a text query to find the most relevant images for that query. However, these queries often have different distributions than the database embeddings, making it challenging to achieve high accuracy. In this work, we present LeanVec, a framework that combines linear dimensionality reduction with vector quantization to accelerate similarity search on high-dimensional vectors while maintaining accuracy. We present LeanVec variants for in-distribution (ID) and out-of-distribution (OOD) queries. LeanVec-ID yields accuracies on par with those from recently introduced deep learning alternatives whose computational overhead precludes their usage in practice. LeanVec-OOD uses two novel techniques for dimensionality reduction that consider the query and database distributions to simultaneously boost the accuracy and the performance of the framework even further (even presenting competitive results when the query and database distributions match). All in all, our extensive and varied experimental results show that LeanVec produces state-of-the-art results, with up to 3.7x improvement in search throughput and up to 4.9x faster index build time over the state of the art.
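
The two-stage flow summarized above (a main search over dimensionality-reduced primary vectors, followed by re-ranking with secondary vectors) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: brute-force scans stand in for graph traversal, full-precision vectors stand in for the LVQ-quantized secondary vectors, and a plain SVD projection stands in for the learned LeanVec projection. The names `fit_projection` and `two_stage_search`, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np


def fit_projection(X, d):
    """Return a (d, D) matrix with orthonormal rows spanning the top-d
    right singular directions of X (rows of X are database vectors).
    Assumption: a PCA-style stand-in, not the learned LeanVec projection."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:d]


def two_stage_search(q, X_secondary, X_primary, P, k, n_candidates):
    """Main search on reduced vectors, then re-rank the candidates."""
    q_low = P @ q                       # project the query down to d dims
    approx = X_primary @ q_low          # approximate inner products
    # Indices of the n_candidates largest approximate similarities.
    cand = np.argpartition(-approx, n_candidates - 1)[:n_candidates]
    exact = X_secondary[cand] @ q       # re-rank with the fuller vectors
    return cand[np.argsort(-exact)[:k]]


rng = np.random.default_rng(0)
n, D, d = 2000, 64, 16
# Synthetic, approximately low-rank data, so the reduction loses little.
X = rng.standard_normal((n, d)) @ rng.standard_normal((d, D))
X += 0.01 * rng.standard_normal((n, D))
q = X[0] + 0.01 * rng.standard_normal(D)   # an in-distribution query

P = fit_projection(X, d)
X_primary = X @ P.T                        # (n, d) primary vectors
top = two_stage_search(q, X, X_primary, P, k=10, n_candidates=50)
print(top)
```

Raising `n_candidates` trades speed for accuracy: when it equals the collection size, the re-ranking step sees every vector and the result matches an exhaustive search.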

Paper Structure

This paper contains 20 sections, 3 theorems, 32 equations, 20 figures, and 1 table.

Key Result

Proposition 1

Problem \ref{['prob:leanvec_orthonormal']} is upper bounded by the singular value decomposition of ${\bm{\mathbf{X}}}$.
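
One way to read this statement can be checked numerically. The sketch below assumes (the problem is not written out in this summary) that the referenced problem minimizes $\|\mathbf{Q}^\top \mathbf{P}^\top \mathbf{P} \mathbf{X} - \mathbf{Q}^\top \mathbf{X}\|_F^2$ over matrices $\mathbf{P}$ with orthonormal rows, with database and query vectors stored as columns of $\mathbf{X}$ and $\mathbf{Q}$. Under that assumption, submultiplicativity of the Frobenius norm bounds the loss by $\|\mathbf{Q}\|_F^2 \, \|\mathbf{P}^\top \mathbf{P} \mathbf{X} - \mathbf{X}\|_F^2$, and the second factor is minimized by the top-$d$ left singular vectors of $\mathbf{X}$ (Eckart-Young). The function names and the random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, n, m = 32, 8, 500, 100
X = rng.standard_normal((D, n))   # database vectors as columns (assumed layout)
Q = rng.standard_normal((D, m))   # query vectors as columns (assumed layout)


def query_aware_loss(P, Q, X):
    """Assumed objective: squared error of inner products after projection."""
    return np.linalg.norm(Q.T @ (P.T @ P @ X) - Q.T @ X) ** 2


def reconstruction_loss(P, X):
    """Squared Frobenius error of projecting X onto the row space of P."""
    return np.linalg.norm(P.T @ P @ X - X) ** 2


# Projection from the SVD of X: top-d left singular vectors as rows.
U, s, _ = np.linalg.svd(X, full_matrices=False)
P_svd = U[:, :d].T

# A random orthonormal projection for comparison.
P_rand = np.linalg.qr(rng.standard_normal((D, d)))[0].T

for name, P in [("svd", P_svd), ("random", P_rand)]:
    lhs = query_aware_loss(P, Q, X)
    rhs = np.linalg.norm(Q) ** 2 * reconstruction_loss(P, X)
    print(name, "bound holds:", lhs <= rhs,
          "reconstruction loss:", reconstruction_loss(P, X))
```

The bound holds for any orthonormal projection; the SVD choice is simply the one that makes the right-hand side as small as possible, which is what makes it a natural query-independent baseline.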

Figures (20)

  • Figure 1: We propose LeanVec, a framework to accelerate similarity search for high-dimensional vectors, including those produced by deep learning models. LeanVec combines a novel linear dimensionality reduction method for in-distribution and out-of-distribution use cases with Locally-adaptive Vector Quantization (LVQ; Aguerrebere et al., 2023) to achieve state-of-the-art performance and accuracy in graph-based index construction and search. \ref{['fig:rqa_qps_bandwidth']} For high-dimensional vectors (e.g., $D=768$), search performance scales with the level of memory compression. Compared to the FP16 encoding, LVQ8 and LVQ4x8 compress the vectors by 2x and 4x for search, respectively, while LeanVec reduces the vector size by 9.6x (4.8x from dimensionality reduction and 2x from LVQ8). At 72 threads (our system has 36 physical cores and 72 threads), LeanVec provides an 8.5x performance gain over FP16 while consuming much less memory bandwidth (95 vs. 149GB/s). \ref{['fig:lean_vec_framework']} The main search in LeanVec returns nearest neighbor candidates and is executed efficiently using primary vectors, i.e., compressed with dimensionality reduction and vector quantization. The candidates are then re-ranked using secondary vectors, i.e., quantized with LVQ.
  • Figure 2: Frank-Wolfe BCD optimization for \ref{['prob:leanvec']} with factor $\alpha \in (0, 1)$.
  • Figure 3: \ref{['algo:leanvec']} converges in 51 iterations for open-images-512-1M with $D=512$ and $d=128$. The total runtime is 4 seconds. Relaxing the orthogonality constraint incurs a relatively small error of $10^{-3}$.
  • Figure 4: Eigenvector search optimization for \ref{['eq:leanvec_loss_same_proj_equalized']}.
  • Figure 5: The loss in \ref{['prob:leanvec_loss_same_proj']} is a smooth function of $\beta$ when ${\bm{\mathbf{P}}} = \operatorname{eigsearch}(\beta)$ and has a unique minimizer (different for each $d$). \ref{['algo:leanvec_eig_search']} finds the minimum (marked with a circle) of this loss. Additional results in \ref{['fig:leanvec_eigsearch_continued']} of the appendix.
  • ...and 15 more figures
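
The compression factors quoted in the Figure 1 caption follow from simple per-vector byte counts. The sketch below reproduces that arithmetic, assuming that the small per-vector constants stored by LVQ are negligible and that the reduced dimensionality is $d = 160$ (inferred from $768 / 4.8$; the caption does not state $d$). The helper name `compression_ratio` is hypothetical.

```python
def compression_ratio(baseline_bytes, compressed_bytes):
    """How many times smaller the compressed vector is than the baseline."""
    return baseline_bytes / compressed_bytes


D = 768                      # dimensionality quoted in the caption
d = 160                      # assumption: 768 / 4.8, not stated in the caption

bytes_fp16 = D * 2           # 16 bits per dimension
bytes_lvq8 = D * 1           # 8 bits per dimension
bytes_lvq4 = D // 2          # 4 bits per dimension (first level of LVQ4x8)
bytes_leanvec = d * 1        # reduced dimensions at 8 bits each

ratios = {
    "LVQ8": compression_ratio(bytes_fp16, bytes_lvq8),
    "LVQ4x8 (search)": compression_ratio(bytes_fp16, bytes_lvq4),
    "LeanVec": compression_ratio(bytes_fp16, bytes_leanvec),
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x smaller than FP16")
```

The LeanVec factor is the product of the two independent reductions: 4.8x fewer dimensions times 2x fewer bits per dimension.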

Theorems & Definitions (5)

  • Proposition 1
  • Definition 1
  • Definition 2
  • Lemma 1 (Jaggi, 2013)
  • Theorem 1