GleanVec: Accelerating vector search with minimalist nonlinear dimensionality reduction

Mariano Tepper, Ishwar Singh Bhati, Cecilia Aguerrebere, Ted Willke

TL;DR

This paper presents new linear and nonlinear dimensionality reduction methods that accelerate high-dimensional vector search while maintaining accuracy with both in-distribution (ID) and out-of-distribution (OOD) queries.

Abstract

Embedding models can generate high-dimensional vectors whose similarity reflects semantic affinities. Thus, accurately and quickly retrieving the vectors in a large collection that are similar to a given query has become a critical component of a wide range of applications. In particular, cross-modal retrieval (e.g., where a text query is used to find images) is gaining momentum rapidly. Here, achieving high accuracy is challenging because the queries often have a different statistical distribution than the database vectors. Moreover, the high vector dimensionality puts these search systems under compute and memory pressure, leading to subpar performance. In this work, we present new linear and nonlinear methods for dimensionality reduction to accelerate high-dimensional vector search while maintaining accuracy in settings with in-distribution (ID) and out-of-distribution (OOD) queries. The linear LeanVec-Sphering outperforms other linear methods, trains faster, comes with no hyperparameters, and allows the target dimensionality to be set more flexibly. The nonlinear Generalized LeanVec (GleanVec) uses a piecewise linear scheme to further improve the search accuracy while remaining computationally nimble. Initial experimental results show that LeanVec-Sphering and GleanVec push the state of the art for vector search.

Paper Structure

This paper contains 18 sections, 25 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: An intuitive cartoon example of the importance of query-aware dimensionality reduction for maximum inner product search. The optimal solution for a query-agnostic method would be to project the database ($\mathcal{X}$) and the query ($\mathcal{Q}$) vectors onto the first principal axis of $\mathcal{X}$ (large green arrow). This choice yields a poor resolution of the inner products because this direction is orthogonal to the principal axis of $\mathcal{Q}$ (orange arrow), i.e., $(\forall \mathbf{q} \in \mathcal{Q}, \forall \mathbf{x} \in \mathcal{X})\ \langle \mathbf{q}, \mathbf{x} \rangle \approx 0$ after projection. Using a query-aware technique, we can select a direction that maximally preserves the inner products. In this case, the second principal direction of $\mathcal{X}$ and the principal direction of $\mathcal{Q}$ coincide and provide the best choice.
  • Figure 2: LeanVec-Sphering is a novel query-aware technique for dimensionality reduction from $D$ dimensions to $d<D$ target dimensions. The linear transformation in LeanVec-Sphering allows $d$ to be selected flexibly during search, instead of being fixed during the construction of the search index (see the section on flexible target dimensionality for further details). This enables tuning $d$ when using the index operationally.
  • Figure 3: GleanVec can be seen as a new type of neural network that borrows elements from the Vector Quantized Variational Autoencoder (VQ-VAE) (van den Oord et al., 2017). While VQ-VAE uses cluster centers to perform vector quantization, GleanVec uses them for dimensionality reduction.
  • Figure 4: For different ID datasets, all the analyzed methods (including LeanVec-Sphering) perform similarly. This means that using LeanVec-Sphering in the ID setting is safe as it is equivalent to performing a query-agnostic dimensionality reduction with the SVD.
  • Figure 5: The proposed LeanVec-Sphering outperforms the alternatives by yielding a lower value of the LeanVec loss function (top row) and, more importantly, a higher accuracy for a brute-force search (bottom row) for different OOD datasets and target dimensionalities.
  • ...and 3 more figures
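
The cartoon in Figure 1 can be reproduced numerically with a small sketch. The code below is an illustration under stated assumptions, not the paper's algorithm: the 2-D Gaussian data, the helper names `top_axis` and `inner_product_error`, and the choice of the queries' principal axis as the "query-aware" direction are all hypothetical and only mirror the special case drawn in the figure. It compares how well a 1-D projection preserves inner products when the direction is chosen from the database alone versus when the queries are taken into account.

```python
import math
import random


def top_axis(points):
    """Unit vector along the leading principal axis of 2-D points.

    Uses the closed-form eigenvector of the 2x2 second-moment matrix
    [[a, b], [b, c]] (no mean centering, since inner products are the target).
    """
    a = sum(x * x for x, _ in points)
    b = sum(x * y for x, y in points)
    c = sum(y * y for _, y in points)
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    return (math.cos(theta), math.sin(theta))


def inner_product_error(database, queries, u):
    """Mean squared error between <q, x> and its 1-D approximation <q, u><u, x>."""
    total = 0.0
    for qx, qy in queries:
        q_u = qx * u[0] + qy * u[1]
        for xx, xy in database:
            exact = qx * xx + qy * xy
            approx = q_u * (xx * u[0] + xy * u[1])
            total += (exact - approx) ** 2
    return total / (len(queries) * len(database))


# Assumed toy data: database spread mostly along the first coordinate,
# queries spread mostly along the second (an OOD setting, as in the cartoon).
rng = random.Random(0)
database = [(rng.gauss(0, 3.0), rng.gauss(0, 1.0)) for _ in range(500)]
queries = [(rng.gauss(0, 0.05), rng.gauss(0, 1.0)) for _ in range(100)]

u_agnostic = top_axis(database)  # query-agnostic: first principal axis of X
u_aware = top_axis(queries)      # cartoon's query-aware pick: principal axis of Q

err_agnostic = inner_product_error(database, queries, u_agnostic)
err_aware = inner_product_error(database, queries, u_aware)

print(f"query-agnostic error: {err_agnostic:.4f}")
print(f"query-aware error:    {err_aware:.4f}")
```

With this toy data the query-aware direction should give a much smaller inner-product error, because the discarded direction is the one along which the queries have almost no energy. In general the best direction depends on both distributions jointly, which is the problem the paper's methods address.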