LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search

Elias Jääsaari, Ville Hyvönen, Teemu Roos

TL;DR

This work proposes a new supervised score computation method based on the observation that inner product approximation is a multivariate (multi-output) regression problem that can be solved efficiently by reduced-rank regression.

Abstract

Approximate nearest neighbor (ANN) search is a key component in many modern machine learning pipelines; recent use cases include retrieval-augmented generation (RAG) and vector databases. Clustering-based ANN algorithms, which use score computation methods based on product quantization (PQ), are often used in industrial-scale applications due to their scalability and suitability for distributed and disk-based implementations. However, they have slower query times than the leading graph-based ANN algorithms. In this work, we propose a new supervised score computation method based on the observation that inner product approximation is a multivariate (multi-output) regression problem that can be solved efficiently by reduced-rank regression. Our experiments show that on modern high-dimensional data sets, the proposed reduced-rank regression (RRR) method is superior to PQ in both query latency and memory usage. We also introduce LoRANN, a clustering-based ANN library that leverages the proposed score computation method. LoRANN is competitive with the leading graph-based algorithms and outperforms the state-of-the-art GPU ANN methods on high-dimensional data sets.
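The regression view in the abstract can be illustrated with a minimal NumPy sketch. It assumes the textbook reduced-rank regression solution (the least-squares fit projected onto the top right singular vectors of the fitted values) and synthetic data. The names `fit_rrr` and `approx_scores` are hypothetical and are not part of the LoRANN library's API. The library's actual training procedure may differ in details that this excerpt does not describe.

```python
import numpy as np

def fit_rrr(X, C, r):
    """Fit a rank-r reduced-rank regression that predicts Y = X C^T from X.

    X: (m, d) training queries, C: (n, d) points of one cluster.
    Returns A (d, r) and B (r, n) such that (x @ A) @ B approximates C @ x.
    """
    Y = X @ C.T                                      # exact inner products, (m, n)
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)    # unconstrained solution, (d, n)
    Y_hat = X @ B_ols                                # fitted values
    _, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    V_r = Vt[:r].T                                   # top-r right singular vectors, (n, r)
    return B_ols @ V_r, V_r.T                        # rank-r factorization of the solution

def approx_scores(x, A, B):
    """Approximate all n inner products in O(d*r + r*n) instead of O(d*n)."""
    return (x @ A) @ B

# Synthetic data with approximately low-rank structure (an assumption of this demo).
rng = np.random.default_rng(0)
d, n, m, r, k = 64, 200, 1000, 16, 8
basis = rng.standard_normal((k, d))
C = rng.standard_normal((n, k)) @ basis + 0.05 * rng.standard_normal((n, d))
X = rng.standard_normal((m, k)) @ basis + 0.05 * rng.standard_normal((m, d))
q = rng.standard_normal(k) @ basis + 0.05 * rng.standard_normal(d)

A, B = fit_rrr(X, C, r)
exact = C @ q
approx = approx_scores(q, A, B)
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative error of approximate scores: {rel_err:.4f}")
```

In this sketch each cluster stores only the two factors `A` and `B`, so the rank `r` controls both the memory used per vector and the cost of scoring a query against the cluster.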

Paper Structure

This paper contains 42 sections, 1 theorem, 7 equations, 17 figures, 2 tables.

Key Result

Theorem 1

If $\mathbf{X} = \mathbf{C}$, then $\mathbf{\tilde{y}} = \mathbf{\hat{y}}_{\text{RRR}}$.
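The excerpt does not define the symbols in Theorem 1, so the following is only a sketch under assumed notation and not the authors' proof. It assumes that $\mathbf{C}$ holds the cluster points as rows with thin SVD $\mathbf{C} = \mathbf{U}\mathbf{\Sigma}\mathbf{W}^{\top}$, that $\mathbf{X}$ holds the training queries as rows, that the regression targets are $\mathbf{Y} = \mathbf{X}\mathbf{C}^{\top}$, and that $\mathbf{\tilde{y}} = \mathbf{C}_r\mathbf{x}$ is the score vector obtained from the rank-$r$ truncated SVD $\mathbf{C}_r$. It also uses the standard reduced-rank regression solution, which projects the least-squares solution onto the top $r$ right singular vectors of the fitted values.

```latex
% Assumed notation (not defined in this excerpt): see the lead-in paragraph.
% Setting X = C, with C^+ denoting the Moore-Penrose pseudoinverse:
\begin{align*}
\mathbf{\hat{B}}_{\text{OLS}} &= \mathbf{X}^{+}\mathbf{Y}
  = \mathbf{C}^{+}\mathbf{C}\mathbf{C}^{\top} = \mathbf{C}^{\top}, \\
\mathbf{X}\mathbf{\hat{B}}_{\text{OLS}} &= \mathbf{C}\mathbf{C}^{\top}
  = \mathbf{U}\mathbf{\Sigma}^{2}\mathbf{U}^{\top}, \\
\mathbf{\hat{B}}_{\text{RRR}} &= \mathbf{\hat{B}}_{\text{OLS}}\mathbf{U}_r\mathbf{U}_r^{\top}
  = \mathbf{W}\mathbf{\Sigma}\mathbf{U}^{\top}\mathbf{U}_r\mathbf{U}_r^{\top}
  = \mathbf{W}_r\mathbf{\Sigma}_r\mathbf{U}_r^{\top}
  = \mathbf{C}_r^{\top}, \\
\mathbf{\hat{y}}_{\text{RRR}} &= \mathbf{\hat{B}}_{\text{RRR}}^{\top}\mathbf{x}
  = \mathbf{C}_r\mathbf{x} = \mathbf{\tilde{y}}.
\end{align*}
```

Under these assumptions, training on the cluster points themselves reduces the supervised method to the unsupervised truncated-SVD approximation, so any gain from the regression approach would have to come from training queries that differ from the corpus.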

Figures (17)

  • Figure 1: Recall vs. QPS on the Yandex T2I OOD data set (400K sampled corpus points) without (left) and with (right) the final re-ranking step. LoRANN-query is trained using a sample of 400K points from the query distribution as a training set, while LoRANN-query-big uses a sample of 1.2M points. LoRANN-corpus is trained using the corpus as a training set. LoRANN-corpus-local is trained using the corpus as a training set with only the cluster points as the local training sets of the reduced-rank regression models. It is beneficial to (1) use a sample from the actual query distribution as a training set and to (2) select the local training set by also using the points outside of the cluster, as described in Section \ref{sec:RRR}. The performance difference decreases when the final re-ranking step is introduced (requiring the original data set to be kept in memory).
  • Figure 2: Performance comparison between RRR and PQ. The $k$-means clustering (IVF index) is kept constant to directly compare the effect of the score computation method (here $c$ denotes the number of clusters). The proposed score computation method outperforms the baseline method (PQ).
  • Figure 3: Performance comparison of RRR and PQ at different levels of memory usage. We vary the rank parameter $r$ for RRR and the code size for PQ such that $b$, bytes per vector, is similar for both. RRR@($b\approx 16$) outperforms even PQ@($b\approx 128$), which uses eight times as much memory.
  • Figure 4: LoRANN ablation study. On the high-dimensional ($d=768$) data set (left), all the components improve the performance of LoRANN. On the lower-dimensional ($d=200$) data set (right), all the components except dimensionality reduction (DR) improve performance.
  • Figure 5: CPU comparison. The QPS-recall curves of LoRANN and the leading graph library GLASS cross at the 95% (left) and at the 90% recall level (right), indicating that LoRANN is the fastest method at the lower recall levels, and GLASS at the higher recall levels.
  • ...and 12 more figures

Theorems & Definitions (2)

  • Theorem 1
  • proof