Infinity Search: Approximate Vector Search with Projections on q-Metric Spaces

Antonio Pariente, Ignacio Hounie, Santiago Segarra, Alejandro Ribeiro

TL;DR

The paper tackles nearest neighbor search under arbitrary dissimilarities by projecting the data into $q$-metric or ultrametric spaces via a canonical projection $P_q^*$ that preserves nearest neighbors. It uses VP-trees with $q$-powered pruning to reach logarithmic-like search cost in the ultrametric limit, and introduces a learned embedding $\Phi$ to approximate $q$-metric distances for queries. The result is approximate but efficient NN search across diverse dissimilarities, supported by theoretical pruning guarantees and empirical speedups, with two-stage retrieval used to mitigate spurious optima at $q=\infty$. Overall, Infinity Search combines canonical projections with learned distance embeddings into a competitive ANN search framework that also covers non-metric and sparse data. The work points to practical use in vector databases that must handle a wide range of similarity measures.

Abstract

An ultrametric space or infinity-metric space is defined by a dissimilarity function that satisfies a strong triangle inequality in which every side of a triangle is not larger than the larger of the other two. We show that search in ultrametric spaces with a vantage point tree has worst-case complexity equal to the depth of the tree. Since datasets of interest are not ultrametric in general, we employ a projection operator that transforms an arbitrary dissimilarity function into an ultrametric space while preserving nearest neighbors. We further learn an approximation of this projection operator to efficiently compute ultrametric distances between query points and points in the dataset. We proceed to solve a more general problem in which we consider projections in $q$-metric spaces -- in which triangle sides raised to the power of $q$ are smaller than the sum of the $q$-powers of the other two. Notice that the use of learned approximations of projected $q$-metric distances renders the search pipeline approximate. We show in experiments that increasing values of $q$ result in faster search but lower recall. Overall, search in $q$-metric and infinity-metric spaces is competitive with existing search methods.
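
To make the projection idea in the abstract concrete, here is a minimal Python sketch. It assumes the projection can be computed as an all-pairs minimum over paths of the $q$-norm of the edge dissimilarities (the maximum edge when $q=\infty$). That formulation, the function name `q_path_projection`, and the Floyd-Warshall-style loop are illustrative assumptions, not the authors' code, and the cubic cost makes this suitable only for small datasets.

```python
import numpy as np

def q_path_projection(D, q):
    """Minimum q-norm path cost between all pairs (assumed form of the projection).

    D is a symmetric matrix of non-negative dissimilarities.
    q is a float >= 1, or np.inf for the ultrametric case.
    """
    E = np.array(D, dtype=float)
    np.fill_diagonal(E, 0.0)
    n = E.shape[0]
    for k in range(n):
        # Cost of going i -> k -> j, combined with the q-norm instead of a sum.
        to_k = E[:, k][:, None]
        from_k = E[k, :][None, :]
        if np.isinf(q):
            via_k = np.maximum(to_k, from_k)
        else:
            via_k = (to_k ** q + from_k ** q) ** (1.0 / q)
        # Keep the cheaper of the current path and the path through k.
        E = np.minimum(E, via_k)
    return E

# Small hypothetical example: random symmetric dissimilarities on 6 points.
rng = np.random.default_rng(0)
D = rng.random((6, 6))
D = (D + D.T) / 2
np.fill_diagonal(D, 0.0)
E_inf = q_path_projection(D, np.inf)  # satisfies the strong triangle inequality
E_2 = q_path_projection(D, 2.0)       # satisfies the 2-triangle inequality
```

Under this assumption, the distance from each point to its original nearest neighbor is unchanged by the projection, because every path leaving a point starts with an edge at least that long.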

Paper Structure

This paper contains 30 sections, 10 theorems, 76 equations, 33 figures, 2 tables, 7 algorithms.

Key Result

Theorem 1

Consider a dataset $X$, a query $x_o$, a dissimilarity function $d$ satisfying the strong triangle inequality eqn_strong_triangle_inequality and a vantage point tree $T(X)$ constructed by the recursive partition of $X$ into vantage points and their corresponding inside and outside sets [cf. eqn_vp_c
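
As a rough illustration of the complexity claim in the abstract (worst-case comparisons equal to the depth of the tree), the following Python sketch searches a vantage point tree over a toy ultrametric. It is not the authors' implementation: the ultrametric `ultra` (position of the highest differing bit of two integers), the dict-based tree, and the median-radius split rule are all assumptions chosen for brevity. The point is that under the strong triangle inequality, the points in the branch not taken can never be closer to the query than the current vantage point, so the search follows a single root-to-leaf path.

```python
import random

def ultra(i, j):
    """Toy ultrametric on non-negative integers: highest differing bit position."""
    return (i ^ j).bit_length()

def build_vp_tree(points, dist, rng):
    """Recursively split points into inside (d < mu) and outside (d >= mu) sets."""
    points = list(points)
    if not points:
        return None
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return {"vp": vp, "mu": 0, "inside": None, "outside": None}
    dists = sorted(dist(vp, p) for p in points)
    mu = dists[len(dists) // 2]  # median radius (an assumed split rule)
    inside = [p for p in points if dist(vp, p) < mu]
    outside = [p for p in points if dist(vp, p) >= mu]
    return {"vp": vp, "mu": mu,
            "inside": build_vp_tree(inside, dist, rng),
            "outside": build_vp_tree(outside, dist, rng)}

def depth(tree):
    """Number of nodes on the longest root-to-leaf path."""
    if tree is None:
        return 0
    return 1 + max(depth(tree["inside"]), depth(tree["outside"]))

def search(tree, query, dist):
    """Follow one branch per level; returns (best point, distance, comparisons)."""
    best, best_d, comparisons = None, float("inf"), 0
    node = tree
    while node is not None:
        d = dist(query, node["vp"])
        comparisons += 1
        if d < best_d:
            best, best_d = node["vp"], d
        # Strong triangle inequality: the branch not taken holds no closer point.
        node = node["inside"] if d < node["mu"] else node["outside"]
    return best, best_d, comparisons

rng = random.Random(0)
data = rng.sample(range(4096), 200)
tree = build_vp_tree(data, ultra, rng)
nearest, nearest_d, n_comparisons = search(tree, 1234, ultra)
```

Because the toy ultrametric produces many ties, the tree built here is not guaranteed to be balanced; the sketch only shows that the number of comparisons never exceeds the tree depth.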

Figures (33)

  • Figure 1: Search in metric spaces requires fewer comparisons than search in arbitrary spaces because for some queries -- such as $x_o'$ and $x_o''$ -- the triangle inequality allows us to restrict comparisons to subsets of the dataset $X$. Queries, such as $x_o$, for which triangle inequality bounds are inconclusive, also exist (\ref{eqn_vp_fail_condition}). This latter eventuality is impossible when the strong triangle inequality (\ref{eqn_strong_triangle_inequality}) holds, leading to a more efficient search in ultrametric spaces (Theorem \ref{theo_log_complexity}).
  • Figure 2: VP Tree search complexity on an $\infty$-metric space with $n\in\{100,\dots,150\text{K}\}$ points. The worst-case bound in comparisons corresponds to the depth of the tree.
  • Figure 3: Nearest neighbor search over MNIST-Fashion-784 with Canonical Projection $E_q$ for $n=1{,}000$ points.
  • Figure 4: Nearest neighbor search over MNIST-Fashion-784 with learned embedding $\Phi(\cdot,\theta^\star)$ for $n=10{,}000$.
  • Figure 5: Infinity Search results on searching $n\in\{10\text{K},50\text{K},100\text{K},500\text{K},1\text{M},5\text{M}\}$ points of Deep1B-96 with Euclidean distance.
  • ...and 28 more figures

Theorems & Definitions (22)

  • Theorem 1
  • proof
  • Theorem 2: Existence and uniqueness
  • Proposition 1
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • ...and 12 more