Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search

Kejing Lu, Chuan Xiao, Yoshiharu Ishikawa

TL;DR

This work tackles efficient approximate nearest neighbor search (ANNS) on graph-based indexes by introducing probabilistic routing with a formal $(\delta,1-\epsilon)$ guarantee. The authors present two baseline routing tests (SimHash and RCEOs) and a novel method, Partitioned Extreme Order Statistics (PEOs), which uses space partitioning and extreme-value statistics to decide which neighbors require exact distance calculations. They prove the probabilistic guarantee of PEOs, analyze the impact of the space-partitioning parameter $L$, and report substantial practical gains on HNSW and NSSG: $1.6$–$2.5\times$ higher QPS and $70$–$80\%$ fewer exact distance evaluations, with a consistent $1.1$–$1.4\times$ advantage over the leading routing method. The results indicate that PEOs scales with modest space overhead and provides a principled, high-throughput routing approach for graph-based ANNS.

Abstract

Approximate nearest neighbor search (ANNS) in high-dimensional spaces is a pivotal challenge in the field of machine learning. In recent years, graph-based methods have emerged as the superior approach to ANNS, establishing a new state of the art. Although various optimizations for graph-based ANNS have been introduced, they predominantly rely on heuristic methods that lack formal theoretical backing. This paper aims to enhance routing within graph-based ANNS by introducing a method that offers a probabilistic guarantee when exploring a node's neighbors in the graph. We formulate the problem as probabilistic routing and develop two baseline strategies by incorporating locality-sensitive techniques. Subsequently, we introduce PEOs, a novel approach that efficiently identifies which neighbors in the graph should be considered for exact distance calculation, thus significantly improving efficiency in practice. Our experiments demonstrate that equipping PEOs can increase throughput on commonly utilized graph indexes (HNSW and NSSG) by a factor of 1.6 to 2.5, and its efficiency consistently outperforms the leading-edge routing technique by 1.1 to 1.4 times.

Paper Structure

This paper contains 37 sections, 6 theorems, 57 equations, 10 figures, 6 tables, 2 algorithms.

Key Result

Lemma 4.1

(SimHash) Given $\bm{u}$, $\bm{q}$, and $m$ random vectors $\{\, \bm{a}_i \,\}^m_{i=1} \sim \mathcal{N}(0, I^d)$, the angle $\theta$ between $\bm{u}$ and $\bm{q}$ can be estimated as
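The lemma statement is cut off at this point in the extracted text, so the paper's exact formula is not reproduced here. For orientation only, the sketch below implements the standard SimHash (random-hyperplane) angle estimator, which relies on the fact that the signs of $\bm{a}_i^\top \bm{u}$ and $\bm{a}_i^\top \bm{q}$ disagree with probability $\theta/\pi$. It may differ in form from the lemma as stated in the paper. The function names, the default `m`, and the `seed` parameter are assumptions made for this illustration.

```python
import numpy as np


def simhash_angle_estimate(u, q, m=256, seed=0):
    """Estimate the angle between u and q from m random-hyperplane sign bits.

    Illustrative sketch of the standard SimHash estimator; not taken
    from the paper's code.
    """
    rng = np.random.default_rng(seed)
    # m Gaussian vectors a_i ~ N(0, I), stored one per row.
    a = rng.standard_normal((m, u.shape[0]))
    # One sign bit per random projection.
    bits_u = (a @ u) >= 0
    bits_q = (a @ q) >= 0
    # Each bit pair disagrees with probability theta / pi, so the
    # mismatch fraction times pi is an estimate of theta.
    return float(np.pi * np.mean(bits_u != bits_q))


def exact_angle(u, q):
    """Exact angle in radians, used only to check the estimate."""
    cos = np.dot(u, q) / (np.linalg.norm(u) * np.linalg.norm(q))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    u, q = rng.standard_normal(64), rng.standard_normal(64)
    print("estimated:", simhash_angle_estimate(u, q, m=4096))
    print("exact:    ", exact_angle(u, q))
```

The estimate tightens as `m` grows, since it is an average of `m` independent mismatch indicators. In a routing setting the sign bits of each neighbor would typically be precomputed at index time, so only the query's bits are computed during search; that arrangement is an assumption about usage, not a detail given in this text.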

Figures (10)

  • Figure 1: Illustration of the PEOs test. There are $n$ neighbors of $v$. $\theta^1, \ldots, \theta^n$ denote the angles between $\bm{e}^1, \ldots, \bm{e}^n$ and $\bm{q}$, respectively. $u^2$ and $u^{n-1}$ pass the test (indicated by "+"). We access their raw vectors from the dataset and calculate their distances to $\bm{q}$.
  • Figure 2: Recall-QPS evaluation. PEOs (H) denotes HNSW+PEOs and PEOs (N) denotes NSSG+PEOs. The recalls of Glass and FINGER are lower than 30% on GloVe200 and thus not shown.
  • Figure 3: Effect of $L$. We plot the approximate values of $J_{opt}$, $J_{rel}$, and $\Delta$ under the isotropic distribution and the empirical performance.
  • Figure 4: Effect of $\epsilon$ on DEEP10M, GloVe200 and GloVe300.
  • Figure 5: Effect of compact implementation on search speed.
  • ...and 5 more figures

Theorems & Definitions (14)

  • Definition 3.1: Nearest Neighbor Search (NNS)
  • Definition 3.2: Probabilistic Routing
  • Lemma 4.1
  • Lemma 4.2
  • Definition 6.1
  • Lemma 6.2
  • Theorem 6.3
  • Lemma 6.4
  • proof
  • proof
  • ...and 4 more