Theoretical and Empirical Analysis of Adaptive Entry Point Selection for Graph-based Approximate Nearest Neighbor Search

Yutaro Oguri, Yusuke Matsui

TL;DR

This work analyzes adaptive entry point selection for graph-based ANNS and introduces two theoretical constructs, the $b$-monotonic path and the $B$-MSNET, to model practical graphs. It proves that, under general conditions, adaptively chosen entry points yield a smaller upper bound on the hop count than a fixed central entry point, and it extends prior theory to broader graph classes and region assumptions. Empirically, the method delivers speedups of 1.2–2.3× across diverse datasets with minimal memory overhead and is more robust on hard worst-case instances. The findings advance the understanding of entry-point optimization in real-world, high-dimensional ANNS, with implications for scalable, accurate similarity search in large databases.

Abstract

We present a theoretical and empirical analysis of adaptive entry point selection for graph-based approximate nearest neighbor search (ANNS). We introduce two novel concepts, the $b$-monotonic path and the $B$-MSNET, which capture the graphs built by practical algorithms better than existing concepts such as MSNET. We prove that adaptive entry point selection offers a better performance upper bound than a fixed central entry point under more general conditions than previous work. Empirically, we validate the method's effectiveness in accuracy, speed, and memory usage across various datasets, especially in challenging scenarios with out-of-distribution data and hard instances. Our comprehensive study provides deeper insights into optimizing entry points for graph-based ANNS in real-world high-dimensional data applications.
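As a rough illustration of the idea analyzed here, adaptive entry point selection can be read as choosing, for each query, the closest of $K$ precomputed entry point candidates instead of always starting from one central node. The sketch below assumes Euclidean distance and a small hand-written candidate list; the function name `select_entry_point` and the toy data are hypothetical and are not taken from the paper's implementation.

```python
import math


def select_entry_point(query, candidates):
    """Return the index of the candidate closest to the query.

    Assumption: candidates are plain coordinate lists and closeness is
    measured with Euclidean distance.
    """
    return min(range(len(candidates)),
               key=lambda j: math.dist(query, candidates[j]))


# Hypothetical entry point candidates (K = 3) and a query in 2-D.
candidates = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
query = [8.0, 1.0]

entry = select_entry_point(query, candidates)
print(entry)  # index of the candidate from which the graph search would start
```

The selection costs $K$ distance computations per query, so $K$ trades a small amount of extra work and memory against a starting node that is closer to the query.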

Paper Structure

This paper contains 20 sections, 2 theorems, 17 equations, 8 figures, 4 tables, 1 algorithm.

Key Result

Lemma 4.2

Let $\mathcal{P}(v_s, v_t)$ be a $b$-monotonic path. Let $\bm{x}_s = \bm{\phi}^{-1}(v_s)$ and $\bm{x}_t = \bm{\phi}^{-1}(v_t)$. The following formula holds:

Figures (8)

  • Figure 1: An illustrated example of a $b$-monotonic path $(b=2)$, $\mathcal{P}(v_1, v_7) = \{v_1, \dots, v_7\}$, on a graph $G(\mathcal{V}, \mathcal{E})$. Each node $v_i$ corresponds to a vector $\bm{x}_i\in\mathcal{X}$. Note that $r_i = \Vert \bm{x}_{i} - \bm{x}_7 \Vert_2 - \Vert \bm{x}_{i+1} - \bm{x}_7 \Vert_2$ for $i\in\{1, \dots, 6\}$. $r_2$ and $r_4$ are negative, and all other $r_i$ are positive. Thus, this path is a $2$-monotonic path. A blue arrow between two nodes represents a backward hop with a negative $r_i$.
  • Figure 2: A $2$-monotonic path from $\bm{d}_j$ to $\textbf{GT}(\bm{q})$. (a) shows case (i), where $\bm{q}, \textbf{GT}(\bm{q})\in\mathcal{U}_j$, and (b) shows case (ii), where $\textbf{GT}(\bm{q})\in\mathcal{U}_k$ $(j\neq k)$.
  • Figure 3: Evaluation of NSG [Fu2017FastAN_NSG] with adaptive entry point selection on various datasets. A curve closer to the upper right indicates a better accuracy-speed tradeoff. Each curve is obtained by sweeping the length of the search queue $L\in\{16, 24, 32, 48, 64, 96, 128, 256, 512\}$. Each point is the average of five measurements.
  • Figure 4: Visualization of the reproduced hard instances presented in [indyk2023worstcase] for (a) NSG and (b) DiskANN. GTs represent the ground truth points.
  • Figure 5: Heatmaps of the results of (a) NSG and (b) DiskANN on the hard instances when varying the number of entry point candidates $K$ and the length of the search queue $L$. Each cell in the heatmap shows $\text{Recall@}10$. Only a subset of the $L$ values is shown for better visualization.
  • ...and 3 more figures
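
The sign pattern described in the Figure 1 caption can be checked numerically. The sketch below is a minimal illustration, not the paper's code: it computes $r_i$ as defined in the caption and counts the hops with negative $r_i$. The function names and the toy 1-D path are hypothetical, and reading $b$ as that count is an assumption drawn from the caption's example; Definition 4.1 in the paper gives the precise statement.

```python
import math


def hop_progress(path):
    """Return r_i = ||x_i - x_t|| - ||x_{i+1} - x_t|| for every hop,
    where x_t is the last vector of the path."""
    target = path[-1]
    dists = [math.dist(x, target) for x in path]
    return [dists[i] - dists[i + 1] for i in range(len(dists) - 1)]


def count_backward_hops(path):
    """Count hops that move away from the target (negative r_i)."""
    return sum(1 for r in hop_progress(path) if r < 0)


# Hypothetical 7-node path in 1-D; the last vector is the target x_7.
toy_path = [[10.0], [6.0], [7.0], [3.0], [4.0], [1.0], [0.0]]

print(hop_progress(toy_path))         # r_2 and r_4 are negative
print(count_backward_hops(toy_path))  # two backward hops, as in Figure 1
```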

Theorems & Definitions (5)

  • Definition 4.1: $b$-monotonic path
  • Lemma 4.2
  • Definition 4.3: $B$-MSNET
  • Theorem 4.4
  • Proof: Proof of \ref{thm:upperbound_proof}