Theoretical and Empirical Analysis of Adaptive Entry Point Selection for Graph-based Approximate Nearest Neighbor Search

Yutaro Oguri; Yusuke Matsui

TL;DR

This work analyzes adaptive entry point selection for graph-based ANNS and introduces two theoretical constructs, the $b$-monotonic path and the $B$-MSNET, to model practical graphs. It proves that adaptively chosen entry points can lower the upper bound on hop count compared to a fixed central entry point under general conditions, and extends prior theory to broader graph classes and region assumptions. Empirically, the method delivers speedups of 1.2–2.3× across diverse datasets with minimal memory overhead and is more robust on hard worst-case instances. The findings advance the understanding of entry-point optimization in real-world, high-dimensional ANNS, with implications for scalable, accurate similarity search in large databases.

Abstract

We present a theoretical and empirical analysis of adaptive entry point selection for graph-based approximate nearest neighbor search (ANNS). We introduce two novel concepts, the $b$-monotonic path and the $B$-MSNET, which capture the actual graphs built by practical algorithms better than existing concepts such as MSNET. We prove that adaptive entry point selection offers a better performance upper bound than the fixed central entry point under more general conditions than previous work. Empirically, we validate the method's effectiveness in accuracy, speed, and memory usage across various datasets, especially in challenging scenarios with out-of-distribution data and hard instances. Our comprehensive study provides deeper insights into optimizing entry points for graph-based ANNS in real-world high-dimensional data applications.
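The core idea of adaptive entry point selection can be illustrated with a minimal sketch. It assumes that a small set of entry point candidates has been precomputed (for example, one representative per Voronoi cell) and that the search starts from the candidate nearest to the query. The function name `select_entry_point` and the toy candidates are hypothetical and are not taken from the paper.

```python
import math

def select_entry_point(query, candidates):
    """Return the index of the candidate closest to the query (Euclidean distance).

    Assumption: `candidates` are precomputed entry point candidates, e.g. one
    representative point per Voronoi cell. This is an illustrative sketch only.
    """
    return min(range(len(candidates)),
               key=lambda j: math.dist(query, candidates[j]))

# Hypothetical 2-D candidates; a real index would hold K high-dimensional vectors.
candidates = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
entry = select_entry_point((9.0, 1.0), candidates)
print(entry)  # index of the candidate from which the graph search would start
```

A fixed central entry point corresponds to always returning the same index regardless of the query; the adaptive variant pays for K extra distance computations per query in exchange for starting closer to the target.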

Paper Structure (20 sections, 2 theorems, 17 equations, 8 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Approximate Nearest Neighbor Search (ANNS)
Graph-based Index
Characteristics of dataset in ANNS
Preliminary
Voronoi Partition
The fixed central entry point
Recap of Adaptive Entry Point Selection
Theoretical Analysis
$b$-monotonic path & $B$-MSNET
Effectiveness of Adaptive Entry Point Selection
Comparison to Previous Works
Empirical Findings
Experiment Settings
...and 5 more sections

Key Result

Lemma 4.2

Let $\mathcal{P}(v_s, v_t)$ be a $b$-monotonic path. Let $\bm{x}_s = \bm{\phi}^{-1}(v_s)$ and $\bm{x}_t = \bm{\phi}^{-1}(v_t)$. The following formula holds:

Figures (8)

Figure 1: An illustrated example of a $b$-monotonic path $(b=2)$ $\mathcal{P}(v_1, v_7) = \{v_1, \dots, v_7\}$ on a graph $G(\mathcal{V}, \mathcal{E})$. Each node $v_i$ corresponds to a vector $\bm{x}_i\in\mathcal{X}$. Note that $r_i = \Vert \bm{x}_{i} - \bm{x}_7 \Vert_2 - \Vert \bm{x}_{i+1} - \bm{x}_7 \Vert_2$ for $i\in\{1, \dots, 6\}$. $r_2$ and $r_4$ are negative and all other $r_i$ are positive. Thus, this path is a $2$-monotonic path. A blue arrow between two nodes represents a backward hop with a negative $r_i$.
Figure 2: A $2$-monotonic path from $\bm{d}_j$ to $\textbf{GT}(\bm{q})$. (a) shows case (i), where $\bm{q}, \textbf{GT}(\bm{q})\in\mathcal{U}_j$, and (b) shows case (ii), where $\textbf{GT}(\bm{q})\in\mathcal{U}_k$ $(j\neq k)$.
Figure 3: Evaluation of NSG [Fu2017FastAN_NSG] with adaptive entry point selection on various datasets. Curves closer to the upper right indicate a better accuracy-speed tradeoff. Each curve is obtained by varying the length of the search queue $L\in\{16, 24, 32, 48, 64, 96, 128, 256, 512\}$. Each point is the average of five measurements.
Figure 4: Visualization of the reproduced hard instances presented in [indyk2023worstcase] for (a) NSG and (b) DiskANN. GTs represent the ground truth points.
Figure 5: Heatmaps of the results of (a) NSG and (b) DiskANN on the hard instances when varying the number of entry point candidates $K$ and the length of the search queue $L$. Each cell shows the $\text{Recall@}10$. Only a subset of the $L$ values is shown for readability.
...and 3 more figures
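
The quantity $r_i$ in the Figure 1 caption can be checked numerically. The sketch below computes $r_i$ for each hop of a path and counts the backward hops (those with negative $r_i$). It assumes, following the Figure 1 example, that $b$ corresponds to the number of backward hops; the formal Definition 4.1 may state the condition differently. The helper names and the toy coordinates are hypothetical.

```python
import math

def hop_gains(path, target):
    """r_i = ||x_i - x_t||_2 - ||x_{i+1} - x_t||_2 for consecutive nodes on the path."""
    return [math.dist(a, target) - math.dist(b, target)
            for a, b in zip(path, path[1:])]

def backward_hops(path):
    """Count hops that move away from the path's final node (negative r_i)."""
    target = path[-1]
    return sum(1 for r in hop_gains(path, target) if r < 0)

# Hypothetical 7-node path ending at the origin; hops 2 and 4 move away
# from the target, mirroring the pattern described for Figure 1.
path = [(6.0, 0.0), (4.0, 0.0), (5.0, 0.0), (3.0, 0.0),
        (3.5, 0.0), (1.0, 0.0), (0.0, 0.0)]

print(hop_gains(path, path[-1]))
print(backward_hops(path))
```

Because the $r_i$ telescope, their sum always equals the distance from the first node to the target, regardless of how many backward hops the path contains.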

Theorems & Definitions (5)

Definition 4.1: $b$-monotonic path
Lemma 4.2
Definition 4.3: $B$-MSNET
Theorem 4.4
Proof: Proof of \ref{thm:upperbound_proof}
