Graph-Based Nearest-Neighbor Search without the Spread

Jeff Giliberti; Sariel Har-Peled; Jonas Sauer; Ali Vakilian

TL;DR

The work develops a spread-free framework for graph-based approximate nearest neighbor search in spaces with bounded doubling dimension. It combines a reduction to spread-bounded instances via a coarse ANN, a low-quality HST, and a $(1+\varepsilon)$-ANN structure to achieve near-linear space and $O(\log n)$ query time; and it introduces a linear-space universal NN graph built from a greedy permutation, complemented by a reverse-tree mechanism and active-resolution slices to precisely target search regions. A key novelty is overlaying scale-specific graphs into a single structure while using a reverse-tree to bypass unhelpful regions, enabling scalable ANN with provable guarantees. The paper also offers a bootstrap-based improvement that reduces query time further, and provides thorough correctness and runtime analyses for both the multiresolution and linear-space constructions. Together, these results advance graph-based ANN by removing dependence on the spread and delivering practical, scalable performance in high-dimensional settings.
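The greedy permutation mentioned above is the classical farthest-first ordering of the points. The following is a minimal sketch of that ordering only, assuming Euclidean points given as tuples. The function name `greedy_permutation` and the quadratic-time loop are illustrative choices, not the paper's construction, which needs a faster algorithm.

```python
import math

def greedy_permutation(points):
    """Farthest-first traversal (illustrative, O(n^2) time).

    Returns (order, radii): order[i] is the index of the i-th chosen point,
    radii[i] is its distance to the previously chosen points
    (infinity for the first point).
    """
    n = len(points)
    if n == 0:
        return [], []
    order = [0]                      # arbitrary start: the first point
    radii = [math.inf]
    # dist[i] = distance from point i to the nearest chosen point so far
    dist = [math.dist(points[0], p) for p in points]
    dist[0] = -1.0                   # mark chosen points so they are never re-picked
    for _ in range(1, n):
        nxt = max(range(n), key=dist.__getitem__)   # farthest remaining point
        order.append(nxt)
        radii.append(dist[nxt])
        for i, p in enumerate(points):
            d = math.dist(points[nxt], p)
            if d < dist[i]:
                dist[i] = d
        dist[nxt] = -1.0
    return order, radii

pts = [(0, 0), (10, 0), (1, 0), (5, 0)]
order, radii = greedy_permutation(pts)
print(order, radii)
```

The radii are non-increasing along the order, so every prefix of the permutation acts as a net of the point set at the scale of the next radius. That is the property that makes the ordering a natural backbone for multi-scale search structures.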

Abstract

Recent work showed how to construct nearest-neighbor graphs of linear size, on a given set $P$ of $n$ points in $\mathbb{R}^d$, such that one can answer approximate nearest-neighbor queries in logarithmic time in the spread. Unfortunately, the spread might be unbounded in $n$, and an interesting theoretical question is how to remove the dependency on the spread. Here, we show how to construct an external linear-size data structure that, combined with the linear-size graph, allows us to answer ANN queries in logarithmic time in $n$.
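To make the remark about the spread concrete: the spread of a point set is the ratio between its largest and smallest pairwise distances. The sketch below uses a brute-force helper named `spread` (a hypothetical name, for illustration only) to show that three points can already have arbitrarily large spread, so a bound logarithmic in the spread says nothing in terms of $n$.

```python
import math
from itertools import combinations

def spread(points):
    """Ratio of the largest to the smallest pairwise distance (brute force)."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    return max(dists) / min(dists)

# Three collinear points: 0, 2^-k, and 1. The spread is 2^k although n = 3.
for k in (1, 10, 20):
    pts = [(0.0, 0.0), (2.0 ** -k, 0.0), (1.0, 0.0)]
    print(k, spread(pts), math.log2(spread(pts)))
```

Here the logarithm of the spread grows linearly in `k` while the number of points stays fixed at three.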

Paper Structure (35 sections, 21 theorems, 44 equations, 2 figures, 1 table)

Introduction
Our results
Background
Metric spaces
Packing/covering
Nearest neighbor
Hierarchically well-separated trees (HSTs)
Rough ANN
Preliminaries
Greedy permutation
Graph-based search for ANN
A NN graph via greedy permutation
Approximate nearest neighbor via HST
Fast $(1+\varepsilon)$-ANN queries via HST if one is lucky
Building ANN graphs via active resolutions
...and 20 more sections

Key Result

Theorem 2.12

[h-gaa-11] Given a set $\mathsf{P}$ of $n$ points in $\mathbb{R}^d$, for $d \leq n$, one can compute a $2\sqrt{d}\,n^5$-approximate HST of $\mathsf{P}$ in $O(d n \log n)$ expected time.

Figures (2)

Figure 4.1: Left: A point set and its distance in a certain resolution. Right: A slice with its representatives, and the connected components of these representatives within a certain radius.
Figure 5.1: Stage I of the ANN search algorithm.

Theorems & Definitions (45)

Definition 2.1
Definition 2.2
Definition 2.3
Definition 2.4
Definition 2.5
Definition 2.6
Definition 2.7
Definition 2.8
Definition 2.9
Definition 2.10
...and 35 more
