Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits
Haya Diwan, Jinrui Gou, Cameron Musco, Christopher Musco, Torsten Suel
TL;DR
The paper investigates how sparse a navigable graph for high-dimensional nearest-neighbor search can be under greedy routing. On the upper-bound side, it gives a construction, valid for any distance function, with average out-degree $O(\sqrt{n \log n})$ and a small-world guarantee: greedy routing reaches any target within two steps. On the lower-bound side, it shows that for random point sets in Euclidean space of dimension $O(\log n)$, any navigable graph must have average degree $\Omega(n^{1/2-\delta})$ for any fixed $\delta>0$. The construction is the union of a near-neighbor graph and a sparse random (or deterministically chosen) edge set. The lower bound uses sharp anti-concentration bounds for binomial random variables to limit how much the near-neighborhoods of random points can overlap. Together, the results delineate the trade-off between sparsity and guaranteed navigability in high dimensions; the paper also discusses limits on maximum degree and possible directions toward end-to-end approximation guarantees for graph-based nearest-neighbor search.
Abstract
There has been significant recent interest in graph-based nearest neighbor search methods, many of which are centered on the construction of navigable graphs over high-dimensional point sets. A graph is navigable if we can successfully move from any starting node to any target node using a greedy routing strategy where we always move to the neighbor that is closest to the destination according to a given distance function. The complete graph is navigable for any point set, but the important question for applications is whether sparser graphs can be constructed. While this question is fairly well understood in low dimensions, we establish some of the first upper and lower bounds for high-dimensional point sets. First, we give a simple and efficient way to construct a navigable graph with average degree $O(\sqrt{n \log n})$ for any set of $n$ points, in any dimension, for any distance function. We complement this result with a nearly matching lower bound: even under the Euclidean metric in $O(\log n)$ dimensions, a random point set has no navigable graph with average degree $O(n^{\alpha})$ for any $\alpha < 1/2$. Our lower bound relies on sharp anti-concentration bounds for binomial random variables, which we use to show that the near-neighborhoods of a set of random points do not overlap significantly, forcing any navigable graph to have many edges.
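To make the definition of navigability and the flavor of the upper bound concrete, here is a minimal Python sketch. It is one plausible way to combine a near-neighbor graph with a small set of shared "hub" nodes, not necessarily the authors' exact construction. The names `build_navigable_graph`, `greedy_route`, `near`, `hubs`, and `m` are hypothetical, as are the choice of hub-sample size and the patch step that adds a hub whenever the random sample misses a neighborhood. The sketch assumes at least two points, pairwise distinct distances, and a precomputed matrix `D` with `D[i][j]` the distance from point `i` to point `j`.

```python
import math
import random


def build_navigable_graph(D, seed=0):
    """Return a list of out-neighbor sets, given a distance matrix D (n >= 2)."""
    rng = random.Random(seed)
    n = len(D)
    # Neighborhood size on the order of sqrt(n log n).
    m = min(n - 1, math.ceil(math.sqrt(n * math.log(n))))

    # near[t] = the m nodes closest to target t (ties broken by index).
    near = []
    for t in range(n):
        others = sorted((i for i in range(n) if i != t),
                        key=lambda i: (D[i][t], i))
        near.append(others[:m])

    # Random hub sample intended to intersect every near[t].
    size = min(n, math.ceil(n * math.log(n) / m))
    hubs = set(rng.sample(range(n), size))
    # Patch step (an assumption of this sketch): fix any neighborhood
    # that the random sample happened to miss.
    for t in range(n):
        if not hubs.intersection(near[t]):
            hubs.add(near[t][0])

    out = [set() for _ in range(n)]
    for t in range(n):
        for s in near[t]:
            out[s].add(t)  # s is close to t, so s links directly to t
    for s in range(n):
        out[s].update(h for h in hubs if h != s)  # every node links to all hubs
    return out


def greedy_route(out, D, s, t):
    """Greedy routing: repeatedly move to the out-neighbor closest to t.

    Returns the list of visited nodes, or None if routing gets stuck.
    """
    path = [s]
    cur = s
    while cur != t:
        if not out[cur]:
            return None
        nxt = min(out[cur], key=lambda y: (D[y][t], y))
        if D[nxt][t] >= D[cur][t]:
            return None  # no neighbor is strictly closer to t
        cur = nxt
        path.append(cur)
    return path
```

Under the stated assumptions, every node `s` either lies in `near[t]` and links to `t` directly, or links to a hub inside `near[t]` that is strictly closer to `t`, so greedy routing always makes progress and should finish within two edges. Each node has at most `m + len(hubs)` out-neighbors on average, which is on the order of $\sqrt{n \log n}$ when few patches are needed.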
