HENN: A Hierarchical Epsilon Net Navigation Graph for Approximate Nearest Neighbor Search
Mohsen Dehghankar, Abolfazl Asudeh
TL;DR
HENN introduces a hierarchical $\varepsilon$-net navigation graph for ANN search in which each layer is an $\varepsilon$-net of the layer below it. This construction guarantees polylogarithmic worst-case query time while preserving recall and practical efficiency. Each layer carries its own intra-layer navigable graph, and inter-layer links connect copies of the same data point in adjacent layers, so a query runs a top-down greedy search from a root in the top layer down to the base layer. Under reasonable assumptions, the query time is bounded by $O\left(d \cdot d^* \cdot \rho_\delta \cdot \log^2 n\right)$, where $d$ is the data dimensionality, $d^*$ the average graph degree, and $\rho_\delta$ a Recall Bound. The index size remains $O(n)$, but indexing is heavier because of the repeated $\varepsilon$-net sampling. Empirical evaluations show that HENN matches HNSW on standard benchmarks and outperforms it on adversarial distributions, demonstrating robustness and scalability, while keeping a simple implementation and a modular design that can integrate different intra-layer graphs. The work also provides a probabilistic polylogarithmic query-time bound for HNSW, offering theoretical insight into its empirical success.
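The layered structure and top-down greedy search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. As a hypothetical stand-in, each upper layer is a uniform random half of the layer below rather than a true $\varepsilon$-net, and every layer uses a brute-force k-nearest-neighbor graph in place of whichever intra-layer graph HENN plugs in. Descending reuses the same point id in the next layer down, mirroring the inter-layer links. The names `build_layers`, `knn_graph`, `greedy_search`, `k`, and `shrink` are illustrative assumptions only.

```python
import math
import random


def dist(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(ids, points, k):
    # Brute-force k-NN graph over one layer; a hypothetical stand-in for
    # any intra-layer navigable graph.
    graph = {}
    for i in ids:
        others = sorted((j for j in ids if j != i),
                        key=lambda j: dist(points[i], points[j]))
        graph[i] = others[:k]
    return graph


def build_layers(points, k=4, shrink=0.5, seed=0):
    # Layer 0 holds every point; each higher layer is a subset of the one below,
    # ending with a single root point. Uniform sampling here is a hypothetical
    # substitute for epsilon-net sampling and carries none of its guarantees.
    rng = random.Random(seed)
    layers = [list(range(len(points)))]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        size = max(1, int(len(prev) * shrink))
        layers.append(rng.sample(prev, size))
    graphs = [knn_graph(layer, points, k) for layer in layers]
    return layers, graphs


def greedy_search(points, layers, graphs, query):
    # Start at the root (the only point in the top layer).
    current = layers[-1][0]
    for graph in reversed(graphs):  # top layer first, base layer last
        # Move to the closest neighbor while it strictly improves the distance.
        while True:
            best = min(graph[current],
                       key=lambda j: dist(points[j], query),
                       default=current)
            if dist(points[best], query) >= dist(points[current], query):
                break
            current = best
        # The same id exists in the layer below, acting as the inter-layer link.
    return current


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.random(), rng.random()) for _ in range(200)]
    layers, graphs = build_layers(data)
    q = (0.5, 0.5)
    found = greedy_search(data, layers, graphs, q)
    exact = min(range(len(data)), key=lambda i: dist(data[i], q))
    print("layers:", len(layers), "greedy:", found, "exact:", exact)
```

Because greedy search halts at a local minimum, this toy version may return an approximate rather than exact neighbor. HENN's worst-case bound relies on the $\varepsilon$-net property of each layer, which uniform halving does not provide. The halving only keeps the layer count near $\log_2 n$.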
Abstract
Hierarchical graph-based algorithms such as HNSW have achieved state-of-the-art performance for Approximate Nearest Neighbor (ANN) search in practice, yet they often lack theoretical guarantees on query time or recall due to their heavy use of randomized heuristic constructions. Conversely, existing theoretically grounded structures are typically difficult to implement and struggle to scale in real-world scenarios. We propose the Hierarchical $\varepsilon$-Net Navigation Graph (HENN), a novel graph-based indexing structure for ANN search that combines strong theoretical guarantees with practical efficiency. Built upon the theory of $\varepsilon$-nets, HENN guarantees polylogarithmic worst-case query time while preserving high recall and incurring minimal implementation overhead. Moreover, we establish a probabilistic polylogarithmic query time bound for HNSW, providing theoretical insight into its empirical success. Unlike prior hierarchical methods, which may degrade to linear query time under adversarial data, HENN maintains provable performance independent of the input data distribution. Empirical evaluations demonstrate that HENN achieves faster query times while maintaining competitive recall on diverse data distributions, including adversarial inputs. These results underscore the effectiveness of HENN as a robust and scalable solution for fast and accurate nearest neighbor search.
