Unleashing Graph Partitioning for Large-Scale Nearest Neighbor Search

Lars Gottesbüren; Laxman Dhulipala; Rajesh Jayaram; Jakub Lacki

TL;DR

This work tackles distributed approximate nearest neighbor search (ANNS) by combining neighborhood-preserving graph partitioning (GP) with modular routing methods, enabling scalable search on billion-point datasets. It introduces two routing schemes, kRt (a k-means tree routing index) and hRt (a sorting-LSH routing index). Both work with any partitioning and come with theoretical guarantees for locating the shards that contain approximate nearest neighbors. With these routers, balanced graph partitioning achieves substantially higher throughput than strong baselines at fixed recall (up to $2.14\times$ higher QPS at 90% recall@10). Training times remain practical: about half an hour for kRt on billion-scale data and under 20 seconds for hRt. The results show that modular routing paired with GP yields fast, scalable ANNS with high recall, offering a practical path toward efficient distributed ANNS systems.

Abstract

We consider the fundamental problem of decomposing a large-scale approximate nearest neighbor search (ANNS) problem into smaller sub-problems. The goal is to partition the input points into neighborhood-preserving shards, so that the nearest neighbors of any point are contained in only a few shards. When a query arrives, a routing algorithm is used to identify the shards that should be searched for its nearest neighbors. This approach forms the backbone of distributed ANNS, where the dataset is so large that it must be split across multiple machines. In this paper, we design simple and highly efficient routing methods, and prove strong theoretical guarantees on their performance. A crucial characteristic of our routing algorithms is that they are inherently modular, and can be used with any partitioning method. This addresses a key drawback of prior approaches, where the routing algorithms are inextricably linked to their associated partitioning method. In particular, our new routing methods enable the use of balanced graph partitioning, which is a high-quality partitioning method without a naturally associated routing algorithm. Thus, we provide the first methods for routing using balanced graph partitioning that are extremely fast to train, admit low latency, and achieve high recall. We provide a comprehensive evaluation of our full partitioning and routing pipeline on billion-scale datasets, where it outperforms existing scalable partitioning methods by significant margins, achieving up to $2.14\times$ higher QPS at 90% recall@10 than the best competitor.
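
The partition-then-route pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's method: the function names (`sharded_search`, `centroid_router`, `recall_at_k`), the toy data, and the one-centroid-per-shard router are all hypothetical. What it shows is the modularity argument: the search code only sees a routing function that ranks shard ids, so any router can be paired with any partition. It also computes recall@k against brute-force neighbors when a single shard is probed, which is the quantity plotted in Figure 1. The centroid router is deliberately naive; it is the single-center scheme that Figure 2 shows can misroute.

```python
import math
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]
# Hypothetical router interface: maps a query to shard ids, most promising first.
Router = Callable[[Point], List[int]]


def exact_knn(query: Point, points: Dict[int, Point], k: int) -> List[int]:
    """Brute-force k nearest neighbors (Euclidean) among `points`."""
    return sorted(points, key=lambda pid: math.dist(query, points[pid]))[:k]


def sharded_search(query: Point, shards: List[Dict[int, Point]],
                   router: Router, n_probe: int, k: int) -> List[int]:
    """Search only the first `n_probe` shards chosen by `router`, then merge."""
    candidates: Dict[int, Point] = {}
    for shard_id in router(query)[:n_probe]:
        candidates.update(shards[shard_id])
    return exact_knn(query, candidates, k)


def recall_at_k(found: List[int], truth: List[int]) -> float:
    """Fraction of the true k nearest neighbors that were returned."""
    return len(set(found) & set(truth)) / len(truth)


def centroid_router(shards: List[Dict[int, Point]]) -> Router:
    """Toy router (assumption): rank shards by distance to each shard's mean."""
    centroids = []
    for shard in shards:
        pts = list(shard.values())
        centroids.append([sum(c) / len(pts) for c in zip(*pts)])
    return lambda q: sorted(range(len(shards)),
                            key=lambda s: math.dist(q, centroids[s]))


# Two well-separated clusters, each placed in its own shard.
points = {0: (0.0, 0.0), 1: (0.1, 0.0), 2: (0.0, 0.1),
          3: (5.0, 5.0), 4: (5.1, 5.0), 5: (5.0, 5.1)}
shards = [{i: points[i] for i in (0, 1, 2)}, {i: points[i] for i in (3, 4, 5)}]
router = centroid_router(shards)
q = (0.05, 0.05)
found = sharded_search(q, shards, router, n_probe=1, k=3)
print(sorted(found), recall_at_k(found, exact_knn(q, points, 3)))  # [0, 1, 2] 1.0
```

Swapping `centroid_router` for a different router changes nothing else in the pipeline, which is the property the abstract attributes to kRt and hRt.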

Paper Structure (27 sections, 2 theorems, 3 equations, 10 figures, 4 tables)

Introduction
Partitioning
Approximate $k$-NN Graph-Building
Partitioning into Overlapping Shards
Routing
K-Means Tree Routing Index: kRt
Sorting-LSH Routing Index: hRt
Empirical Evaluation
Large-Scale Throughput Evaluation
Training Time
Analyzing Partitioning and Routing Quality
Small-Scale Evaluation for Learned Partitions
Conclusion
hRt pseudocodes
Routing via Voting
...and 12 more sections

Key Result

Theorem 1

Fix any approximation factor $c>1$, and let $P \subset (\mathbb{R}^d,\|\cdot\|_\rho)$ be a subset of the $d$-dimensional space equipped with the $\ell_\rho$ norm, for any $\rho \in [1,2]$. Set stretch factor $\alpha = O(c)$, repetitions $r = O(n^{1/c})$ and window size $W = O(1)$. Then the following
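
The theorem statement is cut off here, and the hRt pseudocode is only listed in the appendix ("hRt pseudocodes", "Routing via Voting"). The sketch below is therefore one plausible reading of a sorting-based LSH router with shard voting, written purely for illustration. Every concrete choice is an assumption: Gaussian (2-stable) projections $h(x) = \lfloor (a \cdot x + b)/w \rfloor$ suited to $\ell_2$, $m$ such hashes concatenated into a key, points sorted lexicographically by key, and votes cast by the `window` points on each side of the query's sorted position, repeated `r` times. The class name and parameters are hypothetical, and the stretch factor $\alpha$ from the theorem is not modeled.

```python
import bisect
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


class SortingLSHRouter:
    """Hypothetical sorting-LSH router: r sorted hash orders, a window, shard voting."""

    def __init__(self, points: Dict[int, Point], shard_of: Dict[int, int],
                 r: int = 4, m: int = 3, w: float = 1.0, window: int = 2,
                 seed: int = 0):
        rng = random.Random(seed)
        dim = len(next(iter(points.values())))
        self.shard_of = shard_of
        self.window = window
        self.w = w
        self.tables = []  # per repetition: (hash params, sorted keys, ids in same order)
        for _ in range(r):
            # m Gaussian projections with random offsets in [0, w).
            params = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
                      for _ in range(m)]
            entries = sorted((self._key(params, w, p), pid) for pid, p in points.items())
            self.tables.append((params,
                                [key for key, _ in entries],
                                [pid for _, pid in entries]))

    @staticmethod
    def _key(params, w: float, x: Point) -> Tuple[int, ...]:
        """Concatenated LSH key: one bucket index per projection."""
        return tuple(math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w)
                     for a, b in params)

    def route(self, query: Point) -> List[int]:
        """Rank shards by votes from the points adjacent to the query in sorted order."""
        votes: Counter = Counter()
        for params, keys, ids in self.tables:
            pos = bisect.bisect_left(keys, self._key(params, self.w, query))
            for pid in ids[max(0, pos - self.window): pos + self.window]:
                votes[self.shard_of[pid]] += 1
        return [shard for shard, _ in votes.most_common()]


points = {0: (0.0, 0.0), 1: (0.2, 0.1), 2: (0.1, 0.3),
          3: (6.0, 6.0), 4: (6.2, 5.9), 5: (5.9, 6.1)}
shard_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
router = SortingLSHRouter(points, shard_of, r=4, m=3, w=1.0, window=2, seed=0)
q = (0.1, 0.1)
print(router.route(q))  # shard ids, most votes first
```

Sorting by concatenated keys places points that agree on a long key prefix next to each other, so a small window around the query's position tends to contain points from the same buckets. Training is a sort per repetition, which is consistent in spirit with the short hRt training times reported above. In the theorem, `r` grows as $O(n^{1/c})$ while the window stays $O(1)$; here both are plain constructor arguments.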

Figures (10)

Figure 1: The $x$-axis shows the recall of the approximate $k$-NN graph used for graph partitioning. The $y$-axis shows the average recall of queries for $10$ nearest neighbors when only a single shard is inspected. The plot is computed using the SIFT1M dataset.
Figure 2: Illustration of an example where routing using a single center per shard fails. The nearest neighbors of $q$ are in the cluster of $c_2$, but $d(q, c_1) < d(q, c_2)$. If the hierarchical sub-clusters are represented with their own centers, the routing works correctly.
Figure 3: Throughput vs recall evaluation on big-ann-benchmarks.
Figure 4: Evaluating disjoint partitioning methods with kRt routing and with an oracle router (dashed), assuming exhaustive search in the shards.
Figure 5: hRt vs kRt with GP as the partition.
...and 5 more figures
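
The failure case in Figure 2, and its fix, can be reproduced on a 1-D toy example. This is a hedged sketch: it uses a flat, two-level simplification (a few k-means sub-centers per shard, with each shard scored by its closest sub-center) rather than the paper's kRt tree, and the data, function names, and farthest-first initialisation are assumptions made for illustration. Shard 1 consists of two distant sub-clusters, so its mean lies far from the query even though the query's nearest neighbors (2.9 and 3.1) belong to shard 1.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def mean(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(c) / len(points) for c in zip(*points)]


def kmeans(points: List[Point], k: int, iters: int = 10) -> List[List[float]]:
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < min(k, len(points)):
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


def route(query: Point, shard_centers: Dict[int, List[List[float]]]) -> List[int]:
    """Rank shards by the distance from `query` to their closest center."""
    return sorted(shard_centers,
                  key=lambda s: min(math.dist(query, c) for c in shard_centers[s]))


shards = {0: [(-1.0,), (0.0,), (1.0,)],
          1: [(2.9,), (3.1,), (19.9,), (20.1,)]}
q = (2.5,)
single = {s: [mean(pts)] for s, pts in shards.items()}  # one center per shard
multi = {s: kmeans(pts, k=2) for s, pts in shards.items()}  # sub-cluster centers
print(route(q, single)[0], route(q, multi)[0])  # 0 1
```

With one center per shard, shard 1's mean sits at 11.5, so the query at 2.5 is routed to shard 0 (mean 0.0). With two sub-centers per shard, shard 1 has a center at about 3.0 and wins, matching the caption's claim that representing sub-clusters by their own centers makes routing correct.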

Theorems & Definitions (3)

Theorem 1
Theorem 1
proof
