Coordination-Free Lane Partitioning for Convergent ANN Search

Carl Kugblenu, Petri Vuorimaa

TL;DR

The paper addresses wasted work in parallel lane execution for production ANN search, where independent lanes converge on the same candidates. It introduces alpha-partitioning, a coordination-free pool-then-partition strategy: build a deterministic candidate pool, apply a per-query PRF shuffle, and assign each lane a disjoint slice of positions, so that lanes cover complementary candidates at the same budget and deadline. With four lanes at equal cost, recall@10 on SIFT1M with HNSW rises from 0.249 to 0.999, and hit@10 on MS MARCO with HNSW improves from 0.200 to 0.601, matching the single-index ceiling; IVF shows smaller but consistent gains from reduced list-level duplication. The method needs no runtime coordination and improves recall without raising latency or resource use, converting redundant fan-out into complementary coverage.

Abstract

Production vector search systems often fan out each query across parallel lanes (threads, replicas, or shards) to meet latency service-level objectives (SLOs). In practice, these lanes rediscover the same candidates, so extra compute does not increase coverage. We present a coordination-free lane partitioner that turns duplication into complementary work at the same cost and deadline. For each query we (1) build a deterministic candidate pool sized to the total top-k budget, (2) apply a per-query pseudorandom permutation, and (3) assign each lane a disjoint slice of positions. Lanes then return different results by construction, with no runtime coordination. At equal cost with four lanes (total candidate budget 64), on SIFT1M (1M SIFT feature vectors) with Hierarchical Navigable Small World graphs (HNSW) recall@10 rises from 0.249 to 0.999 while lane overlap falls from nearly 100% to 0%. On MS MARCO (8.8M passages) with HNSW, hit@10 improves from 0.200 to 0.601 and Mean Reciprocal Rank at 10 (MRR@10) from 0.133 to 0.330. For inverted file (IVF) indexes we see smaller but consistent gains (for example, +11% on MS MARCO) by de-duplicating list routing. A microbenchmark shows planner overhead of ~37 microseconds per query (mean at the main setting) with linear growth in the number of merged candidates. These results yield a simple operational guideline: size the per-query pool to the total budget, deterministically partition positions across lanes, and turn redundant fan-out into complementary coverage without changing budget or deadline.
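
The three steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pool is assumed to arrive as a list of candidate ids produced by some deterministic search, the permutation is obtained by sorting on an HMAC-SHA256 tag (one common way to get a keyed per-query ordering; the paper's actual PRF is not specified here), and the names `SHARED_KEY`, `prf_order`, and `lane_slice` are hypothetical.

```python
import hashlib
import hmac
from typing import List, Sequence

# Hypothetical demo key; every lane must be configured with the same value.
SHARED_KEY = b"demo-lane-partition-key"


def prf_order(pool: Sequence[int], query_id: bytes) -> List[int]:
    """Return the pool in a pseudorandom order that depends only on the query."""
    def tag(candidate: int) -> bytes:
        # Keyed hash of (query, candidate); sorting by it permutes the pool.
        msg = query_id + b"|" + str(candidate).encode()
        return hmac.new(SHARED_KEY, msg, hashlib.sha256).digest()
    return sorted(pool, key=tag)


def lane_slice(pool: Sequence[int], query_id: bytes, lane: int, k_lane: int) -> List[int]:
    """Candidates owned by `lane`: positions [lane*k_lane, (lane+1)*k_lane)."""
    order = prf_order(pool, query_id)
    return order[lane * k_lane:(lane + 1) * k_lane]


# Usage: four lanes, 16 candidates each, pool sized to the total budget of 64.
M, K_LANE = 4, 16
pool = list(range(1000, 1000 + M * K_LANE))  # stand-in for a deterministic pool
lanes = [lane_slice(pool, b"query-42", i, K_LANE) for i in range(M)]
merged = [c for lane in lanes for c in lane]
print(len(merged), len(set(merged)))  # equal counts mean zero overlap
```

Each lane recomputes the same ordering from the same inputs, so the slices are disjoint without any lane talking to another. In this sketch the pool size equals the total budget, so the union of the slices covers the whole pool.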

Paper Structure

This paper contains 47 sections, 4 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: $\alpha$-partitioning Workflow. (A) Naive parallelism: high overlap, wasted budget. (B) Build deterministic candidate pool. (C) PRF shuffle and position-based partition. (D) Disjoint union merge yields zero overlap.
  • Figure 2: SIFT1M, HNSW, $\alpha$-sweep with $M{=}4$, $k_{\text{lane}}{=}16$ ($k_{\text{total}}{=}64$). The gray dashed line marks the single-index ceiling at equal cost (in our run this equals the $\alpha{=}1$ mean, 0.999); $\alpha$ increases coverage monotonically from 0.249 to 0.999 while overlap drops from 1.00 to 0.00. Equal cost and deadline.
  • Figure 3: MS MARCO (8.8M), IVF-Flat, $\alpha$-sweep. Setup: $M{=}4$, $k_{\text{lane}}{=}16$, $k_{\text{total}}{=}64$. Left axis: recall@10; right axis: overlap. Despite lower document-level $\rho_{0}$ (intra-list diversity), $\alpha$-partitioning removes list-level duplication for a +11% gain at equal cost (same total candidate budget and deadline).
  • Figure 4: MS MARCO (8.8M), HNSW at $\alpha\in\{0,1\}$ with $M{=}4$, $k_{\text{lane}}{=}16$. Metrics: hit@10 and MRR@10 (mean $\pm$ std). $\alpha{=}1$ reaches the single-index ceiling (+200% hit@10, +148% MRR@10) at the same budget and deadline.
  • Figure 5: Coverage validation on SIFT1M at $\alpha{=}1$. X: $K_{\text{pool}}/k_{\text{total}}$; Y: recall@10. Measured points track the predicted $\min(k_{\text{total}}/K_{\text{pool}},1)$ curve. $K_{\text{pool}}{=}k_{\text{total}}$ maximizes quality with zero overlap; under-pooling degrades recall. Equal-cost evaluation.
  • ...and 1 more figure
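
The budget and the relative gains quoted in the captions above can be checked directly from the reported numbers. This is a worked check, assuming the percentages are relative improvements over the $\alpha{=}0$ baseline:

$$k_{\text{total}} = M \cdot k_{\text{lane}} = 4 \cdot 16 = 64$$

$$\frac{0.601 - 0.200}{0.200} \approx 2.00 \;\; (+200\%\ \text{hit@10}), \qquad \frac{0.330 - 0.133}{0.133} \approx 1.48 \;\; (+148\%\ \text{MRR@10})$$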

Theorems & Definitions (1)

  • Remark 1: Disjointness at full dedication