Coordination-Free Lane Partitioning for Convergent ANN Search

Carl Kugblenu; Petri Vuorimaa

TL;DR

The paper tackles the inefficiency of parallel lane execution in production approximate nearest neighbor (ANN) search, where independent lanes tend to converge on the same candidates and waste compute. It introduces alpha-partitioning, a coordination-free pool-then-partition strategy that builds a deterministic candidate pool, applies a per-query pseudorandom-function (PRF) shuffle, and assigns each lane a disjoint slice of positions, guaranteeing complementary coverage at the same budget and deadline. The approach yields substantial gains across datasets and index families. On SIFT1M with HNSW, recall@10 rises from 0.249 to 0.999 with four lanes, and on MS MARCO with HNSW, hit@10 improves from 0.200 to 0.601, reaching parity with the single-index ceiling. IVF shows smaller but consistent gains by reducing list-level duplication. The method is simple to deploy, scalable, and broadly applicable: by converting redundant fan-out into diverse, useful coverage, it improves recall without increasing latency or resource use.
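The pool-then-partition idea can be illustrated with a minimal sketch. It assumes the deterministic candidate pool has already been built (for example, the top 64 ids from a single search when four lanes share a budget of 64). Keyed BLAKE2b stands in for the PRF, which the source does not specify, and contiguous slices of the permuted positions are just one way to make the lane assignment disjoint. The helper names `prf_rank`, `lane_positions`, and `lane_candidates` are hypothetical. Because every lane evaluates the same keyed permutation locally, the slices are disjoint without any runtime coordination.

```python
import hashlib
from typing import List, Sequence


def prf_rank(key: bytes, query_id: str, position: int) -> bytes:
    """Hypothetical PRF: keyed BLAKE2b tag over (query_id, position)."""
    msg = f"{query_id}:{position}".encode("utf-8")
    return hashlib.blake2b(msg, key=key, digest_size=16).digest()


def lane_positions(pool_size: int, num_lanes: int, lane: int,
                   query_id: str, key: bytes) -> List[int]:
    """Pool positions owned by `lane` (hypothetical helper).

    Each lane computes the same per-query permutation on its own, so the
    slices it takes are disjoint without exchanging any messages.
    """
    if pool_size % num_lanes != 0:
        raise ValueError("this sketch assumes pool_size is divisible by num_lanes")
    # Per-query pseudorandom permutation: order positions by their PRF tag.
    perm = sorted(range(pool_size), key=lambda p: prf_rank(key, query_id, p))
    per_lane = pool_size // num_lanes
    # Contiguous slice of the permutation; slices never overlap across lanes.
    return perm[lane * per_lane:(lane + 1) * per_lane]


def lane_candidates(pool: Sequence[int], num_lanes: int, lane: int,
                    query_id: str, key: bytes) -> List[int]:
    """Candidate ids this lane is responsible for (hypothetical helper)."""
    positions = lane_positions(len(pool), num_lanes, lane, query_id, key)
    return [pool[p] for p in positions]


if __name__ == "__main__":
    key = b"demo-key"                 # assumed shared secret for the PRF
    pool = list(range(1000, 1064))    # stand-in for a 64-candidate pool
    for lane in range(4):
        print(lane, lane_candidates(pool, 4, lane, "q42", key))
```

Changing `query_id` reshuffles which candidates each lane receives, while re-running the same query reproduces the same assignment. This is what makes the partition both per-query and deterministic.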

Abstract

Production vector search systems often fan out each query across parallel lanes (threads, replicas, or shards) to meet latency service-level objectives (SLOs). In practice, these lanes rediscover the same candidates, so extra compute does not increase coverage. We present a coordination-free lane partitioner that turns duplication into complementary work at the same cost and deadline. For each query we (1) build a deterministic candidate pool sized to the total top-k budget, (2) apply a per-query pseudorandom permutation, and (3) assign each lane a disjoint slice of positions. Lanes then return different results by construction, with no runtime coordination. At equal cost with four lanes (total candidate budget 64), on SIFT1M (1M SIFT feature vectors) with Hierarchical Navigable Small World graphs (HNSW), recall@10 rises from 0.249 to 0.999 while lane overlap falls from nearly 100% to 0%. On MS MARCO (8.8M passages) with HNSW, hit@10 improves from 0.200 to 0.601 and Mean Reciprocal Rank at 10 (MRR@10) from 0.133 to 0.330. For inverted file (IVF) indexes, we see smaller but consistent gains (for example, +11% on MS MARCO) by de-duplicating list routing. A microbenchmark shows a planner overhead of ~37 microseconds per query (mean at the main setting), growing linearly with the number of merged candidates. These results yield a simple operational guideline: size the per-query pool to the total budget, deterministically partition positions across lanes, and turn redundant fan-out into complementary coverage without changing budget or deadline.
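The abstract reports lane overlap, recall@10, hit@10, and MRR@10. The sketch below shows how these quantities might be computed from per-lane results, together with a merge step that keeps the best distance per id. It is an illustration only. Lane overlap is assumed here to be the mean pairwise Jaccard similarity of the lanes' id sets, which reads 1.0 when all lanes return the same candidates and 0.0 when they are disjoint. The paper's exact definition may differ. All function names are hypothetical.

```python
import heapq
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Hit = Tuple[int, float]  # (candidate id, distance); smaller distance is better


def merge_topk(lane_results: Sequence[Sequence[Hit]], k: int) -> List[int]:
    """Merge per-lane hits, keep each id's best distance, return top-k ids."""
    best: Dict[int, float] = {}
    for hits in lane_results:
        for cid, dist in hits:
            if cid not in best or dist < best[cid]:
                best[cid] = dist
    # Ties on distance are broken by id so the merge is deterministic.
    top = heapq.nsmallest(k, best.items(), key=lambda item: (item[1], item[0]))
    return [cid for cid, _ in top]


def lane_overlap(lane_results: Sequence[Sequence[Hit]]) -> float:
    """Mean pairwise Jaccard similarity of lane id sets (assumed definition)."""
    sets = [{cid for cid, _ in hits} for hits in lane_results]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    # Pairs where both lanes are empty contribute 0 to the sum.
    return sum(len(a & b) / len(a | b) for a, b in pairs if a | b) / len(pairs)


def recall_at_k(retrieved: Sequence[int], truth: Sequence[int], k: int = 10) -> float:
    """Fraction of the true k nearest neighbors found in the retrieved top-k."""
    return len(set(retrieved[:k]) & set(truth[:k])) / k


def hit_at_k(retrieved: Sequence[int], relevant: Set[int], k: int = 10) -> float:
    """1.0 if any relevant item appears in the top-k, else 0.0."""
    return 1.0 if any(cid in relevant for cid in retrieved[:k]) else 0.0


def mrr_at_k(retrieved: Sequence[int], relevant: Set[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top-k, else 0.0."""
    for rank, cid in enumerate(retrieved[:k], start=1):
        if cid in relevant:
            return 1.0 / rank
    return 0.0
```

For example, two lanes that both return ids 1 and 2 have an overlap of 1.0 and merge into only two distinct candidates. Lanes that return disjoint slices have an overlap of 0.0, and every returned id can contribute to the merged top-k. In both cases the per-query budget is the same.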
