Optimistic Query Routing in Clustering-based Approximate Maximum Inner Product Search

Sebastian Bruch, Aditya Krishnan, Franco Maria Nardini

TL;DR

This paper addresses the routing bottleneck in clustering-based maximum inner product search (MIPS) for storage-backed approximate nearest neighbor (ANN) systems. It introduces Optimist, a principled, unsupervised router built on the principle of optimism in the face of uncertainty. Optimist uses the moments of each shard's inner-product distribution, together with a tunable optimism parameter δ, to bound the maximum inner product in that shard. By combining these moment-based estimates with a space-efficient covariance sketch, Optimist matches the recall of state-of-the-art routers while probing up to 50% fewer points. This reduces I/O, particularly when shards reside on slow storage. The approach yields practical gains in throughput and bandwidth efficiency for large-scale MIPS and lays groundwork for further refinements in covariance sketching and distributional modeling. Overall, the work shows that principled optimistic routing can substantially improve end-to-end ANN performance in storage-constrained environments.

Abstract

Clustering-based nearest neighbor search is an effective method in which points are partitioned into geometric shards to form an index, and only a few shards are searched during query processing to find a set of top-$k$ vectors. Search efficacy is heavily influenced by the algorithm that identifies which shards to probe. Even so, that algorithm has received little attention in the literature. This work bridges that gap by studying routing in clustering-based maximum inner product search. We unpack existing routers and notice the surprising contribution of optimism. We then take a page from the sequential decision-making literature and formalize that insight following the principle of "optimism in the face of uncertainty." In particular, we present a framework that incorporates the moments of the distribution of inner products within each shard to estimate the maximum inner product. We then present an instance of our algorithm that uses only the first two moments. It reaches the same accuracy as state-of-the-art routers such as ScaNN while probing up to $50\%$ fewer points on benchmark datasets. Our algorithm is also space-efficient: we design a sketch of the second moment whose size is independent of the number of points and requires $\mathcal{O}(1)$ vectors per shard.
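The following is a minimal sketch of two-moment optimistic routing, for illustration only. Each shard is scored by its mean inner product $\langle q, \mu_i \rangle$ plus a multiplier times the standard deviation $\sqrt{q^\top \Sigma_i q}$, and the highest-scoring shards are probed. The multiplier $\sqrt{(1-\delta)/\delta}$ is an assumption (it is the value Cantelli's inequality yields). The helper names `shard_moments`, `optimistic_scores`, and `route` are hypothetical. This is not the authors' implementation of Optimist.

```python
import numpy as np

def shard_moments(shards):
    """Return (mean, covariance) for each shard; shards is a list of (n_i, d) arrays."""
    moments = []
    for pts in shards:
        mu = pts.mean(axis=0)
        centered = pts - mu
        sigma = centered.T @ centered / len(pts)  # population covariance
        moments.append((mu, sigma))
    return moments

def optimistic_scores(q, moments, delta):
    """Score_i = <q, mu_i> + c(delta) * sqrt(q^T Sigma_i q).

    ASSUMPTION: c(delta) = sqrt((1 - delta) / delta), the Cantelli multiplier.
    Smaller delta -> larger c -> more optimism toward high-variance shards.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    c = np.sqrt((1.0 - delta) / delta)
    scores = []
    for mu, sigma in moments:
        mean_ip = float(q @ mu)
        var_ip = max(float(q @ sigma @ q), 0.0)  # guard against tiny negative round-off
        scores.append(mean_ip + c * np.sqrt(var_ip))
    return np.array(scores)

def route(q, moments, delta, n_probe):
    """Indices of the n_probe shards with the highest optimistic score."""
    scores = optimistic_scores(q, moments, delta)
    return np.argsort(-scores)[:n_probe]

# Toy example: shard 0 is tight around 1.0; shard 1 has a lower mean but holds a point at 1.5.
shards = [np.array([[1.0, 0.0], [1.0, 0.0]]),
          np.array([[1.5, 0.0], [-0.5, 0.0]])]
moments = shard_moments(shards)
q = np.array([1.0, 0.0])
print(route(q, moments, delta=0.5, n_probe=1).tolist())  # -> [1]
print(route(q, moments, delta=0.9, n_probe=1).tolist())  # -> [0]
```

In the toy example, a mean-only router would pick shard 0 (mean 1.0 vs. 0.5) even though shard 1 contains the larger inner product. With δ = 0.5, the variance term is large enough to send the query to shard 1. With δ = 0.9, the router behaves closer to a mean-only router.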

Paper Structure

This paper contains 27 sections, 2 theorems, 22 equations, 15 figures, 2 tables, and 1 algorithm.

Key Result

Lemma 1

Denote by $\mu_i$ and $\Sigma_i$ the mean and covariance of the distribution of $\mathcal{P}_i$. An upper bound on the solution to Problem \ref{problem:optimization-problem} for $\delta \in (0, 1)$ is:
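For intuition, here is a hedged illustration of how a bound built only from $\mu_i$ and $\Sigma_i$ can arise. It assumes that $q$ is the query, that $X \sim \mathcal{P}_i$, and that the problem asks for the smallest threshold $\theta_i(\delta)$ with $\Pr[\langle q, X \rangle > \theta] \le \delta$. It also assumes Cantelli's one-sided inequality is the tool. Both assumptions are ours, not stated in this excerpt, and the lemma's exact form may differ.

```latex
\begin{align*}
m_i &= \langle q, \mu_i \rangle, \qquad s_i^2 = q^\top \Sigma_i q > 0, \\
\Pr\big[\langle q, X \rangle - m_i \ge \lambda\big] &\le \frac{s_i^2}{s_i^2 + \lambda^2}
  \qquad (\lambda > 0,\ \text{Cantelli}), \\
\frac{s_i^2}{s_i^2 + \lambda^2} = \delta &\iff \lambda = s_i \sqrt{\tfrac{1-\delta}{\delta}}, \\
\theta_i(\delta) &\le \langle q, \mu_i \rangle + \sqrt{\tfrac{1-\delta}{\delta}}\,\sqrt{q^\top \Sigma_i q}.
\end{align*}
```

If $q^\top \Sigma_i q = 0$, the inner product is almost surely constant at $\langle q, \mu_i \rangle$, and the final inequality holds trivially. Smaller $\delta$ inflates the multiplier, so the estimate becomes more optimistic about high-variance shards.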

Figures (15)

  • Figure 1: (a) Top-$1$ recall vs. percentage of points probed on Text2Image where points have varying norms; (b) and (c) Distribution of inner products between a shard and a query on GloVe. Overlaid are scores computed by Mean and NormalizedMean.
  • Figure 2: Top-$100$ recall vs. volume of probed points. Partitioning is by spherical KMeans. ScaNN has parameter $T$, SubPartition has parameter $t$, and Optimist has rank $t$ and degree of optimism $\delta$.
  • Figure 3: Mean latency (ms) to reach $95\%$ recall when PQ-compressed shards are on SSD and blob storage. For each dataset, we plot the latency breakdown for NormalizedMean (top bar) and Optimist (bottom), and report relative gains (negative value indicates gain by Optimist).
  • Figure 4: Mean prediction error $\mathcal{E}_\ell(\tau, \cdot)$ of Equation (\ref{equation:general-prediction-error}) versus $\ell$ (percentage of shards).
  • Figure 5: Histogram of the $(t + 1)$-th eigenvalue. For each dataset, we pick the partitioning and $t$ from Table \ref{table:router-configuration}. The plots show that almost all shards in all datasets have a $(t+1)$-th eigenvalue bounded away from $1$, except for a few shards in Nq-Ada2.
  • ...and 10 more figures
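The recall-versus-probed-volume curves in Figures 1 and 2 rest on two per-query quantities: top-$k$ recall and the percentage of points probed. The following is a minimal sketch of how these might be computed. It assumes that probed shards are scored exactly (no PQ compression) and that `assignment[j]` gives the shard of point `j`. `exact_top_k` and `evaluate_probe` are hypothetical helpers, not code from the paper.

```python
import numpy as np

def exact_top_k(q, points, k):
    """Ground-truth ids of the k points with the largest inner product with q."""
    return set(np.argsort(-(points @ q))[:k].tolist())

def evaluate_probe(q, points, assignment, probed_shards, k):
    """Top-k recall and percentage of points probed when only probed_shards are searched."""
    mask = np.isin(assignment, probed_shards)        # points living in probed shards
    candidate_ids = np.flatnonzero(mask)
    scores = points[candidate_ids] @ q               # exact scoring of candidates
    retrieved = set(candidate_ids[np.argsort(-scores)[:k]].tolist())
    recall = len(retrieved & exact_top_k(q, points, k)) / k
    pct_probed = 100.0 * mask.sum() / len(points)
    return recall, float(pct_probed)

points = np.array([[1.0, 0.0], [0.9, 0.0], [0.1, 0.0], [0.2, 0.0]])
assignment = np.array([0, 1, 0, 1])
q = np.array([1.0, 0.0])
print(evaluate_probe(q, points, assignment, [0], k=2))     # -> (0.5, 50.0)
print(evaluate_probe(q, points, assignment, [0, 1], k=2))  # -> (1.0, 100.0)
```

Averaging these pairs over a query set, for several probe budgets, traces one recall-versus-volume curve per router.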

Theorems & Definitions (6)

  • Lemma 1
  • Proof
  • Definition 1: Masked Sketch of Rank $t$
  • Lemma 2
  • Proof
  • Proof of Lemma \ref{lemma:sketch-guarantee}
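The masked rank-$t$ sketch of Definition 1 is not spelled out in this excerpt. As a stand-in for intuition only, the following shows a generic rank-$t$ eigen sketch. This is a different, swapped-in technique, not the paper's Masked Sketch. It stores the mean, the top $t$ eigenvectors of $\Sigma_i$, and $t+1$ eigenvalues, so its size does not depend on the number of points. Every discarded eigenvalue is at most the $(t+1)$-th one, so the sketched $q^\top \Sigma_i q$ never underestimates the exact value. The function names are hypothetical.

```python
import numpy as np

def rank_t_sketch(points, t):
    """Mean, top-t eigenvectors/eigenvalues of the covariance, and the (t+1)-th eigenvalue."""
    mu = points.mean(axis=0)
    centered = points - mu
    sigma = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(sigma)             # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # reorder to descending
    eigvals = np.clip(eigvals, 0.0, None)                # drop tiny negative round-off
    residual = eigvals[t] if t < len(eigvals) else 0.0   # (t+1)-th eigenvalue, 0 if none left
    return mu, eigvecs[:, :t], eigvals[:t], residual

def sketched_variance(q, sketch):
    """Upper bound on q^T Sigma q: exact on the top-t subspace, (t+1)-th eigenvalue elsewhere."""
    _, V, lam, residual = sketch
    proj = V.T @ q                                   # coordinates of q in the top-t eigenbasis
    head = float(np.sum(lam * proj ** 2))
    tail = max(float(q @ q - proj @ proj), 0.0)      # squared norm of q outside that subspace
    return head + residual * tail

rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 6)) * np.array([3.0, 2.0, 1.0, 0.5, 0.2, 0.1])
q = rng.normal(size=6)
centered = pts - pts.mean(axis=0)
exact = float(q @ (centered.T @ centered / len(pts)) @ q)
for t in (0, 2, 6):
    print(t, sketched_variance(q, rank_t_sketch(pts, t)), exact)
```

Substituting `sketched_variance` for the exact $q^\top \Sigma_i q$ in a mean-plus-deviation score can only raise the score, because the score increases with the variance. The sketch therefore errs on the optimistic side.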