Distribution-Aware Exploration for Adaptive HNSW Search

Chao Zhang, Renée J. Miller

TL;DR

This work addresses two problems that arise when HNSW search applies a fixed, uniform ef to every query: the lack of recall guarantees and wasted computation. It introduces Adaptive-ef (Ada-ef), a distribution-aware, per-query runtime mechanism that estimates an appropriate ef from offline dataset statistics and the query's observed distance distribution, aiming to meet a target recall with minimal overhead. The approach rests on a theoretical result: under common similarity metrics, the full distance list (FDL) between a query and the dataset is approximately Gaussian in high dimensions, which enables efficient online estimation from a precomputed mean vector and covariance. Empirically, compared with state-of-the-art adaptive methods, Ada-ef reduces online query latency by up to 4x, offline computation time by 50x, and offline memory usage by 100x, while handling real-world, skewed embedding spaces and dynamic workloads.
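The role of the precomputed mean vector and covariance can be illustrated with a small numerical sketch. It is not the authors' implementation: the synthetic uniform dataset, the sizes `n` and `d`, and all variable names are assumptions chosen for the demo. It relies only on the standard identity that, for data with mean vector $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$, the inner products $\langle \mathbf{q}, \mathbf{v} \rangle$ have mean $\mathbf{q}^\top \boldsymbol{\mu}$ and variance $\mathbf{q}^\top \boldsymbol{\Sigma} \mathbf{q}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 128  # demo sizes, chosen arbitrarily

# Synthetic dataset with i.i.d. dimensions (an assumption for this demo only).
V = rng.uniform(-1.0, 1.0, size=(n, d))
q = rng.normal(size=d)  # hypothetical query vector

# Offline: dataset-level statistics, computed once.
mu = V.mean(axis=0)               # mean vector, shape (d,)
sigma = np.cov(V, rowvar=False)   # sample covariance, shape (d, d)

# Online: predict the mean and variance of the inner-product
# distance list for this query without scanning the dataset.
pred_mean = q @ mu
pred_var = q @ sigma @ q

# Reference: the full list of inner products between q and every vector.
fdl = V @ q
emp_mean = fdl.mean()
emp_var = fdl.var(ddof=1)

# For a Gaussian, roughly 68% of values lie within one standard deviation.
within_1sd = np.mean(np.abs(fdl - pred_mean) <= np.sqrt(pred_var))

print(pred_mean, emp_mean)
print(pred_var, emp_var)
print(within_1sd)
```

The predicted moments match the empirical ones because they are algebraically the same quantity; the Gaussian shape is the part that depends on high dimensionality, and the last printed value gives a rough check of it.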

Abstract

Hierarchical Navigable Small World (HNSW) is widely adopted for approximate nearest neighbor search (ANNS) for its ability to deliver high recall with low latency on large-scale, high-dimensional embeddings. The exploration factor, commonly referred to as ef, is a key parameter in HNSW-based vector search that balances accuracy and efficiency. However, existing systems typically rely on manually and statically configured ef values that are uniformly applied across all queries. This results in a distribution-agnostic configuration that fails to account for the non-uniform and skewed nature of real-world embedding data and query workloads. As a consequence, HNSW-based systems suffer from two key practical issues: (i) the absence of recall guarantees, and (ii) inefficient ANNS performance due to over- or under-searching. In this paper, we propose Adaptive-ef (Ada-ef), a data-driven, update-friendly, query-adaptive approach that dynamically configures ef for each query at runtime to approximately meet a declarative target recall with minimal computation. The core of our approach is a theoretically grounded statistical model that captures the similarity distribution between each query and the database vectors. Based on this foundation, we design a query scoring mechanism that distinguishes between queries requiring only small ef and those that need larger ef to meet a target recall, and accordingly assigns an appropriate ef to each query. Experimental results on real-world embeddings produced by state-of-the-art Transformer models from OpenAI and Cohere show that, compared with state-of-the-art learning-based adaptive approaches, our method achieves the target recall while avoiding both over- and under-searching, reducing online query latency by up to 4x, offline computation time by 50x, and offline memory usage by 100x.

Paper Structure

This paper contains 24 sections, 1 theorem, 9 equations, 7 figures, 10 tables, 2 algorithms.

Key Result

theorem 1

Let $\mathbf{q}=(q_1,\ldots,q_d)$ be a query vector, and let $\mathbf{V}$ be a dataset of data vectors, each of dimensionality $d$, that is i.i.d. across its dimensions. Then the full distance list $FDL_{IP}(\mathbf{q}, \mathbf{V})$ converges in distribution to a normal distribution as

Figures (7)

  • Figure 1: Recall distribution of HNSW search. GloVe [pennington-etal-2014-glove]: 1.8M vectors (100D), 10K queries, Top-100 ANNS with ef = 100 and 200. MS MARCO (OpenAI Ada-002 embeddings) [msmarco_openai]: 8.8M vectors (1536D), 6.9K queries, Top-1000 ANNS with ef = 1000 and 2000.
  • Figure 2: Overview of Ada-ef. It consists of two stages: offline and online computation. In the offline stage, dataset-level statistics are computed, followed by the construction of an ef-estimation table. In the online stage, the search follows the standard HNSW process until the base layer, where adaptive-ef search begins: (1) Distance collection: exploring a limited number of nodes to compute a query score using offline statistics; (2) Search with estimated ef: using the score to select an ef from the estimation table to approximately reach a declarative target recall.
  • Figure 3: Distribution of individual embedding dimensions that are randomly sampled in GloVe and MS MARCO.
  • Figure 4: Search performance on real and synthetic datasets.
  • Figure 5: Distribution of ef values dynamically assigned by Ada-ef across queries in each dataset (log scale).
  • ...and 2 more figures
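
The second online step in the Figure 2 caption, selecting an ef from a precomputed estimation table, can be sketched as a simple lookup. Everything concrete here is hypothetical: the score range, the bin boundaries, the ef values, and the assumption that a higher score indicates an easier query are illustrative placeholders, not values or conventions taken from the paper.

```python
from bisect import bisect_right

# Hypothetical estimation table (not from the paper): queries whose score
# falls below SCORE_BOUNDS[i] but not below SCORE_BOUNDS[i - 1] use
# EF_VALUES[i]; scores at or above the last bound use EF_VALUES[-1].
SCORE_BOUNDS = [0.2, 0.4, 0.6, 0.8]
EF_VALUES = [2000, 1000, 400, 200, 100]


def estimate_ef(score: float, k: int) -> int:
    """Pick an ef for one query from the table.

    Assumes higher scores mean easier queries (an illustrative choice).
    The result is clamped to at least k, following the common HNSW
    convention that ef should not be smaller than the number of
    neighbors requested.
    """
    idx = bisect_right(SCORE_BOUNDS, score)
    return max(k, EF_VALUES[idx])


if __name__ == "__main__":
    for s in (0.05, 0.4, 0.95):
        print(s, estimate_ef(s, k=100))
```

Keeping the table as two sorted arrays makes the per-query lookup a binary search, which is consistent with the stated goal of minimal online overhead.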

Theorems & Definitions (4)

  • definition 1: Full Distance List
  • definition 2
  • definition 3
  • theorem 1