Distribution-Aware Exploration for Adaptive HNSW Search

Chao Zhang; Renée J. Miller

TL;DR

This work addresses two problems in HNSW search that stem from applying one fixed ef to every query: the absence of recall guarantees, and wasted computation from over- or under-searching. It introduces Adaptive-ef (Ada-ef), a distribution-aware mechanism that, at runtime, estimates an appropriate ef for each query from offline dataset statistics and the query's observed distance distribution, aiming to meet a target recall with minimal overhead. The approach rests on a theoretical result that FDLs under common similarity metrics are approximately Gaussian in high dimensions, which enables efficient online estimation from a precomputed mean vector and covariance. Empirically, Ada-ef reduces online query latency by up to 4x, offline computation time by 50x, and offline memory usage by 100x compared with state-of-the-art learning-based adaptive methods, while handling skewed real-world embedding spaces and dynamic workloads.

Abstract

Hierarchical Navigable Small World (HNSW) is widely adopted for approximate nearest neighbor search (ANNS) for its ability to deliver high recall with low latency on large-scale, high-dimensional embeddings. The exploration factor, commonly referred to as ef, is a key parameter in HNSW-based vector search that balances accuracy and efficiency. However, existing systems typically rely on manually and statically configured ef values that are uniformly applied across all queries. This results in a distribution-agnostic configuration that fails to account for the non-uniform and skewed nature of real-world embedding data and query workloads. As a consequence, HNSW-based systems suffer from two key practical issues: (i) the absence of recall guarantees, and (ii) inefficient ANNS performance due to over- or under-searching. In this paper, we propose Adaptive-ef (Ada-ef), a data-driven, update-friendly, query-adaptive approach that dynamically configures ef for each query at runtime to approximately meet a declarative target recall with minimal computation. The core of our approach is a theoretically grounded statistical model that captures the similarity distribution between each query and the database vectors. Based on this foundation, we design a query scoring mechanism that distinguishes between queries requiring only small ef and those that need larger ef to meet a target recall, and accordingly assigns an appropriate ef to each query. Experimental results on real-world embeddings produced by state-of-the-art Transformer models from OpenAI and Cohere show that, compared with state-of-the-art learning-based adaptive approaches, our method achieves the target recall while avoiding both over- and under-searching, reducing online query latency by up to 4x, offline computation time by 50x, and offline memory usage by 100x.
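To make the statistical idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes inner-product similarity and uses the standard identity that, for database vectors with mean vector mu and covariance Sigma, the similarities q·x have mean q·mu and variance qᵀ Sigma q. The function names (dataset_stats, query_similarity_moments) and the synthetic data are hypothetical, and the step that maps a query's score to an ef value is deliberately not shown, since the text above does not specify it.

```python
import numpy as np

def dataset_stats(X):
    """Offline step: mean vector and (population) covariance of the database."""
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False, bias=True)  # bias=True -> divide by n
    return mu, sigma

def query_similarity_moments(q, mu, sigma):
    """Online step: mean and std of q.x over all database vectors x,
    computed from the precomputed statistics without scanning the database."""
    mean = float(q @ mu)
    var = float(q @ sigma @ q)
    return mean, float(np.sqrt(max(var, 0.0)))

# Synthetic stand-in for an embedding collection (assumption: illustrative only).
rng = np.random.default_rng(0)
n, d = 5000, 64
X = rng.normal(size=(n, d)) * rng.uniform(0.5, 2.0, size=d) + rng.normal(size=d)
q = rng.normal(size=d)

mu, sigma = dataset_stats(X)
est_mean, est_std = query_similarity_moments(q, mu, sigma)

# Brute-force reference: similarities of q to every database vector.
sims = X @ q
emp_mean, emp_std = float(sims.mean()), float(sims.std())

print(f"estimated mean/std: {est_mean:.4f} / {est_std:.4f}")
print(f"empirical mean/std: {emp_mean:.4f} / {emp_std:.4f}")
```

The estimated moments match the brute-force ones up to floating-point error, which is why only a d-dimensional mean and a d-by-d covariance need to be stored offline. Whether the full similarity distribution is close to Gaussian on a given dataset is the paper's theoretical claim and is not verified by this sketch.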
