Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes?

Jimmy Lin

TL;DR

The paper addresses a practical question: how should nearest-neighbor search for dense and sparse retrieval be implemented in retrieval-augmented generation (RAG) systems? It reports an empirical study on the BEIR benchmark (29 datasets) using Lucene/Anserini, comparing HNSW and flat vector indexes for dense retrieval against inverted indexes for sparse retrieval (SPLADE, BM25) in terms of indexing time, query throughput, and retrieval quality. The key findings: on small corpora, flat indexes perform comparably to HNSW; on large corpora, HNSW provides major throughput gains at some indexing cost and with minor quality degradation; quantization substantially improves queries per second (QPS) with a modest impact on accuracy; and dense and sparse retrieval are broadly competitive, with BM25 remaining fast in some cases. The results offer operational guidance for practitioners building real-world RAG systems, while acknowledging non-determinism in index construction and the need for broader validation across models and platforms.

Abstract

Practitioners working on dense retrieval today face a bewildering number of choices. Beyond selecting the embedding model, another consequential choice is the actual implementation of nearest-neighbor vector search. While best practices recommend HNSW indexes, flat vector indexes with brute-force search represent another viable option, particularly for smaller corpora and for rapid prototyping. In this paper, we provide experimental results on the BEIR dataset using the open-source Lucene search library that explicate the tradeoffs between HNSW and flat indexes (including quantized variants) from the perspectives of indexing time, query evaluation performance, and retrieval quality. With additional comparisons between dense and sparse retrievers, our results provide guidance for today's search practitioner in understanding the design space of dense and sparse retrievers. To our knowledge, we are the first to provide operational advice supported by empirical experiments in this regard.

Paper Structure (11 sections, 4 tables)