VIBE: Vector Index Benchmark for Embeddings
Elias Jääsaari, Ville Hyvönen, Matteo Ceccarello, Teemu Roos, Martin Aumüller
TL;DR
VIBE addresses the need for up-to-date, open benchmarks for vector indexes that perform approximate nearest neighbor (ANN) search on modern embeddings, including out-of-distribution (OOD) workloads. It introduces a pipeline that generates benchmark datasets from contemporary embedding models and supports OOD scenarios, quantization, and a broad range of hardware, and it comes with an interactive website for analyzing results. The study benchmarks 21 implementations on 12 in-distribution and 6 out-of-distribution datasets. Graph- and clustering-based indexes deliver the best throughput at high recall, quantization and GPUs offer substantial throughput gains, and OOD performance remains dataset-dependent. The result is a practical, extensible framework for rigorous evaluation of vector indexes in modern AI pipelines, with direct relevance to deploying high-performance ANN systems for retrieval-augmented generation and multimodal search.
Abstract
Approximate nearest neighbor (ANN) search is a performance-critical component of many machine learning pipelines. Rigorous benchmarking is essential for evaluating the performance of vector indexes for ANN search. However, the datasets used in existing benchmarks are no longer representative of current applications of ANN search. Hence, there is an urgent need for an up-to-date set of benchmarks. To this end, we introduce Vector Index Benchmark for Embeddings (VIBE), an open-source project for benchmarking ANN algorithms. VIBE contains a pipeline for creating benchmark datasets using dense embedding models characteristic of modern applications, such as retrieval-augmented generation (RAG). To replicate real-world workloads, we also include out-of-distribution (OOD) datasets where the queries and the corpus are drawn from different distributions. We use VIBE to conduct a comprehensive evaluation of state-of-the-art vector indexes, benchmarking 21 implementations on 12 in-distribution and 6 out-of-distribution datasets.
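For readers unfamiliar with how ANN indexes are typically scored, the following is a minimal sketch of recall at k: the fraction of the true k nearest neighbors that an index returns for a query. It is an illustration only, not code from VIBE. The function names, the toy corpus, the Euclidean metric, and the `approx` result list are all assumptions made for this example.

```python
import math

def exact_knn(corpus, query, k):
    """Brute-force ground truth: indices of the k corpus vectors closest to query."""
    # Euclidean distance is an assumption; a benchmark may use other metrics.
    dists = [(math.dist(vec, query), idx) for idx, vec in enumerate(corpus)]
    dists.sort()
    return [idx for _, idx in dists[:k]]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate result contains."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Toy data (hypothetical): four 2-dimensional corpus vectors and one query.
corpus = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
query = (0.1, 0.1)

truth = exact_knn(corpus, query, k=2)
approx = [0, 2]  # hypothetical output of some approximate index
score = recall_at_k(approx, truth)
print(truth, score)
```

In this toy case the approximate result finds one of the two true neighbors, so its recall is 0.5. A benchmark would average this value over many queries and report it together with throughput (queries per second).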
