
Exploring Distributed Vector Databases Performance on HPC Platforms: A Study with Qdrant

Seth Ockerman, Amal Gueroudji, Song Young Oh, Robert Underwood, Nicholas Chia, Kyle Chard, Robert Ross, Shivaram Venkataraman

TL;DR

The paper examines the performance characteristics of distributed vector databases on HPC platforms for scientific workloads. It empirically evaluates Qdrant on the Polaris supercomputer using a realistic biological-text workload and embeddings generated with Qwen3-Embedding-4B. Key findings: embedding generation dominates runtime; data insertion scales, but its throughput is limited by CPU-bound tasks; index building is CPU-intensive and was not GPU-accelerated in this study; and horizontal scaling yields only modest query speedups at the dataset sizes studied. The authors release their embedding and query workloads so the community can benchmark against them, and they highlight directions for optimization, including GPU-enabled indexing and broader evaluations spanning multiple systems in HPC environments.
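The TL;DR notes that horizontal scaling gave only modest query speedups. One common reason, offered here as general intuition rather than as the paper's own analysis, is that in a sharded vector database each query must be sent to every shard and the partial results merged, so per-shard work shrinks while fan-out and merge costs do not. The Python sketch below illustrates that scatter-gather pattern under stated assumptions: id-modulo placement and exact brute-force cosine search per shard are simplifications (Qdrant uses its own shard placement and an HNSW index), and all function names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Shard = Dict[int, Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def shard_points(points: Dict[int, Vector], num_shards: int) -> List[Shard]:
    """Hypothetical placement: point `pid` goes to shard `pid % num_shards`."""
    shards: List[Shard] = [{} for _ in range(num_shards)]
    for pid, vec in points.items():
        shards[pid % num_shards][pid] = vec
    return shards


def local_top_k(shard: Shard, query: Vector, k: int) -> List[Tuple[float, int]]:
    """Exact brute-force top-k inside one shard, as (score, id) pairs.

    A real engine would use an approximate index (e.g. HNSW) here instead.
    """
    return heapq.nlargest(k, ((cosine(vec, query), pid) for pid, vec in shard.items()))


def distributed_search(shards: List[Shard], query: Vector, k: int) -> List[Tuple[int, float]]:
    """Scatter the query to every shard, then gather and merge partial results."""
    # Scatter: any shard may hold a global nearest neighbor, so all are queried.
    # Shards are visited sequentially here; a cluster would query them concurrently.
    partials = [local_top_k(shard, query, k) for shard in shards]
    # Gather: the global top-k is the top-k of the union of the local top-k lists.
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [(pid, score) for score, pid in merged]


if __name__ == "__main__":
    pts = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.9, 0.1]}
    print(distributed_search(shard_points(pts, 2), [1.0, 0.0], k=2))
```

Because every shard returns its own top-k and the merge keeps the best k overall, the result matches an exact single-node search; adding shards reduces the scan per shard but never removes the fan-out and merge step.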

Abstract

Vector databases have rapidly grown in popularity, enabling efficient similarity search over data such as text, images, and video. They now play a central role in modern AI workflows, aiding large language models by grounding model outputs in external literature through retrieval-augmented generation. Despite their importance, little is known about the performance characteristics of vector databases on the high-performance computing (HPC) systems that drive large-scale science. This work presents an empirical study of distributed vector database performance on the Polaris supercomputer at the Argonne Leadership Computing Facility. We construct a realistic biological-text workload from BV-BRC and generate embeddings from the peS2o corpus using Qwen3-Embedding-4B. We select Qdrant to evaluate insertion, index construction, and query latency with up to 32 workers. Informed by practical lessons from our experience, this work takes a first step toward characterizing vector database performance on HPC platforms to guide future research and optimization.
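The scaling experiments (up to 32 workers) are easiest to read with the standard strong-scaling metrics below, where $T(p)$ is the measured time for a fixed dataset with $p$ Qdrant workers. These are textbook definitions, not necessarily the exact metrics the paper reports.

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
```

Ideal scaling gives $S(p) = p$ and $E(p) = 1$. As a purely hypothetical illustration (not data from the paper), $T(1) = 12$ s and $T(8) = 4$ s would give $S(8) = 3$ and $E(8) = 0.375$; a "modest" query speedup in the sense of the TL;DR means $S(p)$ well below $p$.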

Paper Structure

This paper contains 11 sections, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Two example configurations of a distributed vector database. Blue boxes represent stateless workers; green boxes denote stateful components.
  • Figure 2: Data insertion time for a 1 GB dataset into a single-worker Qdrant cluster on Polaris for varying batch sizes and numbers of parallel requests. The best batch size found was then held fixed while tuning the number of parallel requests.
  • Figure 3: Index build time versus dataset size for varying numbers of Qdrant workers.
  • Figure 4: Query running time against a 1 GB dataset stored in a single-worker Qdrant cluster on Polaris, for varying batch sizes and numbers of parallel requests.
  • Figure 5: Query time versus dataset size for varying numbers of Qdrant workers.
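Figure 2 tunes two client-side knobs: batch size and the number of parallel requests. The Python sketch below shows how the two interact when loading points, assuming a thread pool on the client and an in-memory collection standing in for a Qdrant cluster. `make_batches`, `parallel_insert`, and `InMemoryCollection` are hypothetical names, not the paper's code or a Qdrant API; for reference, qdrant-client's upload helpers expose similarly named `batch_size` and `parallel` arguments, but they are implemented differently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Point = Tuple[int, Sequence[float]]


def make_batches(points: Sequence[Point], batch_size: int) -> Iterator[List[Point]]:
    """Split points into consecutive batches of at most `batch_size` points."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(points), batch_size):
        yield list(points[start:start + batch_size])


def parallel_insert(
    points: Sequence[Point],
    insert_batch: Callable[[List[Point]], int],
    batch_size: int,
    parallel: int,
) -> int:
    """Send batches through up to `parallel` concurrent requests.

    Returns the total number of points reported as inserted.
    """
    if parallel <= 0:
        raise ValueError("parallel must be positive")
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return sum(pool.map(insert_batch, make_batches(points, batch_size)))


class InMemoryCollection:
    """Hypothetical stand-in for a database collection; not a Qdrant API."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.points: Dict[int, Sequence[float]] = {}
        self.requests = 0

    def upsert(self, batch: List[Point]) -> int:
        # One call models one insertion request carrying a whole batch.
        with self._lock:
            self.requests += 1
            for pid, vec in batch:
                self.points[pid] = vec
        return len(batch)


if __name__ == "__main__":
    coll = InMemoryCollection()
    pts = [(i, [float(i), 0.0]) for i in range(1000)]
    n = parallel_insert(pts, coll.upsert, batch_size=64, parallel=4)
    print(n, coll.requests)  # 1000 points in ceil(1000 / 64) = 16 requests
```

Larger batches amortize per-request overhead, and more parallel requests overlap client and network work; neither removes CPU-bound work on the server, which is consistent with the TL;DR's observation that insertion is limited by CPU-bound tasks.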