Cloud-Native Vector Search: A Comprehensive Performance Analysis

Zhaoheng Li, Wei Ding, Silu Huang, Zikang Wang, Yuanjin Lin, Ke Wu, Yongjoo Park, Jianjun Chen

TL;DR

The paper analyzes cloud-native vector search, in which vector indexes reside on remote storage and queries fetch index segments, with local caching to mitigate I/O. It contrasts cluster indexes (e.g., SPANN) and graph indexes (e.g., DiskANN) under cloud I/O constraints, showing that graph indexes can outperform cluster indexes in workloads requiring high concurrency or high recall, and on high-dimensional data or large datatypes. It studies index design and caching strategies tailored to cloud storage, finding that I/O characteristics drive parameter choices (e.g., more centroids, denser graphs, beamwidth) and that some index optimizations work against caching. Experiments on four real datasets quantify trade-offs across workload dimensions and yield practical recommendations for index selection, parameter tuning, and cache usage in cloud-native vector search deployments.

Abstract

Vector search is widely employed in recommender systems and retrieval-augmented generation pipelines, and is commonly performed with vector indexes to efficiently find similar items in large datasets. Recent growth in both data and task complexity has motivated placing vector indexes onto remote storage (cloud-native vector search), for which cloud providers have recently introduced services. Yet, despite varying workload characteristics and the variety of available vector index forms, providers default to cluster-based indexes, which on paper do adapt well to the differences between disk-based and cloud-based environments: their fetch granularities align with the large optimal fetch sizes of remote storage, and their lack of notable intra-query dependencies minimizes costly round-trips to it (as opposed to graph-based indexes). This paper systematically studies cloud-native vector search: which indexes should be built for on-cloud vector search, and how should they be used? We analyze the bottlenecks of two common index classes, cluster and graph indexes, on remote storage, and show that despite the current standard adoption of cluster indexes on the cloud, graph indexes are favored in workloads that require high concurrency and recall, or that operate on high-dimensional data or large datatypes. We further find that, for optimal performance, on-cloud search demands significantly different indexing and search parameterizations than on-disk search. Finally, we incorporate existing cloud-based caching setups into vector search, find that certain index optimizations work against caching, and study how this can be mitigated to maximize gains under various available cache sizes.

Paper Structure

This paper contains 61 sections, 2 equations, 25 figures, 4 tables, 1 algorithm.

Figures (25)

  • Figure 1: Cloud-native vector search setup we study in this paper. Compute fetches vector index segments from remote storage and utilizes local resources to cache hot segments.
  • Figure 2: Key overheads of SPANN and DiskANN on GIST1M measured with perf. Both indexes' search costs are dominated by I/O, albeit different aspects of it, on remote storage.
  • Figure 3: CPU, I/O usage, and QPS of on-disk vs. on-remote storage querying of SPANN and DiskANN on GIST1M. While search may be bottlenecked by CPU on-disk, bottlenecks are almost always network-related for cloud-native search.
  • Figure 4: Overheads of on-cloud SPANN and DiskANN. Indexes can be tuned to reduce I/O at the cost of more computation.
  • Figure 5: SPANN and DiskANN queries on GIST1M with 4GB SLRU cache; setup described in the experimental setup section. Queries' latencies are correlated with data read and number of roundtrips, respectively, which cache hits can help reduce.
  • ...and 20 more figures
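
The Figure 5 caption refers to an SLRU (segmented LRU) cache for index segments. As a rough illustration of that general policy, and not the paper's implementation, the sketch below keeps a probationary segment for keys seen once and a protected segment for keys hit again. The class name `SLRUCache`, its method names, and the choice to count capacity in entries rather than bytes are all assumptions made for this example.

```python
from collections import OrderedDict


class SLRUCache:
    """Minimal segmented-LRU sketch (hypothetical; capacities count entries, not bytes)."""

    def __init__(self, probation_capacity, protected_capacity):
        self.probation_capacity = probation_capacity
        self.protected_capacity = protected_capacity
        # Both segments are ordered from least to most recently used.
        self.probation = OrderedDict()
        self.protected = OrderedDict()

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key in self.protected:
            self.protected.move_to_end(key)  # refresh recency
            return self.protected[key]
        if key in self.probation:
            # A hit in the probationary segment promotes the entry.
            value = self.probation.pop(key)
            self._promote(key, value)
            return value
        return None

    def put(self, key, value):
        """Insert or update an entry; new keys start in the probationary segment."""
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
        else:
            self._insert_probation(key, value)

    def _insert_probation(self, key, value):
        self.probation[key] = value
        self.probation.move_to_end(key)
        # Entries evicted from the probationary segment leave the cache.
        while len(self.probation) > self.probation_capacity:
            self.probation.popitem(last=False)

    def _promote(self, key, value):
        self.protected[key] = value
        # Overflow from the protected segment is demoted, not dropped.
        while len(self.protected) > self.protected_capacity:
            old_key, old_value = self.protected.popitem(last=False)
            self._insert_probation(old_key, old_value)
```

In a vector search setting, the keys would be identifiers of index segments (for example, a cluster's posting list or a graph node's block) and the values the fetched bytes. A segment that is touched only once by a one-off query cannot push frequently reused segments out of the protected segment.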