Accelerating Graph Indexing for ANNS on Modern CPUs

Mengzhao Wang, Haotian Wu, Xiangyu Ke, Yunjun Gao, Yifan Zhu, Wenchao Zhou

TL;DR

This paper addresses the slow index construction of graph-based ANNS methods, notably HNSW, on modern CPUs. It identifies distance computation as the primary bottleneck, caused by high memory access latency and suboptimal SIMD use. It proposes Flash, a compact coding strategy tailored to graph indexing that combines PCA-based subspaces, asymmetric distance tables (ADTs) for candidate acquisition (CA), symmetric distance tables (SDTs) for neighbor selection (NS), and access-aware memory layouts to minimize random accesses and maximize SIMD efficiency. In experiments on eight real-world datasets, Flash speeds up index construction by 10.4$\times$ to 22.9$\times$ while maintaining or improving search performance, and it generalizes across SIMD instruction sets, HNSW variants, and other graph indexes. For large-scale, dynamically updated vector databases, this enables rapid index rebuilding with minimal performance penalties.
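To make the table-based distance idea concrete, here is a minimal sketch in plain Python. It assumes ADT and SDT carry their usual product-quantization meaning (asymmetric and symmetric distance tables). It is not the paper's implementation: the codebooks are random stand-ins for trained centroids, there is no PCA step or SIMD-aware layout, and every name (`encode`, `build_adt`, `build_sdt`, and so on) is hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vec, m):
    """Split a vector into m contiguous, equal-length subvectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def encode(vec, codebooks):
    """Replace each subvector by the index of its nearest codeword."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k]))
            for sub, cb in zip(subs, codebooks)]


def decode(code, codebooks):
    """Reconstruct the approximate vector that a code stands for."""
    return [x for j, c in enumerate(code) for x in codebooks[j][c]]


def build_adt(query, codebooks):
    """Asymmetric table: uncompressed query subvector vs. every codeword."""
    subs = split(query, len(codebooks))
    return [[sq_dist(sub, cw) for cw in cb] for sub, cb in zip(subs, codebooks)]


def build_sdt(codebooks):
    """Symmetric table: codeword vs. codeword, built once per subspace."""
    return [[[sq_dist(a, b) for b in cb] for a in cb] for cb in codebooks]


def adt_distance(adt, code):
    """Query-to-code distance: one table lookup per subspace."""
    return sum(adt[j][c] for j, c in enumerate(code))


def sdt_distance(sdt, code_a, code_b):
    """Code-to-code distance: one table lookup per subspace."""
    return sum(sdt[j][a][b] for j, (a, b) in enumerate(zip(code_a, code_b)))


# Toy setup (all sizes are illustrative assumptions).
rng = random.Random(0)
D, M, K = 8, 4, 16  # dimensions, subspaces, codewords per subspace
codebooks = [[[rng.uniform(-1, 1) for _ in range(D // M)] for _ in range(K)]
             for _ in range(M)]
base = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(100)]
codes = [encode(v, codebooks) for v in base]

query = [rng.uniform(-1, 1) for _ in range(D)]
adt = build_adt(query, codebooks)
sdt = build_sdt(codebooks)

nearest = min(range(len(codes)), key=lambda i: adt_distance(adt, codes[i]))
print("nearest base vector by ADT lookup:", nearest)
print("code-to-code distance by SDT lookup:", sdt_distance(sdt, codes[0], codes[1]))
```

Both lookups replace a full-dimensional distance computation with `M` table reads. In this sketch the asymmetric table depends on the query and is rebuilt for each one, while the symmetric table depends only on the codebooks and can be shared across all pairs of encoded vectors.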

Abstract

In high-dimensional vector spaces, Approximate Nearest Neighbor Search (ANNS) is a key component in database and artificial intelligence infrastructures. Graph-based methods, particularly HNSW, have emerged as leading solutions among various ANNS approaches, offering an impressive trade-off between search efficiency and accuracy. Many modern vector databases utilize graph indexes as their core algorithms, benefiting from various optimizations to enhance search performance. However, the high indexing time associated with graph algorithms poses a significant challenge, especially given the increasing volume of data, query processing complexity, and dynamic index maintenance demands. This has rendered indexing time a critical performance metric for users. In this paper, we comprehensively analyze the underlying causes of the low graph indexing efficiency on modern CPUs, identifying that distance computation dominates indexing time, primarily due to high memory access latency and suboptimal arithmetic operation efficiency. We demonstrate that distance comparisons during index construction can be effectively performed using compact vector codes at an appropriate compression error. Drawing from insights gained through integrating existing compact coding methods in the graph indexing process, we propose a novel compact coding strategy, named Flash, designed explicitly for graph indexing and optimized for modern CPU architectures. By minimizing random memory accesses and maximizing the utilization of SIMD (Single Instruction, Multiple Data) instructions, Flash significantly improves cache hit rates and arithmetic efficiency. Extensive experiments conducted on eight real-world datasets, ranging from ten million to one billion vectors, show that Flash achieves a speedup of 10.4$\times$ to 22.9$\times$ in index construction efficiency, while maintaining or improving search performance.
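The claim that distance comparisons tolerate a suitable compression error can be explored with a small experiment. The sketch below is an illustration under stated assumptions, not the paper's method: it uses uniform scalar quantization on synthetic random vectors (Flash itself uses a different coding scheme), compresses all three vectors in each comparison, and the helper names `quantize` and `agreement_rate` are hypothetical. It measures how often the outcome of comparing $\delta(\boldsymbol{u}, \boldsymbol{v})$ with $\delta(\boldsymbol{u}, \boldsymbol{w})$ is unchanged after compression.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vec, bits, lo=-1.0, hi=1.0):
    """Uniform scalar quantization: snap each coordinate to one of 2**bits levels."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [lo + round((min(max(x, lo), hi) - lo) / step) * step for x in vec]


def agreement_rate(vectors, bits, trials, rng):
    """Fraction of sampled triples (u, v, w) whose comparison outcome
    is the same on compressed vectors as on the original vectors."""
    compact = [quantize(v, bits) for v in vectors]
    same = 0
    for _ in range(trials):
        u, v, w = rng.sample(range(len(vectors)), 3)
        exact = sq_dist(vectors[u], vectors[v]) < sq_dist(vectors[u], vectors[w])
        approx = sq_dist(compact[u], compact[v]) < sq_dist(compact[u], compact[w])
        same += exact == approx
    return same / trials


# Synthetic data; sizes and bit widths are illustrative assumptions.
data_rng = random.Random(42)
vectors = [[data_rng.uniform(-1.0, 1.0) for _ in range(32)] for _ in range(200)]

rates = {bits: agreement_rate(vectors, bits, 2000, random.Random(7))
         for bits in (1, 2, 4, 8)}
for bits, rate in rates.items():
    print(f"{bits}-bit codes: {rate:.3f} of comparisons preserved")
```

Running this on real embeddings rather than uniform random data would give a more meaningful picture, since the tolerable compression error depends on how well separated the compared distances are.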

Paper Structure

This paper contains 44 sections, 2 theorems, 8 equations, 16 figures, 4 tables, 1 algorithm.

Key Result

Lemma 1

Given any three vertices $\boldsymbol{u}$, $\boldsymbol{v}$, and $\boldsymbol{w}$ in $D$-dimensional Euclidean space $\mathbb{R}^D$, the comparison between $\delta(\boldsymbol{u}, \boldsymbol{v})$ and $\delta(\boldsymbol{u}, \boldsymbol{w})$ is: $\bullet$ $\delta(\boldsymbol{u}, \boldsymbol{v}) <$ ...

Figures (16)

  • Figure 1: Profiling of HNSW indexing time. Distance computation constitutes the majority of the indexing time and consists of two components: memory accesses (B) and arithmetic operations (C). Other tasks (A), such as data structure maintenance, account for a minor portion.
  • Figure 2: Illustrating memory accesses and arithmetic operations during the HNSW construction process (base layer).
  • Figure 3: Effect of parameters on HNSW-PQ.
  • Figure 4: Effect of parameters on HNSW-SQ and HNSW-PCA.
  • Figure 5: Illustrating the coding pipeline and data layout of Flash.
  • ...and 11 more figures

Theorems & Definitions (3)

  • Example 1
  • Lemma 1
  • Theorem 1: LVQ