KBest: Efficient Vector Search on Kunpeng CPU

Kaihao Ma, Meiling Wang, Senkevich Oleg, Zijian Li, Daihao Xue, Dmitriy Malyshev, Yangming Lv, Shihai Xiao, Xiao Yan, Radionov Alexander, Weidi Zeng, Yuanzhan Gao, Zhiyu Zou, Xin Yao, Lin Liu, Junhao Wu, Yiding Liu, Yaoyao Fu, Gongyi Wang, Gong Zhang, Fei Yi, Yingfan Liu

TL;DR

This work targets efficient vector search on ARM Kunpeng CPUs by introducing KBest, a graph-based ANNS library tailored for Kunpeng-920. It combines Kunpeng-specific optimizations (batched 1-to-$B$ SIMD distance computations, software prefetching, memory alignment, and huge pages) with algorithmic improvements (index refinement, 2-hop expansion, graph reordering, early termination, and modular vector quantization). Empirical results on four real-world datasets show KBest on Kunpeng-920 delivering substantial throughput gains over x86 baselines (up to 12.6x) and outperforming state-of-the-art libraries, with ablations confirming the effectiveness of each optimization. The library provides a user-friendly C++/Python API and integrates with Milvus via Knowhere, and it is already deployed in production workloads handling tens of millions of queries daily, illustrating the practical viability of ARM-optimized vector search at scale.

Abstract

Vector search, which returns the vectors most similar to a given query vector from a large vector dataset, underlies many important applications such as search, recommendation, and LLMs. To be economical, vector search needs to be efficient so as to reduce the resources required by a given query workload. However, existing vector search libraries (e.g., Faiss and DiskANN) are optimized for x86 CPU architectures (i.e., Intel and AMD CPUs), whereas Huawei Kunpeng CPUs are based on the ARM architecture and are competitive in compute power. In this paper, we present KBest, a vector search library tailored for the latest Kunpeng 920 CPUs. To be efficient, KBest incorporates extensive hardware-aware and algorithmic optimizations, which include single-instruction-multiple-data (SIMD) accelerated distance computation, data prefetch, index refinement, early termination, and vector quantization. Experimental results show that KBest outperforms SOTA vector search libraries running on x86 CPUs, and that our optimizations can improve the query throughput by over 2x. Currently, KBest serves applications from both our internal business and external enterprise clients, handling tens of millions of queries on a daily basis.
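To make the search pattern concrete, the following is a minimal sketch of best-first search on a proximity graph in which the distances from the query to all unvisited neighbors of a node are computed in one batched call (a 1-to-$B$ pattern). It is not KBest's implementation: it is pure Python with no SIMD or prefetching, the names `batch_sq_l2` and `graph_search` and the pool-size parameter `ef` are hypothetical, and the stopping rule shown is the standard best-first termination condition rather than the paper's early termination technique.

```python
import heapq

def batch_sq_l2(query, vectors, ids):
    """Squared L2 distance from one query to a batch of B vectors (1-to-B)."""
    return [sum((q - x) ** 2 for q, x in zip(query, vectors[i])) for i in ids]

def graph_search(query, vectors, graph, entry, k, ef):
    """Best-first search on a proximity graph; `ef` bounds the result pool size."""
    d0 = batch_sq_l2(query, vectors, [entry])[0]
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node first
    results = [(-d0, entry)]     # max-heap via negation: worst kept result on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(results) >= ef and dist > -results[0][0]:
            break                # no remaining candidate can improve the pool
        # Gather all unvisited neighbors and compute their distances in one batch.
        batch = [n for n in graph[node] if n not in visited]
        visited.update(batch)
        for d, n in zip(batch_sq_l2(query, vectors, batch), batch):
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, n))
                heapq.heappush(results, (-d, n))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    return [n for _, n in sorted((-d, n) for d, n in results)[:k]]

# Toy data (an assumption for illustration): six 1-D points linked as a chain.
vectors = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(graph_search([3.2], vectors, graph, entry=0, k=2, ef=3))
```

In an optimized library the batched distance function is where SIMD instructions and prefetching would apply, since the query stays in registers or cache while the $B$ neighbor vectors are streamed through.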

Paper Structure

This paper contains 16 sections, 4 equations, 7 figures, 4 tables, 2 algorithms.

Figures (7)

  • Figure 1: Vector embeddings and applications of vector search
  • Figure 2: An illustration of vector search on proximity graph indexes, which traverses the graph to identify neighbors
  • Figure 3: The memory layout and workflow of KBest
  • Figure 4: The abstraction of SIMD accelerated operators of fused 1-to-$B$ distance computation
  • Figure 5: The workflow of KBest's prefetch strategy
  • ...and 2 more figures

Theorems & Definitions (1)

  • Definition 1