KBest: Efficient Vector Search on Kunpeng CPU
Kaihao Ma, Meiling Wang, Senkevich Oleg, Zijian Li, Daihao Xue, Dmitriy Malyshev, Yangming Lv, Shihai Xiao, Xiao Yan, Radionov Alexander, Weidi Zeng, Yuanzhan Gao, Zhiyu Zou, Xin Yao, Lin Liu, Junhao Wu, Yiding Liu, Yaoyao Fu, Gongyi Wang, Gong Zhang, Fei Yi, Yingfan Liu
TL;DR
This work introduces KBest, a graph-based approximate nearest neighbor search (ANNS) library tailored for ARM-based Kunpeng 920 CPUs. It combines Kunpeng-specific optimizations (batched 1-to-$B$ SIMD distance computations, software prefetching, memory alignment, and huge pages) with algorithmic improvements (index refinement, 2-hop expansion, graph reordering, early termination, and modular vector quantization). Empirical results on four real-world datasets show that KBest on Kunpeng 920 delivers substantial throughput gains over x86 baselines (up to 12.6x) and outperforms state-of-the-art libraries, with ablations confirming the effectiveness of each optimization. The library provides a user-friendly C++/Python API and integrates with Milvus via Knowhere. It is already deployed in production workloads handling tens of millions of queries daily, illustrating the practical viability of ARM-optimized vector search at scale.
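To make "graph-based ANNS" concrete, here is a minimal sketch of the generic best-first search used on proximity graphs. It is not KBest's implementation: the names `graph_search`, `l2_sqr`, `neighbors`, `entry`, and `ef` are hypothetical, and the stop condition is the textbook one rather than the early-termination rule the library uses.

```python
import heapq


def l2_sqr(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def graph_search(vectors, neighbors, query, entry, k, ef):
    """Generic best-first search on a proximity graph (illustrative sketch).

    vectors:   list of vectors, indexed by node id
    neighbors: adjacency list, neighbors[i] holds the ids linked to node i
    entry:     id of the node where the search starts (assumed given)
    ef:        size of the result pool kept during search, with ef >= k
    """
    d0 = l2_sqr(vectors[entry], query)
    visited = {entry}
    frontier = [(d0, entry)]  # min-heap: closest unexpanded node first
    pool = [(-d0, entry)]     # max-heap via negation: worst kept result on top

    while frontier:
        d, node = heapq.heappop(frontier)
        # Stop once the pool is full and the closest unexpanded node
        # is already farther than the worst result we are keeping.
        if len(pool) >= ef and d > -pool[0][0]:
            break
        # One query against many neighbors: this is the loop where an
        # optimized library could batch distance computations and prefetch.
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = l2_sqr(vectors[nb], query)
            if len(pool) < ef or dn < -pool[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(pool, (-dn, nb))
                if len(pool) > ef:
                    heapq.heappop(pool)  # drop the current worst result

    # Return the k closest (distance, id) pairs in ascending distance order.
    return sorted((-nd, i) for nd, i in pool)[:k]
```

Each expansion compares one query against all neighbors of a node, which is the access pattern that 1-to-$B$ batching and prefetching are meant to speed up. A larger `ef` explores more nodes and typically trades throughput for accuracy.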
Abstract
Vector search, which returns the vectors most similar to a given query vector from a large vector dataset, underlies many important applications such as search, recommendation, and LLMs. To be economical, vector search needs to be efficient so as to reduce the resources required by a given query workload. However, existing vector search libraries (e.g., Faiss and DiskANN) are optimized for x86 CPU architectures (i.e., Intel and AMD CPUs), while Huawei Kunpeng CPUs are based on the ARM architecture and are competitive in compute power. In this paper, we present KBest, a vector search library tailored for the latest Kunpeng 920 CPUs. To be efficient, KBest incorporates extensive hardware-aware and algorithmic optimizations, which include single-instruction-multiple-data (SIMD) accelerated distance computation, data prefetching, index refinement, early termination, and vector quantization. Experimental results show that KBest outperforms SOTA vector search libraries running on x86 CPUs, and that our optimizations can improve the query throughput by over 2x. Currently, KBest serves applications from both our internal business and external enterprise clients with tens of millions of queries on a daily basis.
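As a reference point for the problem definition above, the following sketch performs exact vector search by exhaustive scan. It assumes squared Euclidean distance as the similarity measure, and the names `exact_top_k` and `l2_sqr` are hypothetical. Libraries such as KBest exist precisely because this linear scan becomes too costly on large datasets.

```python
import heapq


def l2_sqr(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(vectors, query, k):
    """Return the ids of the k vectors closest to the query, nearest first.

    Every vector is compared against the query, so the cost grows
    linearly with both the dataset size and the vector dimension.
    """
    return heapq.nsmallest(
        k,
        range(len(vectors)),
        key=lambda i: l2_sqr(vectors[i], query),
    )
```

For example, with `vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]` and `query = [1.0, 1.0]`, calling `exact_top_k(vectors, query, 2)` returns `[1, 3]`.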
