pHNSW: PCA-Based Filtering to Accelerate HNSW Approximate Nearest Neighbor Search
Zheng Li, Guangyi Zeng, Paul Delestrac, Enyi Yao, Simei Yang
TL;DR
pHNSW tackles the inefficiency of HNSW for high-dimensional ANN search by introducing PCA-based filtering to reduce dimensionality and by co-designing an accelerator with a custom ISA. The algorithm ranks neighbor candidates in a low-dimensional PCA space, then computes exact distances in the original space for only the top-$k$ candidates, with layer-specific $k$ values to balance recall and throughput. The hardware design includes a pHNSW processor, an optimized off-chip database organization, and dedicated computation units. Compared to the CPU baseline, it achieves up to $14.47\times$ (DDR4) and $21.37\times$ (HBM) higher QPS and up to $57.4\%$ lower energy consumption. Results are reported on the SIFT1M dataset using a 65nm RTL implementation; scaling to larger datasets (e.g., SIFT1B) and multi-core/PIM extensions is left as future work.
Abstract
Hierarchical Navigable Small World (HNSW) has demonstrated impressive accuracy and low latency for high-dimensional nearest neighbor search. However, its high computational demands and irregular, large-volume data access patterns present significant challenges to search efficiency. To address these challenges, we introduce pHNSW, an algorithm-hardware co-optimized solution that accelerates HNSW through Principal Component Analysis (PCA) filtering. On the algorithm side, we apply PCA filtering to reduce the dimensionality of the dataset, thereby lowering the volume of neighbor accesses and decreasing the computational load of distance calculations. On the hardware side, we design the pHNSW processor with custom instructions to optimize search throughput and energy efficiency. In the experiments, we synthesized the pHNSW processor RTL design in a 65nm technology node and evaluated it using the DDR4 and HBM1.0 DRAM standards. The results show that pHNSW boosts Queries per Second (QPS) by 14.47x to 21.37x over a CPU baseline and 5.37x to 8.46x over a GPU baseline, while reducing energy consumption by up to 57.4% compared to a standard HNSW implementation.
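The PCA filtering idea summarized above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`fit_pca`, `project`, `filtered_neighbor_step`), the use of squared Euclidean distance, and the parameter values are assumptions made for illustration. The sketch ranks one node's neighbors in the reduced space, keeps the $k$ closest, and computes exact distances only for those survivors.

```python
import numpy as np


def fit_pca(data, d_low):
    """Return (mean, components) for projecting onto the top d_low principal axes."""
    mean = data.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:d_low]


def project(x, mean, components):
    """Project one vector or a matrix of row vectors into the PCA space."""
    return (x - mean) @ components.T


def filtered_neighbor_step(query, neighbor_ids, data, data_low, mean, components, k):
    """Filter neighbors by low-dimensional distance, then rank survivors exactly.

    Hypothetical helper: models a single neighbor-expansion step, not a full
    HNSW traversal.
    """
    q_low = project(query, mean, components)
    # Cheap squared distances in the reduced space for every neighbor.
    low_dist = np.sum((data_low[neighbor_ids] - q_low) ** 2, axis=1)
    k = min(k, len(neighbor_ids))
    keep = neighbor_ids[np.argsort(low_dist)[:k]]
    # Exact squared distances in the original space, only for the k survivors.
    exact_dist = np.sum((data[keep] - query) ** 2, axis=1)
    order = np.argsort(exact_dist)
    return keep[order], exact_dist[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((1000, 128))   # stand-in for a SIFT-like dataset
    query = rng.standard_normal(128)
    mean, components = fit_pca(data, d_low=16)
    data_low = project(data, mean, components)  # precomputed once, offline
    ids, dists = filtered_neighbor_step(
        query, np.arange(32), data, data_low, mean, components, k=8
    )
    print(ids, dists)
```

Because the projection uses orthonormal directions, the low-dimensional squared distance never exceeds the exact one, so the filter is cheap but optimistic; a larger $k$ trades extra high-dimensional distance computations for a lower chance of discarding a true nearest neighbor.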
