pHNSW: PCA-Based Filtering to Accelerate HNSW Approximate Nearest Neighbor Search

Zheng Li; Guangyi Zeng; Paul Delestrac; Enyi Yao; Simei Yang

TL;DR

pHNSW tackles the inefficiency of HNSW for high-dimensional ANN search by introducing PCA-based filtering to reduce dimensionality and by co-designing an accelerator with a custom ISA. The algorithm first ranks candidate neighbors in the low-dimensional PCA space, then computes exact distances in the original space for only the top-$k$ candidates, with layer-specific $k$ values to balance recall and throughput. The hardware design includes a pHNSW processor, an optimized off-chip database organization, and dedicated computation units, achieving QPS gains of up to $14.47\times$ on DDR4 and $21.37\times$ on HBM, and up to $57.4\%$ energy reduction, compared to the CPU baseline. The evaluation uses the SIFT1M dataset and a 65nm RTL implementation; larger datasets (e.g., SIFT1B) and multi-core/PIM extensions are identified as future work.

Abstract

Hierarchical Navigable Small World (HNSW) has demonstrated impressive accuracy and low latency for high-dimensional nearest neighbor searches. However, its high computational demands and irregular, large-volume data access patterns present significant challenges to search efficiency. To address these challenges, we introduce pHNSW, an algorithm-hardware co-optimized solution that accelerates HNSW through Principal Component Analysis (PCA) filtering. On the algorithm side, we apply PCA filtering to reduce the dimensionality of the dataset, thereby lowering the volume of neighbor access and decreasing the computational load for distance calculations. On the hardware side, we design the pHNSW processor with custom instructions to optimize search throughput and energy efficiency. In the experiments, we synthesized the pHNSW processor RTL design with a 65nm technology node and evaluated it using DDR4 and HBM1.0 DRAM standards. The results show that pHNSW boosts Queries per Second (QPS) by 14.47x-21.37x relative to a CPU and 5.37x-8.46x relative to a GPU, while reducing energy consumption by up to 57.4% compared to a standard HNSW implementation.
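The filter-then-rerank idea described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`sq_l2`, `project`, `filter_then_rerank`) are hypothetical, the PCA axes are assumed to be precomputed offline, and the toy axes below simply keep the first two coordinates as a stand-in for real principal components.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def project(v: Vector, components: Sequence[Vector]) -> List[float]:
    """Project v onto the rows of `components` (assumed precomputed PCA axes)."""
    return [sum(c_i * v_i for c_i, v_i in zip(c, v)) for c in components]


def filter_then_rerank(
    query: Vector,
    neighbor_ids: Sequence[int],
    vectors: Dict[int, Vector],
    low_dim: Dict[int, Vector],
    components: Sequence[Vector],
    k: int,
) -> List[Tuple[float, int]]:
    """Return (exact_distance, id) pairs for the k neighbors that look
    closest in the low-dimensional space, sorted by exact distance."""
    q_low = project(query, components)
    # Steps 1-2: rank all neighbors by cheap low-dimensional distance
    # and keep only the k best as a shortlist.
    shortlist = heapq.nsmallest(
        k, neighbor_ids, key=lambda i: sq_l2(q_low, low_dim[i])
    )
    # Step 3: fetch full-dimensional vectors only for the shortlist
    # and compute exact distances.
    return sorted((sq_l2(query, vectors[i]), i) for i in shortlist)


# Toy data (hypothetical): four 4-dimensional vectors, reduced to 2 dimensions.
vectors = {0: [0, 0, 0, 0], 1: [1, 0, 0, 5], 2: [3, 3, 0, 0], 3: [0.5, 0.5, 9, 0]}
components = [[1, 0, 0, 0], [0, 1, 0, 0]]
low_dim = {i: project(v, components) for i, v in vectors.items()}
query = [0, 0, 0, 0]

print(filter_then_rerank(query, [0, 1, 2, 3], vectors, low_dim, components, k=2))
print(filter_then_rerank(query, [0, 1, 2, 3], vectors, low_dim, components, k=3))
```

In this toy case, `k=2` shortlists vector 3 ahead of vector 1 because the discarded dimensions hide how far vector 3 really is, while `k=3` recovers vector 1. A larger $k$ costs more full-dimensional accesses but is less likely to drop a true neighbor, which is the recall/throughput trade-off behind choosing $k$ per layer.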

Paper Structure (21 sections, 5 figures, 3 tables, 1 algorithm)

Introduction
Related Work
pHNSW Algorithm
pHNSW Algorithm
Discussion on Top-k Parameter Selection
pHNSW Processor
Database Organization
pHNSW Processor Design
Controller
Memory Units and Data Access
Computation Units
Dataflow of pHNSW Processor
Experiment
Experiment setup
Simulation
...and 6 more sections

Figures (5)

Figure 1: (a) Hierarchical graph of HNSW; (b) HNSW searches from a list of neighbors in high-dimensional space; (c) pHNSW searches based on PCA filtering in three steps.
Figure 2: Recall@10 and QPS evolutions: (a) Different k(Layer1) with k(Layer0)=16; (b) Different k(Layer0) with k(Layer1)=8.
Figure 3: (a) Off-chip database organization; (b) pHNSW processor design (*Table \ref{Table:ISA} lists the functionalities of the components); (c) The full parallel sorting scheme of the kSort.L module, for an example of sorting five data elements.
Figure 4: Area breakdown of pHNSW processor with a total area of $0.739\,\mathrm{mm}^2$.
Figure 5: Normalized energy of a single query search.
