NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing

Yitu Wang, Shiyu Li, Qilin Zheng, Linghao Song, Zongwang Li, Andrew Chang, Hai "Helen" Li, Yiran Chen

TL;DR

NDSEARCH addresses the memory-bound limitations of graph-traversal-based ANNS by co-designing near-data processing within a modified SSD (SEARSSD). It introduces the LUNCSR data layout, the Vgenerator and Allocator units, and SiN LUN-level accelerators, and couples a two-level processing model (static and dynamic scheduling) with speculative searching to exploit internal NAND bandwidth. The approach yields substantial gains over CPU, GPU, and prior in-storage designs, with throughput improvements of up to 31.7x and energy efficiency improvements of up to two orders of magnitude, demonstrated on HNSW and DiskANN benchmarks. The work shows that moving computation near the data can enable large-scale, high-recall ANNS on graphs that must reside in storage, with practical impact for vector databases and RAG pipelines.

Abstract

Approximate nearest neighbor search (ANNS) is a key retrieval technique for vector databases and many data center applications, such as person re-identification and recommendation systems. It is also fundamental to retrieval-augmented generation (RAG) for large language models (LLMs). Among all the ANNS algorithms, graph-traversal-based ANNS achieves the highest recall rate. However, as the dataset size increases, the graph may require hundreds of gigabytes of memory, exceeding the main memory capacity of a single workstation node. Although the graph can be partitioned and a solid-state drive (SSD) used as the backing storage, the limited SSD I/O bandwidth severely degrades the performance of the system. To address this challenge, we present NDSEARCH, a hardware-software co-designed near-data processing (NDP) solution for ANNS processing. NDSEARCH consists of a novel in-storage computing architecture, namely SEARSSD, that supports the ANNS kernels and leverages logical unit (LUN)-level parallelism inside the NAND flash chips. NDSEARCH also includes a processing model that is customized for NDP and cooperates with SEARSSD. The processing model enables us to apply two-level scheduling to improve data locality and exploit the internal bandwidth in NDSEARCH, and a speculative searching mechanism to further accelerate the ANNS workload. Our results show that NDSEARCH improves the throughput by up to 31.7x, 14.6x, 7.4x, and 2.9x over CPU, GPU, a state-of-the-art SmartSSD-only design, and DeepStore, respectively. NDSEARCH also achieves two orders of magnitude higher energy efficiency than CPU and GPU.
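To make the workload concrete, the following is a minimal sketch of a generic best-first (beam) search over a proximity graph, the style of traversal used by graph-based indexes such as HNSW's base layer. It is not NDSEARCH's implementation: the function names, the `beam_width` parameter, and the toy data are hypothetical and chosen only for illustration. The sketch shows why the search phase is memory-bound: each step fetches the vectors of a vertex's neighbors, which are typically scattered across storage.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def graph_search(vectors, neighbors, query, entry, k=1, beam_width=4):
    """Best-first traversal of a proximity graph (illustrative only).

    vectors:   list of vectors, indexed by vertex id
    neighbors: adjacency lists, neighbors[v] -> list of vertex ids
    entry:     vertex id where the traversal starts
    Returns the ids of the k closest vertices found, nearest first.
    """
    d0 = l2(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded vertex first
    results = [(-d0, entry)]    # max-heap via negation: best beam_width so far

    while candidates:
        dist, v = heapq.heappop(candidates)
        # Stop once the closest candidate is worse than the worst kept result.
        if len(results) >= beam_width and dist > -results[0][0]:
            break
        for u in neighbors[v]:
            if u in visited:
                continue
            visited.add(u)
            # Every distance needs vertex u's vector: a scattered read.
            du = l2(vectors[u], query)
            if len(results) < beam_width or du < -results[0][0]:
                heapq.heappush(candidates, (du, u))
                heapq.heappush(results, (-du, u))
                if len(results) > beam_width:
                    heapq.heappop(results)  # evict the current worst result

    ordered = sorted((-nd, u) for nd, u in results)
    return [u for _, u in ordered[:k]]


# Toy example (assumed data): five points on a line, linked as a chain.
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
toy_neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(graph_search(toy_vectors, toy_neighbors, [3.2, 0.0], entry=0, k=2, beam_width=2))
# -> [3, 4]
```

In this sketch a larger `beam_width` trades more vector reads for higher recall, which is the cost that limited SSD I/O bandwidth amplifies when the graph does not fit in main memory.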

Paper Structure

This paper contains 33 sections, 1 equation, 19 figures, 1 table, and 1 algorithm.

Figures (19)

  • Figure 3: The search phase of graph-traversal-based ANNS.
  • Figure 4: Page and LUN access pattern of the search phase.
  • Figure 5: (a) Overview of NDSEARCH and overall architecture of SEARSSD; (b) The new graph format, LUNCSR, with LUN and BLK arrays.
  • Figure 6: The inefficient data layout in NDP scenarios.
  • Figure 7: The detailed architecture of (a) Vgenerator and (b) Allocator.
  • ...and 14 more figures