AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval

Kento Tatsuno, Daisuke Miyashita, Taiga Ikeda, Kiyoshi Ishiyama, Kazunari Sumiyoshi, Jun Deguchi

TL;DR

This work tackles the memory bottleneck in graph-based ANNS for billion-scale vector data by introducing AiSAQ (All-in-Storage ANNS with Product Quantization), which offloads PQ vectors to SSD and reduces DRAM usage to near zero while maintaining high recall. The core idea is to place PQ vectors within node chunks and fetch them from storage at each search hop, keeping only a small cache of PQ centroids in memory. This yields a near-zero RAM footprint, constant-time index loading, and sub-millisecond index-switch times, even across multiple billion-scale indices. Experiments on SIFT1M, SIFT1B, and KILT E5 22M show that AiSAQ uses around $11$–$14$ MB of memory with millisecond query latency and retains recall@1 comparable to DiskANN, while enabling scalable multi-server deployments with cost advantages. These properties make AiSAQ particularly attractive for retrieval-augmented generation (RAG) pipelines and other large-scale, multi-source information retrieval tasks that require rapid index switching and small memory footprints.
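To make the per-hop access pattern concrete, here is a toy Python sketch of a DiskANN-style greedy search (beam width 1) in which the PQ codes travel with the node chunks. It is an illustration under stated assumptions, not the authors' implementation: it assumes each chunk stores the node's full-precision vector, its neighbor IDs, and the PQ codes of those neighbors, so one chunk read is enough to score every neighbor. The `storage` dict stands in for the SSD, the entry point's code is assumed to be cached in memory, the codebook is built by sampling data points instead of k-means, and all names and parameters (`D`, `M`, `K`, `R`, `L`, `search`) are hypothetical.

```python
import numpy as np

# Hypothetical toy parameters (not from the paper).
D, M, K, R = 8, 4, 16, 4  # dimension, PQ subspaces, centroids per subspace, out-degree
SUB = D // M              # dimensions per subspace
N = 200

rng = np.random.default_rng(0)
data = rng.standard_normal((N, D)).astype(np.float32)

# PQ codebook (M x K x SUB): the only per-index state assumed to stay in DRAM.
# Centroids are sampled data points, not k-means, to keep the sketch short.
codebook = np.stack([data[rng.choice(N, K, replace=False), m * SUB:(m + 1) * SUB]
                     for m in range(M)])

def pq_encode(x):
    # Nearest centroid index in each subspace -> M-byte code.
    return np.array([np.argmin(((codebook[m] - x[m * SUB:(m + 1) * SUB]) ** 2).sum(axis=1))
                     for m in range(M)], dtype=np.uint8)

codes = np.stack([pq_encode(x) for x in data])

# Toy graph: each node links to its R exact nearest neighbours.
d2 = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
np.fill_diagonal(d2, np.inf)
neighbors = np.argsort(d2, axis=1)[:, :R]

# Simulated SSD: node chunk = (full vector, neighbour IDs, neighbours' PQ codes).
storage = {i: (data[i], neighbors[i], codes[neighbors[i]]) for i in range(N)}

def search(query, entry, entry_code, k=5, L=20):
    # Per-query lookup table: table[m, c] = squared distance from the
    # query's m-th subvector to centroid c of subspace m.
    table = np.stack([((codebook[m] - query[m * SUB:(m + 1) * SUB]) ** 2).sum(axis=1)
                      for m in range(M)])
    pq_dist = lambda code: float(table[np.arange(M), code].sum())

    cand = {entry: pq_dist(entry_code)}  # node id -> approximate (PQ) distance
    visited, exact, ios = set(), {}, 0
    while True:
        # Closest unvisited node among the L best candidates.
        frontier = [n for n in sorted(cand, key=cand.get)[:L] if n not in visited]
        if not frontier:
            break
        node = frontier[0]
        visited.add(node)
        vec, nbrs, nbr_codes = storage[node]  # one chunk read per hop
        ios += 1
        exact[node] = float(((vec - query) ** 2).sum())  # full vector for re-ranking
        for nb, code in zip(nbrs.tolist(), nbr_codes):
            if nb not in cand:
                cand[nb] = pq_dist(code)  # no extra I/O: code came with the chunk
    return sorted(exact, key=exact.get)[:k], ios

# Entry point's code is assumed to be cached in memory alongside the codebook.
top, ios = search(data[0], entry=0, entry_code=codes[0])
print("top-5:", top, "chunk reads:", ios)
```

The point of this layout is that scoring a node's neighbors never requires a lookup into a DRAM-resident array of PQ codes; the trade-off is larger chunks, since each one carries its neighbors' codes.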

Abstract

Graph-based approximate nearest neighbor search (ANNS) algorithms work effectively for large-scale vector retrieval. Among such methods, DiskANN achieves good recall-speed tradeoffs using both DRAM and storage. DiskANN adopts product quantization (PQ) to reduce memory usage, but that usage is still proportional to the scale of the dataset. In this paper, we propose All-in-Storage ANNS with Product Quantization (AiSAQ), which offloads the compressed vectors to the SSD index. Our method achieves $\sim$10 MB memory usage during query search on billion-scale datasets without critical latency degradation. AiSAQ also reduces the index load time for query search preparation, which enables fast switching between multiple billion-scale indices. This method can be applied to the retrievers of retrieval-augmented generation (RAG) and scaled out with multiple-server systems for emerging datasets. Our DiskANN-based implementation is available on GitHub.
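The claim that PQ memory in DiskANN grows with dataset size, while an all-in-storage layout keeps only the codebook in DRAM, can be checked with back-of-the-envelope arithmetic. The sketch below assumes parameters that are not taken from the paper: 1B SIFT-like vectors (128-d, uint8), 32-byte PQ codes, 256 centroids per subspace stored as float32, out-degree 64 or 128, a 4096-byte block size, and a hypothetical chunk layout of vector + uint32 degree + uint32 neighbor IDs (+ neighbors' PQ codes). All function names are hypothetical, and the figures ignore any other runtime memory.

```python
import math

# Illustrative sizing under assumed parameters (not the paper's configuration).
BLOCK = 4096  # assumed SSD logical block (LBA) size in bytes

def dram_pq_codes(n, m):
    """DiskANN-style: an m-byte PQ code per vector is resident in DRAM."""
    return n * m

def dram_codebook(d, k=256, float_bytes=4):
    """All-in-storage: only centroids stay in DRAM.
    m subspaces x k centroids x (d/m) floats = k * d floats, independent of n."""
    return k * d * float_bytes

def chunk_bytes(d, r, m, elem_bytes=1, neighbor_pq=True):
    """Assumed chunk layout: vector, uint32 degree, r uint32 neighbour IDs,
    plus (optionally) the r neighbours' m-byte PQ codes."""
    size = d * elem_bytes + 4 + 4 * r
    return size + r * m if neighbor_pq else size

def blocks_per_chunk(size, block=BLOCK):
    """Blocks spanned by one chunk if it starts on a block boundary."""
    return math.ceil(size / block)

print(dram_pq_codes(10**9, 32) / 1e9, "GB")          # 32.0 GB
print(dram_codebook(128) / 1024, "KiB")              # 128.0 KiB
print(chunk_bytes(128, 64, 32, neighbor_pq=False))   # 388
print(chunk_bytes(128, 64, 32))                      # 2436
print(blocks_per_chunk(chunk_bytes(128, 64, 32)))    # 1
print(blocks_per_chunk(chunk_bytes(128, 128, 32)))   # 2
```

Under these assumptions, the DRAM cost drops from tens of gigabytes (linear in the number of vectors) to a codebook whose size depends only on dimension and centroid count. The price is that each node chunk grows by r × m bytes, which is why how chunks align to LBA blocks matters for per-hop I/O.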

Paper Structure

This paper contains 16 sections, 10 figures, 5 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Node chunk details and alignment in LBA blocks
  • Figure 2: Data placements of a node chunk and memory of DiskANN (left) and proposed method AiSAQ (right)
  • Figure 3: SIFT1M
  • Figure 4: SIFT1B
  • Figure 5: KILT E5 22M
  • ...and 5 more figures