AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval
Kento Tatsuno, Daisuke Miyashita, Taiga Ikeda, Kiyoshi Ishiyama, Kazunari Sumiyoshi, Jun Deguchi
TL;DR
This work tackles the memory bottleneck in graph-based ANNS for billion-scale vector data by introducing AiSAQ (All-in-Storage ANNS with Product Quantization), which offloads PQ vectors to SSD and reduces DRAM usage to near zero while maintaining high recall. The core idea is to place PQ vectors within node chunks and fetch them from storage at each search hop, keeping only a small cache of PQ centroids in memory; this yields a near-zero RAM footprint, constant-time index loading, and sub-millisecond index-switch times, even across multiple billion-scale indices. Experiments on SIFT1M, SIFT1B, and KILT E5 22M show that AiSAQ uses around $11$–$14$ MB of memory with millisecond-order query latency and retains recall@1 comparable to DiskANN, while enabling scalable multi-server deployments with cost advantages. These properties make AiSAQ particularly attractive for retrieval-augmented generation (RAG) pipelines and other large-scale, multi-source information retrieval tasks that require rapid index switching and a small memory footprint.
Abstract
Graph-based approximate nearest neighbor search (ANNS) algorithms are effective for large-scale vector retrieval. Among such methods, DiskANN achieves good recall-speed tradeoffs using both DRAM and storage. DiskANN adopts product quantization (PQ) to reduce memory usage, but its memory usage is still proportional to the scale of the dataset. In this paper, we propose All-in-Storage ANNS with Product Quantization (AiSAQ), which offloads compressed vectors to the SSD index. Our method achieves $\sim$10 MB memory usage in query search with billion-scale datasets without critical latency degradation. AiSAQ also reduces the index load time for query search preparation, which enables fast switching between multiple billion-scale indices. This method can be applied to retrievers of retrieval-augmented generation (RAG) and be scaled out with multiple-server systems for emerging datasets. Our DiskANN-based implementation is available on GitHub.
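To make the scaling argument concrete, the following is a minimal back-of-the-envelope sketch in Python. It contrasts the DRAM needed to keep every PQ code resident (which grows with the number of vectors) with the DRAM needed for the PQ centroid table alone (which does not). The function names and the parameter values (32 bytes per PQ code, 128-dimensional vectors, 32 subspaces, 256 centroids per subspace, 4-byte floats) are illustrative assumptions, not figures taken from the paper or its implementation, and the sketch ignores any other in-memory structures.

```python
def pq_resident_bytes(num_vectors: int, pq_bytes_per_vector: int) -> int:
    """DRAM needed when every PQ code stays in memory.

    Grows linearly with the dataset size.
    """
    return num_vectors * pq_bytes_per_vector


def centroid_table_bytes(
    dim: int,
    num_subspaces: int,
    codebook_size: int = 256,
    bytes_per_float: int = 4,
) -> int:
    """DRAM needed for the PQ centroid table only.

    Each subspace holds `codebook_size` centroids of `dim / num_subspaces`
    floats, so the total does not depend on the number of indexed vectors.
    """
    if dim % num_subspaces != 0:
        raise ValueError("dim must be divisible by num_subspaces")
    sub_dim = dim // num_subspaces
    return num_subspaces * codebook_size * sub_dim * bytes_per_float


if __name__ == "__main__":
    # Hypothetical settings chosen for illustration only.
    for n in (1_000_000, 1_000_000_000):
        resident = pq_resident_bytes(n, pq_bytes_per_vector=32)
        centroids = centroid_table_bytes(dim=128, num_subspaces=32)
        print(f"N={n:>13,}: PQ codes in DRAM = {resident / 1e9:8.3f} GB, "
              f"centroids only = {centroids / 1024:.0f} KiB")
```

Under these assumed parameters, the resident PQ codes grow by a factor of 1000 between the two dataset sizes, while the centroid table stays the same size. This is the property that lets an all-in-storage design keep memory usage roughly constant regardless of dataset scale.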
