AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval

Kento Tatsuno; Daisuke Miyashita; Taiga Ikeda; Kiyoshi Ishiyama; Kazunari Sumiyoshi; Jun Deguchi

TL;DR

This work tackles the memory bottleneck in graph-based ANNS for billion-scale vector data by introducing AiSAQ (All-in-Storage ANNS with Product Quantization), which offloads PQ vectors to SSD and reduces DRAM usage to near zero while maintaining high recall. The core idea is to place PQ vectors within node chunks and fetch them from storage at each search hop, keeping only a small cache of PQ centroids in memory. This yields a near-zero RAM footprint, constant-time index loading, and sub-millisecond index-switch times, even across multiple billion-scale indices. Experiments on SIFT1M, SIFT1B, and KILT E5 22M show that AiSAQ uses around $11$–$14$ MB of memory with millisecond query latency and retains recall@1 comparable to DiskANN, while enabling scalable multi-server deployments with cost advantages. These properties make AiSAQ particularly attractive for retrieval-augmented generation (RAG) pipelines and other large-scale, multi-source information retrieval tasks that require rapid index switching and a reduced memory footprint.
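To make the "PQ vectors within node chunks" idea concrete, here is a minimal sketch of a chunk that stores a node's full vector, its neighbor IDs, and the PQ codes of those neighbors side by side, so that one storage read returns everything needed for the next hop. The field order, integer widths, and the names `pack_chunk` and `unpack_chunk` are assumptions made for illustration; the paper's actual layout is the one described in its Figure 1.

```python
import struct

def pack_chunk(vector, neighbor_ids, neighbor_codes):
    """Serialize one node (hypothetical layout): full-precision vector,
    neighbor count, neighbor IDs, then one PQ code per neighbor."""
    assert len(neighbor_ids) == len(neighbor_codes)
    buf = struct.pack(f"<{len(vector)}f", *vector)                # float32 vector
    buf += struct.pack("<I", len(neighbor_ids))                   # degree
    buf += struct.pack(f"<{len(neighbor_ids)}I", *neighbor_ids)   # uint32 IDs
    for code in neighbor_codes:                                   # inline PQ codes
        buf += bytes(code)
    return buf

def unpack_chunk(buf, dim, code_len):
    """Parse a chunk produced by pack_chunk; dim and code_len must match."""
    off = 0
    vector = struct.unpack_from(f"<{dim}f", buf, off)
    off += 4 * dim
    (degree,) = struct.unpack_from("<I", buf, off)
    off += 4
    ids = struct.unpack_from(f"<{degree}I", buf, off)
    off += 4 * degree
    codes = [buf[off + i * code_len: off + (i + 1) * code_len]
             for i in range(degree)]
    return list(vector), list(ids), codes

# One chunk read yields the neighbors' PQ codes without any in-DRAM PQ table.
chunk = pack_chunk([1.0, 2.5], [7, 9], [b"\x01\x02", b"\x03\x04"])
vector, ids, codes = unpack_chunk(chunk, dim=2, code_len=2)
```

The trade-off in this kind of layout is that each PQ code is duplicated once per incoming edge, which enlarges the on-storage index in exchange for removing the per-vector table from memory.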

Abstract

Graph-based approximate nearest neighbor search (ANNS) algorithms are effective for large-scale vector retrieval. Among such methods, DiskANN achieves good recall-speed tradeoffs using both DRAM and storage. DiskANN adopts product quantization (PQ) to reduce memory usage, but its memory usage is still proportional to the scale of the dataset. In this paper, we propose All-in-Storage ANNS with Product Quantization (AiSAQ), which offloads compressed vectors to the SSD index. Our method achieves $\sim$10 MB memory usage in query search with billion-scale datasets without critical latency degradation. AiSAQ also reduces the index load time for query search preparation, which enables fast switching between multiple billion-scale indices. This method can be applied to retrievers of retrieval-augmented generation (RAG) and be scaled out with multiple-server systems for emerging datasets. Our DiskANN-based implementation is available on GitHub.
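The scaling argument in the abstract can be illustrated with a little arithmetic. The sketch below compares the DRAM needed to hold one PQ code per vector (which grows with the dataset) against the size of a PQ codebook (which does not). The 32-byte code length, the 256 centroids per subspace, and the function names are assumptions chosen for illustration, not figures taken from the paper.

```python
def pq_table_bytes(num_vectors: int, bytes_per_code: int) -> int:
    """DRAM needed to keep one PQ code per vector resident in memory."""
    return num_vectors * bytes_per_code

def codebook_bytes(dim: int, num_centroids: int = 256,
                   bytes_per_float: int = 4) -> int:
    """Size of the PQ centroids. With M subspaces of dim/M dimensions each,
    the total is num_centroids * dim floats, independent of M and of the
    number of indexed vectors."""
    return num_centroids * dim * bytes_per_float

# Assumed example: 128-dimensional vectors, 32-byte PQ codes.
for n in (10**6, 10**9):
    print(f"{n:>13,} vectors: PQ table {pq_table_bytes(n, 32) / 1e9:.3f} GB, "
          f"codebook {codebook_bytes(128) / 1024:.0f} KiB")
```

Under these assumed parameters the per-vector table grows a thousandfold between the two dataset sizes while the codebook stays fixed, which is why moving the codes to storage leaves a memory footprint that is essentially independent of dataset scale.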

Paper Structure (16 sections, 10 figures, 5 tables, 1 algorithm)

This paper contains 16 sections, 10 figures, 5 tables, 1 algorithm.

Introduction
Preliminaries
Graph-Based ANNS Algorithms
Index Switch for Multiple Sources
DiskANN
Drawbacks of DiskANN
Proposed Method
Methodology
Implementation
Evaluation
Datasets and Experimental Conditions
Memory Usage
Query Search Time
Index Switch
Cost Analysis and Multiple-Server System
...and 1 more section

Figures (10)

Figure 1: Node chunk details and alignment in LBA blocks
Figure 2: Data placements of a node chunk and memory of DiskANN (left) and proposed method AiSAQ (right)
Figure 3: SIFT1M
Figure 4: SIFT1B
Figure 5: KILT E5 22M
...and 5 more figures
