Disk-Resident Graph ANN Search: An Experimental Evaluation

Xiaoyu Chen, Jinxiu Qu, Yitong Song, Shuhang Lu, Huiling Li, Minghui Jiang, Wei Zhou, Jianliang Xu, Xuanhe Zhou, Fan Wu

TL;DR

This study decomposes disk-resident graph-based approximate nearest neighbor systems into five key technical components, builds a unified taxonomy of existing designs across these components, and reveals several non-obvious findings.

Abstract

As data volumes grow while memory capacity remains limited, disk-resident graph-based approximate nearest neighbor (ANN) methods have become a practical alternative to memory-resident designs, shifting the bottleneck from computation to disk I/O. However, since their technical designs diverge widely across storage, layout, and execution paradigms, a systematic understanding of their fundamental performance trade-offs remains elusive. This paper presents a comprehensive experimental study of disk-resident graph-based ANN methods. First, we decompose such systems into five key technical components, i.e., storage strategy, disk layout, cache management, query execution, and update mechanism, and build a unified taxonomy of existing designs across these components. Second, we conduct fine-grained evaluations of representative strategies for each technical component to analyze the trade-offs in throughput, recall, and resource utilization. Third, we perform comprehensive end-to-end experiments and parameter-sensitivity analyses to evaluate overall system performance under diverse configurations. Fourth, our study reveals several non-obvious findings: (1) vector dimensionality fundamentally reshapes component effectiveness, necessitating dimension-aware design; (2) existing layout strategies exhibit surprisingly low I/O utilization (at most 15%); (3) page size critically affects feasibility and efficiency, with smaller pages preferred when layouts are carefully optimized; and (4) update strategies present clear workload-dependent trade-offs between in-place and out-of-place designs. Based on these findings, we derive practical guidelines for system design and configuration, and outline promising directions for future research.

Paper Structure

This paper contains 29 sections, 4 equations, 17 figures, and 4 tables.

Figures (17)

  • Figure 1: Memory-resident vs. disk-resident ANN search.
  • Figure 2: Technology decomposition and experimental study.
  • Figure 3: Disk-resident graph-based ANN search.
  • Figure 4: Locality-aware disk layout strategies.
  • Figure 5: Cache strategies.
  • ...and 12 more figures