BAMG: A Block-Aware Monotonic Graph Index for Disk-Based Approximate Nearest Neighbor Search
Huiling Li, Xin Huang, Byron Choi, Jianliang Xu
TL;DR
This paper proposes the Block-aware Monotonic Relative Neighborhood Graph (BMRNG), theoretically guaranteeing the existence of I/O monotonic search paths, and develops a practical and efficient variant, the Block-Aware Monotonic Graph (BAMG), which can be constructed in linear time from a monotonic graph considering the storage layout.
Abstract
Approximate Nearest Neighbor Search (ANNS) over high-dimensional vectors is a foundational problem in databases, where disk I/O often emerges as the dominant performance bottleneck at scale. To accelerate search, graph-based indexes rely on proximity graph, where nodes represent vectors and edges guide the traversal toward the target. However, existing graph indexing solutions for disk-based ANNS typically either optimize the storage layout for a given graph or construct the graph independently of the storage layout, thus overlooking their interaction. In this paper, we bridge this gap by proposing the Block-aware Monotonic Relative Neighborhood Graph (BMRNG), theoretically guaranteeing the existence of I/O monotonic search paths. The core idea is to align the graph topology with the data placement by jointly considering both geometric distance and storage layout for edge selection. To address the scalability challenge of BMRNG construction, we further develop a practical and efficient variant, the Block-Aware Monotonic Graph (BAMG), which can be constructed in linear time from a monotonic graph considering the storage layout. BAMG integrates block-aware edge pruning with a decoupled storage design that separates raw vectors from the graph index, thereby maximizing block utilization and minimizing redundant disk reads. Additionally, we design a multi-layer navigation graph for adaptive and efficient query entry, along with a block-first search algorithm that prioritizes intra-block traversal to fully exploit each disk I/O operation. Extensive experiments on real-world datasets show that BAMG can outperform state-of-the-art methods in search performance.
