BAMG: A Block-Aware Monotonic Graph Index for Disk-Based Approximate Nearest Neighbor Search

Huiling Li, Xin Huang, Byron Choi, Jianliang Xu

TL;DR

This paper proposes the Block-aware Monotonic Relative Neighborhood Graph (BMRNG), which theoretically guarantees the existence of I/O monotonic search paths, and develops a practical and efficient variant, the Block-Aware Monotonic Graph (BAMG), which can be constructed in linear time from a monotonic graph while taking the storage layout into account.

Abstract

Approximate Nearest Neighbor Search (ANNS) over high-dimensional vectors is a foundational problem in databases, where disk I/O often emerges as the dominant performance bottleneck at scale. To accelerate search, graph-based indexes rely on a proximity graph, where nodes represent vectors and edges guide the traversal toward the target. However, existing graph indexing solutions for disk-based ANNS typically either optimize the storage layout for a given graph or construct the graph independently of the storage layout, thus overlooking their interaction. In this paper, we bridge this gap by proposing the Block-aware Monotonic Relative Neighborhood Graph (BMRNG), which theoretically guarantees the existence of I/O monotonic search paths. The core idea is to align the graph topology with the data placement by jointly considering geometric distance and storage layout during edge selection. To address the scalability challenge of BMRNG construction, we further develop a practical and efficient variant, the Block-Aware Monotonic Graph (BAMG), which can be constructed in linear time from a monotonic graph while taking the storage layout into account. BAMG integrates block-aware edge pruning with a decoupled storage design that separates raw vectors from the graph index, thereby maximizing block utilization and minimizing redundant disk reads. Additionally, we design a multi-layer navigation graph for adaptive and efficient query entry, along with a block-first search algorithm that prioritizes intra-block traversal to fully exploit each disk I/O operation. Extensive experiments on real-world datasets show that BAMG can outperform state-of-the-art methods in search performance.
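
To make the idea of prioritizing intra-block traversal concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a toy in-memory setting in which each node is assigned to one block, reading a node's adjacency list requires its block to be the one currently loaded, and distances to any node can be computed without extra I/O. The function name `block_first_search`, its parameters, and the single-block cache are all hypothetical choices made for illustration.

```python
import math


def block_first_search(query, vectors, neighbors, block_of, entry, ef=2):
    """Greedy graph search that prefers candidates in the loaded block.

    Illustrative assumptions (not from the paper): one block is cached at
    a time, and an I/O is counted whenever a node from another block is
    expanded.
    """

    def dist(node):
        return math.dist(query, vectors[node])

    visited = {entry}
    cands = [(dist(entry), entry)]    # nodes waiting to be expanded
    results = [(dist(entry), entry)]  # best `ef` nodes seen, sorted by distance
    loaded, io_count = None, 0

    while cands:
        # Block-first rule: take the closest candidate inside the loaded
        # block if one exists, otherwise the closest candidate overall.
        local = [c for c in cands if block_of[c[1]] == loaded]
        d, u = min(local or cands)
        cands.remove((d, u))

        # Drop candidates that cannot improve a full result set.
        if len(results) >= ef and d > results[-1][0]:
            continue

        # Expanding a node from a different block costs one simulated read.
        if block_of[u] != loaded:
            loaded = block_of[u]
            io_count += 1

        for v in neighbors[u]:
            if v not in visited:
                visited.add(v)
                item = (dist(v), v)
                cands.append(item)
                results = sorted(results + [item])[:ef]

    return results, io_count


# Toy data: six points on a line, stored three per block, linked as a chain.
vectors = {i: (float(i),) for i in range(6)}
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
block_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

results, io_count = block_first_search((4.2,), vectors, neighbors, block_of, entry=0)
print([node for _, node in results], io_count)
```

On this toy input the search walks through block 0, crosses into block 1 once, and finishes there, so it finds nodes 4 and 5 with two simulated block reads. The linear scan over `cands` keeps the sketch short; a real index would presumably use priority queues and an actual block cache.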

Paper Structure

This paper contains 38 sections, 2 theorems, 5 equations, 16 figures, 2 tables, 3 algorithms.

Key Result

Theorem 1

Given a set $V$ of $n$ nodes and the block assignment $\mathcal{B}=(V, \mathcal{L})$, let $G$ be the proximity graph satisfying Rule 1 and Rule 2; then $G$ is a BMRNG.

Figures (16)

  • Figure 1: An example illustrating the block-aware graph.
  • Figure 2: Breakdown of search latency.
  • Figure 3: Proportion of Intra- and Inter-Block Edges in Starling.
  • Figure 4: An example of the building process of BAMG.
  • Figure 5: Layout of graph index and raw vectors.
  • ...and 11 more figures

Theorems & Definitions (8)

  • Definition 1: Approximate Nearest Neighbor Search
  • Definition 2: Block Assignment
  • Definition 3: Monotonic I/O Path
  • Definition 4: Block-aware Monotonic Relative Neighborhood Graph
  • Theorem 1
  • proof
  • Theorem 2
  • proof