A Topology-Aware Localized Update Strategy for Graph-Based ANN Index

Song Yu, Shengyuan Lin, Shufeng Gong, Yongqing Xie, Ruicheng Liu, Yijie Zhou, Ji Sun, Yanfeng Zhang, Guoliang Li, Ge Yu

TL;DR

This work tackles the challenge of dynamic, disk-based graph ANNS where small-batch updates degrade throughput due to unnecessary I/O and costly neighbor pruning. It introduces a topology-aware localized update strategy, featuring a lightweight topology for rapid affected-vertex identification, page-level localized updates to minimize I/O, and an adaptive similar neighbor replacement with a relaxed neighbor limit to curb pruning overhead. The resulting Greator system demonstrates 2.47×–6.45× update throughput gains over FreshDiskANN while maintaining comparable recall and tail latency across eight real-world datasets. The approach combines an update-friendly index design with asynchronous I/O and concurrency controls, offering a practical solution for streaming vector data applications that require real-time updates without sacrificing search quality.

Abstract

Graph-based indexes have been widely adopted for approximate nearest neighbor search (ANNS) over high-dimensional vectors. In dynamic scenarios involving frequent vector insertions and deletions, existing systems improve update throughput by batching updates, but a large batch size significantly degrades search accuracy. This work aims to improve the performance of graph-based ANNS systems in small-batch update scenarios while maintaining high search efficiency and accuracy. We identify two key issues that existing batch update systems face with small batches. First, the system must scan the entire index file to identify and update the affected vertices, resulting in excessive unnecessary I/O. Second, updating the affected vertices introduces many new neighbors, which frequently triggers neighbor pruning. To address these issues, we propose a topology-aware localized update strategy for graph-based ANN indexes. We introduce a lightweight index topology to identify affected vertices efficiently and employ a localized update strategy that modifies only the affected vertices in the index file. To mitigate frequent, heavy neighbor pruning, we propose a similar neighbor replacement strategy, which during repair connects each affected vertex to only a small number (typically one) of the most similar outgoing neighbors of the deleted vertex. In extensive experiments on real-world datasets, our update strategy achieves 2.47×–6.45× higher update throughput than the state-of-the-art system FreshDiskANN while maintaining high search efficiency and accuracy.
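The similar neighbor replacement idea described in the abstract can be illustrated with a small in-memory sketch. This is not the paper's implementation; it makes several assumptions that go beyond the abstract: the graph is a plain dict of out-neighbor lists, "most similar" is read as closest to the deleted vector under Euclidean distance, and the helper names `l2` and `repair_after_delete` are hypothetical. Affected vertices are found here by a simple in-memory scan, whereas the paper uses a lightweight topology to avoid scanning the index file.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def repair_after_delete(graph, vectors, deleted, k=1):
    """Delete a vertex and repair its in-neighbors (illustrative sketch).

    graph:   dict vertex -> list of out-neighbors (assumed layout)
    vectors: dict vertex -> list of floats
    k:       replacement edges per affected vertex (typically 1)
    Returns the list of affected vertices that were repaired.
    """
    # Out-neighbors of the deleted vertex, ranked by similarity to it.
    # Assumption: "most similar" means smallest distance to the deleted vector.
    out_nbrs = [n for n in graph.get(deleted, []) if n != deleted]
    ranked = sorted(out_nbrs, key=lambda n: l2(vectors[n], vectors[deleted]))

    # Affected vertices are those with an edge to the deleted vertex.
    affected = [v for v, nbrs in graph.items()
                if v != deleted and deleted in nbrs]

    for v in affected:
        # Drop the dangling edge, then add at most k replacement edges.
        nbrs = [n for n in graph[v] if n != deleted]
        added = 0
        for cand in ranked:
            if added == k:
                break
            if cand != v and cand not in nbrs:
                nbrs.append(cand)
                added += 1
        graph[v] = nbrs

    graph.pop(deleted, None)
    vectors.pop(deleted, None)
    return affected
```

Because each affected vertex loses one edge and gains at most `k` (one by default), its neighbor count does not grow, which is the intuition behind avoiding a pruning pass in this sketch.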

Paper Structure

This paper contains 22 sections, 16 figures, 1 table, 2 algorithms.

Figures (16)

  • Figure 1: The ratio of affected vertices to unaffected vertices in the index.
  • Figure 2: The ratio of disk space occupied by vectors and graph topology in the index.
  • Figure 3: An example of a graph-based vector index and its update process. Each vertex has a maximum neighbor limit of $R=3$. The red star represents the query vector, $v_0$ represents the entry of the index, and $v_6$ ($x_6$) represents the nearest target vertex (vector) for the query.
  • Figure 4: The update workflow of our topology-aware localized update strategy. The crossed-out circles represent deleted vertices, the red circles indicate vertices affected by the deletion, and the green circles denote newly inserted vertices. The red pages (e.g., page 0 and page 2) represent the affected pages that need to be updated.
  • Figure 5: An example of storing reverse edges in $\Delta G$.
  • ...and 11 more figures
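The captions for Figures 4 and 5 mention reverse edges and affected pages. A minimal sketch of how a reverse-edge map could be used to find the vertices affected by a deletion, and the pages that hold them, is given below. The details are assumptions rather than the paper's design: vertex IDs are taken to be laid out contiguously with a fixed number of vertices per page, and the names `build_reverse_edges`, `affected_pages`, and `vertices_per_page` are hypothetical.

```python
from collections import defaultdict


def build_reverse_edges(graph):
    """Map each vertex to the set of vertices that point to it."""
    rev = defaultdict(set)
    for v, nbrs in graph.items():
        for n in nbrs:
            rev[n].add(v)
    return rev


def affected_pages(graph, deleted, vertices_per_page):
    """Return (affected vertices, pages containing them), both sorted.

    Assumption: vertex v is stored on page v // vertices_per_page.
    """
    deleted = set(deleted)
    rev = build_reverse_edges(graph)

    # A vertex is affected if it has an out-edge to any deleted vertex.
    affected = set()
    for d in deleted:
        affected |= rev.get(d, set())
    affected -= deleted

    pages = {v // vertices_per_page for v in affected}
    return sorted(affected), sorted(pages)
```

With the graph `{0: [1], 1: [2], 2: [3], 3: [4], 4: [5], 5: [2]}`, deleting vertex 2 with two vertices per page marks vertices 1 and 5 as affected, so only pages 0 and 2 would need to be rewritten instead of the whole file.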