A Topology-Aware Localized Update Strategy for Graph-Based ANN Index
Song Yu, Shengyuan Lin, Shufeng Gong, Yongqing Xie, Ruicheng Liu, Yijie Zhou, Ji Sun, Yanfeng Zhang, Guoliang Li, Ge Yu
TL;DR
This work tackles the challenge of dynamic, disk-based graph ANNS where small-batch updates degrade throughput due to unnecessary I/O and costly neighbor pruning. It introduces a topology-aware localized update strategy, featuring a lightweight topology for rapid affected-vertex identification, page-level localized updates to minimize I/O, and an adaptive similar neighbor replacement with a relaxed neighbor limit to curb pruning overhead. The resulting Greator system demonstrates 2.47×–6.45× update throughput gains over FreshDiskANN while maintaining comparable recall and tail latency across eight real-world datasets. The approach combines an update-friendly index design with asynchronous I/O and concurrency controls, offering a practical solution for streaming vector data applications that require real-time updates without sacrificing search quality.
Abstract
The graph-based index has been widely adopted to meet the demand for approximate nearest neighbor search (ANNS) for high-dimensional vectors. However, in dynamic scenarios involving frequent vector insertions and deletions, existing systems improve update throughput by adopting a batch update method. However, a large batch size leads to significant degradation in search accuracy. This work aims to improve the performance of graph-based ANNS systems in small-batch update scenarios, while maintaining high search efficiency and accuracy. We identify two key issues in existing batch update systems for small-batch updates. First, the system needs to scan the entire index file to identify and update the affected vertices, resulting in excessive unnecessary I/O. Second, updating the affected vertices introduces many new neighbors, frequently triggering neighbor pruning. To address these issues, we propose a topology-aware localized update strategy for graph-based ANN index. We introduce a lightweight index topology to identify affected vertices efficiently and employ a localized update strategy that modifies only the affected vertices in the index file. To mitigate frequent heavy neighbor pruning, we propose a similar neighbor replacement strategy, which connects the affected vertices to only a small number (typically one) of the most similar outgoing neighbors of the deleted vertex during repair. Based on extensive experiments on real-world datasets, our update strategy achieves 2.47X-6.45X higher update throughput than the state-of-the-art system FreshDiskANN while maintaining high search efficiency and accuracy.
