In-Place Updates of a Graph Index for Streaming Approximate Nearest Neighbor Search
Haike Xu, Magdalen Dobson Manohar, Philip A. Bernstein, Badrish Chandramouli, Richard Wen, Harsha Vardhan Simhadri
TL;DR
This work addresses streaming approximate nearest neighbor search (ANNS) on proximity-graph indices by introducing IP-DiskANN, an in-place deletion algorithm for DiskANN that avoids costly batch consolidations. The method approximates the in-neighbors of a deleted vertex via the insertion visit history and replaces each affected edge with at most $c = 3$ targeted replacement edges, yielding a practical per-update cost of $O(cR)$ while preserving recall. IP-DiskANN maintains stable Recall@10 across diverse, long-running update patterns and outperforms FreshDiskANN and HNSW in both query throughput and update efficiency. The results indicate practical benefits for real-time vector search systems that require continuous updates without expensive rebuilds or resource spikes.
Abstract
Indices for approximate nearest neighbor search (ANNS) are a basic component of information retrieval and are widely used in database, search, recommendation, and RAG systems. In these scenarios, documents or other objects are inserted into and deleted from the working set at a high rate, requiring a stream of updates to the vector index. Algorithms based on proximity graph indices are the most efficient for ANNS, winning many benchmark competitions. However, it is challenging to update such a graph index at a high rate while maintaining stable recall after many updates. Since the graph is singly-linked, deletions are hard because there is no fast way to find the in-neighbors of a deleted vertex. Therefore, to update the graph, state-of-the-art algorithms such as FreshDiskANN accumulate deletions in a batch and periodically consolidate, removing edges to deleted vertices and modifying the graph to ensure recall stability. In this paper, we present IP-DiskANN (InPlaceUpdate-DiskANN), the first algorithm to avoid batch consolidation by efficiently processing each insertion and deletion in-place. Our experiments using standard benchmarks show that IP-DiskANN has stable recall over various lengthy update patterns in both high-recall and low-recall regimes. Further, its query throughput and update speed are better than those of the batch consolidation algorithm and HNSW.
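To make the deletion difficulty concrete, the following is a minimal Python sketch, not the authors' implementation. It stores only out-neighbor lists, so finding the exact in-neighbors of a vertex needs a full scan, and it patches a deletion in place using a supplied set of candidate in-neighbors. The names `exact_in_neighbors`, `delete_in_place`, and `candidates` are hypothetical, and the distance-based truncation to degree `R` is a simplified stand-in for whatever pruning rule a real index uses.

```python
import math


def exact_in_neighbors(graph, p):
    """Scan every out-list to find vertices pointing at p.

    This is the slow path a singly-linked graph forces: time grows with
    the number of vertices times the degree.
    """
    return [u for u, outs in graph.items() if p in outs]


def delete_in_place(graph, points, p, candidates, c=3, R=4):
    """Delete p, patching only the candidates that actually point to p.

    ASSUMPTION: `candidates` is an approximate in-neighbor set supplied by
    the caller (for example, from a search visit history). In-neighbors
    missing from it keep a dangling edge to p, which a real system would
    have to handle separately.
    """
    p_out = [v for v in graph[p] if v != p]
    for z in candidates:
        if z == p or z not in graph or p not in graph[z]:
            continue
        # Drop the edge z -> p.
        outs = [v for v in graph[z] if v != p]
        # Add at most c replacements: p's out-neighbors closest to z.
        fresh = (v for v in p_out if v != z and v not in outs)
        repl = sorted(fresh, key=lambda v: math.dist(points[z], points[v]))[:c]
        outs.extend(repl)
        # ASSUMPTION: plain nearest-first truncation to degree R.
        outs.sort(key=lambda v: math.dist(points[z], points[v]))
        graph[z] = outs[:R]
    del graph[p]


# Tiny example: vertex 1 has in-neighbors 0 and 4.
points = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0), 4: (1, 1)}
graph = {0: [1], 1: [2, 4], 2: [3], 3: [2], 4: [1]}
delete_in_place(graph, points, 1, candidates=[0, 4])
```

In this sketch each patched vertex does work bounded by the degree and the replacement budget `c`, rather than by the size of the graph, which is the intuition behind handling deletions one at a time instead of in a consolidated batch.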
