Graph-based Nearest Neighbors with Dynamic Updates via Random Walks
Nina Mishra, Yonatan Naamad, Tal Wagner, Lichen Zhang
TL;DR
The paper addresses the challenge of deleting points from graph-based ANN indexes such as HNSW without degrading query performance. It introduces SPatch, a deletion procedure grounded in a random-walk framework: a star-mesh transform replaces the deleted point with a clique over its neighborhood so that hitting-time statistics are preserved, the clique is sparsified, and top-edge selection makes the procedure deterministic. In extensive mass-deletion experiments, SPatch achieves strong recall, fast deletions, low query latency, and reduced memory usage compared to existing approaches. The work also shows that a softmax-based random walk closely mirrors greedy search, which validates the theoretical model and offers a new lens for analyzing and improving dynamic graph-based ANN systems.
Abstract
Approximate nearest neighbor search (ANN) is a common way to retrieve relevant search results, especially now in the context of large language models and retrieval augmented generation. One of the most widely used algorithms for ANN is based on constructing a multi-layer graph over the dataset, called the Hierarchical Navigable Small World (HNSW). While this algorithm supports insertion of new data, it does not support deletion of existing data. Moreover, deletion algorithms described by prior work come at the cost of increased query latency, decreased recall, or prolonged deletion time. In this paper, we propose a new theoretical framework for graph-based ANN based on random walks. We use this framework to analyze a randomized deletion approach that preserves hitting time statistics relative to the graph before the point is deleted. We then turn this theoretical analysis into a deterministic deletion algorithm and show, through an extensive collection of experiments, that it provides a better tradeoff between query latency, recall, deletion time, and memory usage.
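
To make the star-mesh idea concrete, here is a minimal sketch, not the paper's SPatch implementation. It assumes the graph is an undirected weighted adjacency map whose weights act like conductances, applies the classical star-mesh rule (each pair of former neighbors i, j gains an edge of weight w_i * w_j / sum_k w_k), and then keeps only the heaviest new edges. The function name `star_mesh_delete` and the `top_k` parameter are hypothetical, and the way edges are ranked and selected here is an assumption rather than the authors' rule.

```python
from itertools import combinations


def star_mesh_delete(graph, v, top_k=None):
    """Delete node v from an undirected weighted graph via a star-mesh transform.

    graph: dict mapping node -> dict of neighbor -> positive weight (symmetric).
    top_k: hypothetical knob; if given, keep only the top_k heaviest mesh edges.
    The graph is modified in place and also returned.
    """
    star = graph.pop(v)            # weights of the edges incident to v
    total = sum(star.values())

    # Remove the back-references to v from its former neighbors.
    for u in star:
        del graph[u][v]

    # Classical star-mesh rule: the pair (a, b) gets weight w_a * w_b / total.
    mesh = [(star[a] * star[b] / total, a, b) for a, b in combinations(star, 2)]

    # Assumed sparsification: rank mesh edges by weight and keep the heaviest.
    mesh.sort(key=lambda edge: edge[0], reverse=True)
    if top_k is not None:
        mesh = mesh[:top_k]

    # Add the kept mesh edges, merging with any edge that already exists.
    for w, a, b in mesh:
        graph[a][b] = graph[a].get(b, 0.0) + w
        graph[b][a] = graph[b].get(a, 0.0) + w
    return graph
```

With `top_k=None` the full clique is added, which is the exact star-mesh transform on a conductance network; a finite `top_k` trades that exactness for a sparser graph, which is the kind of tradeoff the deterministic algorithm is described as navigating.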
