Three Algorithms for Merging Hierarchical Navigable Small World Graphs
Alexander Ponomarenko
TL;DR
The paper addresses the practical problem of merging independently built HNSW graphs for scalable vector similarity search in distributed and incremental settings. It introduces an iterative four-step merge framework and three algorithms (NGM, IGTM, and CGTM) that fuse two graphs by exploiting intra-graph and cross-graph locality. On SIFT1M, IGTM and CGTM require up to about 70% fewer distance computations than the naive merge while maintaining comparable recall, and IGTM is the more efficient of the two. The work supports consolidation and compaction of vector indices in vector databases and retrieval systems that use HNSW for approximate nearest neighbor (ANN) search, and it identifies handling of deletions and accelerator-friendly implementations as directions for future work.
Abstract
This paper addresses the challenge of merging hierarchical navigable small world (HNSW) graphs, a critical operation for distributed systems, incremental indexing, and database compaction. We propose three algorithms for this task: Naive Graph Merge (NGM), Intra Graph Traversal Merge (IGTM), and Cross Graph Traversal Merge (CGTM). These algorithms differ in how they select vertices and collect candidates during the merge. We conceptualize graph merging as an iterative process with four key steps: selection of the vertex to process, candidate collection, neighborhood construction, and information propagation. Our experimental evaluation on the SIFT1M dataset demonstrates that IGTM and CGTM significantly reduce computational cost compared with the naive approach, requiring up to 70% fewer distance computations while maintaining comparable search accuracy. Surprisingly, IGTM outperforms CGTM in efficiency, contrary to our initial expectations. The proposed algorithms enable efficient consolidation of separately constructed indices, supporting critical operations in modern vector databases and retrieval systems that rely on HNSW for similarity search.
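
To make the four steps named in the abstract concrete, the following is a minimal toy sketch on single-layer proximity graphs. It is not the paper's NGM, IGTM, or CGTM. Every name in it (`merge_sketch`, `greedy_search`, `reuse_entry`) is hypothetical, the greedy walk stands in for a real HNSW search, and the "keep the m closest" rule stands in for whatever neighbor-selection heuristic the paper uses. The two graphs are assumed to have disjoint vertex ids.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Graph = Dict[int, List[int]]  # vertex id -> neighbor ids (single layer, toy model)


def greedy_search(vectors: Dict[int, Vector], graph: Graph, entry: int,
                  query: Vector, counter: List[int]) -> int:
    """Walk to a closer neighbor until no neighbor improves on the best distance."""
    current = entry
    best = math.dist(vectors[current], query)
    counter[0] += 1
    improved = True
    while improved:
        improved = False
        for nb in graph[current]:
            d = math.dist(vectors[nb], query)
            counter[0] += 1
            if d < best:
                best, current, improved = d, nb, True
    return current


def merge_sketch(vectors: Dict[int, Vector], graph_a: Graph, graph_b: Graph,
                 entry_a: int, entry_b: int, m: int,
                 reuse_entry: bool = False) -> Tuple[Graph, int]:
    """Return (merged graph, number of distance computations)."""
    merged: Graph = {}
    counter = [0]
    for own, other, other_entry in ((graph_a, graph_b, entry_b),
                                    (graph_b, graph_a, entry_a)):
        # Step 1, vertex selection: here simply dictionary order (assumption).
        for v in own:
            # Step 2, candidate collection: own neighbors plus the region
            # reached by a greedy walk in the other graph.
            hit = greedy_search(vectors, other, other_entry, vectors[v], counter)
            candidates = set(own[v]) | {hit} | set(other[hit])
            candidates.discard(v)
            # Step 3, neighborhood construction: keep the m closest candidates
            # (ties broken by vertex id so the result is deterministic).
            counter[0] += len(candidates)
            merged[v] = sorted(
                candidates,
                key=lambda c: (math.dist(vectors[c], vectors[v]), c),
            )[:m]
            # Step 4, information propagation: optionally carry the hit forward
            # as the next entry point (an assumption made for illustration).
            if reuse_entry:
                other_entry = hit
    return merged, counter[0]
```

The returned counter mirrors the cost metric reported in the abstract (distance computations), so the same toy setup could be used to compare vertex orderings or propagation choices. The numbers it yields say nothing about the paper's results.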
