Multiple Index Merge for Approximate Nearest Neighbor Search

Liuchang Jing, Mingyu Yang, Lei Li, Jianbin Qin, Wei Wang

TL;DR

This work tackles efficient approximate nearest neighbor search when memory constraints force the data to be partitioned into multiple sub-indexes. It introduces two components: Reverse Neighbor Sliding Merge (RNSM), which speeds up pairwise index merging by leveraging cross-index neighbor structure, and Merge Order Selection (MOS), which produces sparse, well-connected merge schedules that minimize redundant work. Experiments report up to a 5.48$\times$ speedup over existing index merge methods and a 9.92$\times$ speedup over index reconstruction while preserving Recall@K and QPS, and the method scales to 100 million vectors with 50 partitions. The approach applies to graph-based indexes such as HNSW, NSG, and SSG.

Abstract

Approximate $k$ nearest neighbor (AKNN) search in high-dimensional space is a foundational problem in vector databases with widespread applications. Among the numerous AKNN indexes, Proximity Graph-based indexes achieve state-of-the-art search efficiency across various benchmarks. However, their extensive distance computations of high-dimensional vectors lead to slow construction and substantial memory overhead. The limited memory capacity often prevents building the entire index at once when handling large-scale datasets. A common practice is to build multiple sub-indexes separately. However, directly searching on these separated indexes severely compromises search efficiency, as queries cannot leverage cross-graph connections. Therefore, efficient graph index merging is crucial for multi-index searching. In this paper, we focus on efficient two-index merging and the merge order of multiple indexes for AKNN search. To achieve this, we propose a reverse neighbor sliding merge (RNSM) that exploits structural information to boost merging efficiency. We further investigate merge order selection (MOS) to reduce the merging cost by eliminating redundant merge operations. Experiments show that our approach yields up to a 5.48$\times$ speedup over existing index merge methods and 9.92$\times$ speedup over index reconstruction, while maintaining expected superior search performance. Moreover, our method scales efficiently to 100 million vectors with 50 partitions, maintaining consistent speedups.
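The "separated index" baseline that the abstract describes (search every sub-index, then merge the per-partition results) can be sketched as follows. This is only an illustration under stated assumptions: an exact brute-force scan stands in for each per-partition graph search, and the names `knn_bruteforce` and `search_separated` are hypothetical, not taken from the paper. It does not implement RNSM or MOS.

```python
import heapq
import math
import random


def knn_bruteforce(points, query, k):
    """Exact k nearest neighbors within one partition.

    Assumption: this stands in for a per-partition graph index search.
    `points` is a list of (global_id, vector) pairs.
    Returns a sorted list of (distance, global_id) pairs.
    """
    scored = ((math.dist(vec, query), gid) for gid, vec in points)
    return heapq.nsmallest(k, scored)


def search_separated(partitions, query, k):
    """Search each partition in turn, then merge the per-partition top-k lists."""
    candidates = []
    for part in partitions:
        candidates.extend(knn_bruteforce(part, query, k))
    # Result merging: keep the k globally closest candidates.
    return heapq.nsmallest(k, candidates)


random.seed(0)
dim, k, num_parts = 8, 5, 10
data = [(i, [random.random() for _ in range(dim)]) for i in range(1000)]
# Equal-sized partitions, each of which would be indexed separately.
partitions = [data[p::num_parts] for p in range(num_parts)]
query = [random.random() for _ in range(dim)]

separated = search_separated(partitions, query, k)
complete = knn_bruteforce(data, query, k)
# Every partition contributes k candidates, so merging handles num_parts * k of them.
per_query_candidates = num_parts * k
```

With exact per-partition search the merged result equals the result on the complete data, so the sketch only shows where the extra work comes from: every partition is searched for every query, and the number of candidates to merge grows with the partition count. With approximate graph indexes, the abstract notes that this setup also loses cross-graph connections, which is the motivation for merging the indexes.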

Paper Structure

This paper contains 17 sections, 1 theorem, 2 equations, 19 figures, 3 tables, 3 algorithms.

Key Result

Theorem 1

The Minimum Cost Dominating Pivot Selection (DPS) problem is NP-hard.

Figures (19)

  • Figure 1: Separated index search performance. Sub-indexes are constructed on 10 and 50 equal-sized partitions of the DEEP1M dataset, and searching is conducted sequentially on all partitions with result merging. It shows that (i) the separated index has a significant performance gap compared with the complete index, and (ii) this gap widens as the partition number increases.
  • Figure 2: Nearest Neighbors being Critical Factor
  • Figure 3: Search Performance of Different Merge Order
  • Figure 4: Pipeline of Our Multi-Index Merge Framework
  • Figure 5: Verification of Distance Cost Model. On the IMAGENET1M and DEEP10M datasets, the cost of sliding from $x_i$ to $x_j$ is linear in the distance when $x_i$ and $x_j$ are close.
  • ...and 14 more figures

Theorems & Definitions (8)

  • Definition 1: KNN search
  • Example 1
  • Example 2
  • Definition 2: Minimum Cost Dominating Pivot Selection (DPS)
  • Theorem 1
  • Example 3
  • Definition 3: Minimum Cost Merge Order Selection (MOS)
  • Example 4