Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs

Yu. A. Malkov, D. A. Yashunin

TL;DR

The proposed general metric space search index strongly outperforms previous open-source state-of-the-art vector-only approaches, and the algorithm's similarity to the skip list structure allows a straightforward, balanced distributed implementation.

Abstract

We present a new approach for approximate K-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW). The proposed solution is fully graph-based, without any need for the additional search structures that are typically used at the coarse search stage of most proximity graph techniques. Hierarchical NSW incrementally builds a multi-layer structure consisting of a hierarchical set of proximity graphs (layers) for nested subsets of the stored elements. The maximum layer in which an element is present is selected randomly with an exponentially decaying probability distribution. This produces graphs similar to the previously studied Navigable Small World (NSW) structures while additionally separating the links by their characteristic distance scales. Starting the search from the upper layer and exploiting the scale separation boosts performance compared to NSW and allows logarithmic complexity scaling. Additionally employing a heuristic for selecting proximity graph neighbors significantly increases performance at high recall and on highly clustered data. Performance evaluation has demonstrated that the proposed general metric space search index is able to strongly outperform previous open-source state-of-the-art vector-only approaches. The similarity of the algorithm to the skip list structure allows a straightforward, balanced distributed implementation.
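
The random layer selection described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `assign_level` and the seed are hypothetical, and it assumes the level is drawn as the floor of `-ln(u) * m_L` with `u` uniform on (0, 1] and `m_L = 1 / ln(M)`, the autoselected value mentioned in the figure captions.

```python
import math
import random


def assign_level(M, rng):
    """Draw the maximum layer for a new element (hypothetical helper).

    Assumes the normalization m_L = 1 / ln(M), so that the probability of
    reaching layer l decays exponentially as M ** (-l).
    """
    m_L = 1.0 / math.log(M)
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(-math.log(u) * m_L)  # value is non-negative, so int() is floor


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
M = 16
levels = [assign_level(M, rng) for _ in range(100_000)]

# Fraction of elements that appear above the ground layer; expected near 1 / M.
upper_fraction = sum(1 for level in levels if level >= 1) / len(levels)
print(f"fraction with level >= 1: {upper_fraction:.4f} (1/M = {1 / M:.4f})")
```

Under this assumption most elements live only in layer 0, and each higher layer holds roughly a factor of `M` fewer elements, which is what gives the structure its skip-list-like shape.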

Paper Structure

This paper contains 19 sections, 1 equation, 15 figures, 3 tables.

Figures (15)

  • Figure 1: Illustration of the Hierarchical NSW idea. The search starts from an element from the top layer (shown red). Red arrows show direction of the greedy algorithm from the entry point to the query (shown green).
  • Figure 2: Illustration of the heuristic used to select the graph neighbors for two isolated clusters. A new element is inserted on the boundary of Cluster 1. All of the closest neighbors of the element belong to Cluster 1, thus missing the edges of the Delaunay graph between the clusters. The heuristic, however, selects element $e_{2}$ from Cluster 2, thus maintaining global connectivity in case the inserted element is closer to $e_{2}$ than any other element of Cluster 1.
  • Figure 3: Plots for query time vs. $m_{L}$ parameter for 10 M random vectors with $d=4$. The autoselected value $1 / \ln (M)$ for $m_{L}$ is shown by an arrow.
  • Figure 4: Plots for query time vs. $m_{L}$ parameter for 100 k random vectors with $d=1024$. The autoselected value $1 / \ln (M)$ for $m_{L}$ is shown by an arrow.
  • Figure 5: Plots for query time vs. $m_{L}$ parameter for 5 M SIFT learn dataset. The autoselected value $1 / \ln (M)$ for $m_{L}$ is shown by an arrow.
  • ...and 10 more figures
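
The neighbor selection heuristic illustrated in Figure 2 can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the paper's full procedure: the function name `select_neighbors_heuristic`, the sample points, and the use of Euclidean distance are all hypothetical choices, and optional steps of the full algorithm are omitted. The rule shown is that a candidate is kept only if it is closer to the inserted element than to every neighbor already selected.

```python
import math


def select_neighbors_heuristic(q, candidates, M, dist=math.dist):
    """Pick up to M neighbors for the inserted element q (simplified sketch).

    Candidates are visited from nearest to farthest. A candidate is kept
    only if it is closer to q than to every neighbor selected so far.
    """
    selected = []
    for e in sorted(candidates, key=lambda c: dist(q, c)):
        if len(selected) >= M:
            break
        if all(dist(q, e) < dist(e, r) for r in selected):
            selected.append(e)
    return selected


# Hypothetical 2-D data: q sits on the boundary of Cluster 1, e2 is in Cluster 2.
q = (1.0, 0.0)
cluster_1 = [(0.0, 0.0), (0.0, 0.5), (-0.5, 0.0)]
e2 = (3.0, 0.0)
candidates = cluster_1 + [e2]

heuristic_pick = select_neighbors_heuristic(q, candidates, M=2)
nearest_pick = sorted(candidates, key=lambda c: math.dist(q, c))[:2]

print(heuristic_pick)  # [(0.0, 0.0), (3.0, 0.0)]
print(nearest_pick)    # [(0.0, 0.0), (0.0, 0.5)]
```

In this toy case, plain nearest-neighbor selection links the new element only to Cluster 1, while the heuristic skips the redundant Cluster 1 points and keeps a link to `e2`, preserving a path between the clusters.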