GRNND: A GPU-Parallel Relative NN-Descent Algorithm for Efficient Approximate Nearest Neighbor Graph Construction

Xiang Li, Qiong Chang, Yun Li, Jun Miyazaki

TL;DR

GRNND tackles the high cost of constructing sparse approximate nearest neighbor graphs by delivering the first GPU-parallel RNN-Descent algorithm. It combines disordered neighbor propagation, warp-level cooperative updates, a fixed-capacity double-buffered pool, and selective reverse-edge sampling to align the iterative refinement with GPU architecture. Empirical results show substantial speedups over both CPU and GPU baselines across multiple datasets and hardware platforms, while preserving high-quality graph structures for efficient querying. The approach enables scalable, end-to-end ANN pipelines and sets the stage for broader deployment in large-scale retrieval systems.

Abstract

Relative Nearest Neighbor Descent (RNN-Descent) is a state-of-the-art algorithm for constructing sparse approximate nearest neighbor (ANN) graphs by combining the iterative refinement of NN-Descent with the edge-pruning rules of the Relative Neighborhood Graph (RNG). It has demonstrated strong effectiveness in large-scale search tasks such as information retrieval. However, as the volume and dimensionality of data increase, the cost of graph construction in RNN-Descent rises sharply, making this stage increasingly time-consuming and even prohibitive for subsequent query processing. In this paper, we propose GRNND, the first GPU-parallel RNN-Descent algorithm, designed to fully exploit GPU architecture. GRNND introduces a disordered neighbor propagation strategy that mitigates synchronized update traps, enhances structural diversity, and avoids premature convergence during parallel execution. It also leverages warp-level cooperative operations and a fixed-capacity, double-buffered neighbor pool to achieve efficient memory access, eliminate contention, and enable highly parallelized neighbor updates. Extensive experiments demonstrate that GRNND consistently outperforms existing CPU- and GPU-based methods, achieving 2.4x to 51.7x speedup over existing GPU methods and 17.8x to 49.8x speedup over CPU methods.
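
The RNG edge-pruning rule mentioned in the abstract can be illustrated with a small CPU-side sketch. This is a generic RNG-style pruning pass, not the paper's GPU kernel: the function name `rng_prune`, the dictionary-of-points layout, and the use of Euclidean distance are all assumptions made for illustration. Candidates are visited from nearest to farthest, and a candidate $u$ is dropped when some already-kept neighbor $w$ is closer to $u$ than $v$ is.

```python
import math

def rng_prune(v, candidates, points):
    """Hypothetical helper: RNG-style pruning of v's candidate neighbors.

    A candidate u is kept only if no already-kept neighbor w satisfies
    d(w, u) < d(v, u). Visiting candidates in ascending d(v, .) ensures
    d(v, w) <= d(v, u) for every kept w, so this check covers the RNG rule.
    """
    def dist(a, b):
        return math.dist(points[a], points[b])

    kept = []
    for u in sorted(set(candidates) - {v}, key=lambda c: dist(v, c)):
        if all(dist(w, u) >= dist(v, u) for w in kept):
            kept.append(u)
    return kept

# Toy 2-D data (made up for this example).
points = {"v": (0.0, 0.0), "a": (1.0, 0.0), "b": (2.0, 0.1), "c": (0.0, 1.5)}
print(rng_prune("v", ["a", "b", "c"], points))  # ['a', 'c']
```

In this toy input, `b` is pruned because the kept neighbor `a` is closer to `b` than `v` is, while `c` survives because it lies in a different direction from `a`.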

Paper Structure

This paper contains 18 sections, 2 equations, 9 figures, 1 table, 6 algorithms.

Figures (9)

  • Figure 1: Neighbor refinement of vertex $v$ under the RNG criterion. (a) Initial candidate neighbors; (b) RNG-based edge pruning; (c) Refined neighbor set of $v$.
  • Figure 2: Comparison of serial vs. parallel execution in RNN-Descent. While serial scheduling enables gradual global optimization, direct parallelism with sorted updates breaks propagation order and leads to premature local convergence. Disordered parallel updates restore exploration capability by injecting randomness into the update paths.
  • Figure 3: Warp-level distance computation: a single warp loads sub-vectors from two aligned inputs, computes squared differences in parallel, and performs warp-wide reduction.
  • Figure 4: Efficient warp-level deduplication for candidate insertion using the ballot function. (a) Insertion Allowed: $V_2$ not found in $V_1$’s candidate set; (b) Insertion Skipped: $n_{34}$ already exists in $V_1$’s candidate set.
  • Figure 5: Evaluating construction time vs. recall: Fixed search settings with method-specific construction tuning for comparable Recall@10. $^*$On GIST1M, CAGRA is unavailable due to high memory demand; GGNN is omitted due to excessive build time.
  • ...and 4 more figures
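
The warp-level patterns described in the captions of Figures 3 and 4 can be emulated on the CPU to make the data flow concrete. The sketch below is a plain-Python emulation under stated assumptions, not GPU code and not the authors' implementation: a warp is modeled as 32 lanes, the reduction mimics a shuffle-down tree, and the names `warp_squared_distance`, `ballot`, and `try_insert` are hypothetical. It also assumes the candidate pool holds at most 32 entries, so that each lane can check exactly one slot.

```python
WARP_SIZE = 32  # assumed number of lanes per warp

def warp_squared_distance(x, y):
    """Emulate a warp computing the squared L2 distance between x and y."""
    # Each lane accumulates squared differences over a strided slice.
    partial = [0.0] * WARP_SIZE
    for lane in range(WARP_SIZE):
        for i in range(lane, len(x), WARP_SIZE):
            diff = x[i] - y[i]
            partial[lane] += diff * diff
    # Tree reduction with offsets 16, 8, 4, 2, 1 (shuffle-down style).
    offset = WARP_SIZE // 2
    while offset > 0:
        for lane in range(offset):
            partial[lane] += partial[lane + offset]
        offset //= 2
    return partial[0]  # lane 0 ends up holding the full sum

def ballot(predicates):
    """Pack one boolean vote per lane into a bitmask."""
    mask = 0
    for lane, vote in enumerate(predicates):
        if vote:
            mask |= 1 << lane
    return mask

def try_insert(pool, capacity, new_id):
    """Insert new_id unless some lane finds it already in the pool."""
    # Lane k compares slot k against new_id; empty slots vote False.
    votes = [lane < len(pool) and pool[lane] == new_id
             for lane in range(WARP_SIZE)]
    if ballot(votes) != 0 or len(pool) >= capacity:
        return False  # duplicate found, or the fixed-capacity pool is full
    pool.append(new_id)
    return True

x = [float(i) for i in range(64)]
y = [0.0] * 64
print(warp_squared_distance(x, y))

pool = [3, 7, 11]
print(try_insert(pool, 8, 7), try_insert(pool, 8, 5), pool)
```

On a real GPU the per-lane loops run concurrently and the vote is a single hardware instruction; the emulation only shows which lane touches which element and why a nonzero ballot mask means the insertion is skipped.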