GRNND: A GPU-Parallel Relative NN-Descent Algorithm for Efficient Approximate Nearest Neighbor Graph Construction
Xiang Li, Qiong Chang, Yun Li, Jun Miyazaki
TL;DR
GRNND tackles the high cost of constructing sparse approximate nearest neighbor graphs by delivering the first GPU-parallel RNN-Descent algorithm. It combines disordered neighbor propagation, warp-level cooperative updates, a fixed-capacity double-buffered pool, and selective reverse-edge sampling to align the iterative refinement with GPU architecture. Empirical results show substantial speedups over both CPU and GPU baselines across multiple datasets and hardware platforms, while preserving high-quality graph structures for efficient querying. The approach enables scalable, end-to-end ANN pipelines and sets the stage for broader deployment in large-scale retrieval systems.
Abstract
Relative Nearest Neighbor Descent (RNN-Descent) is a state-of-the-art algorithm for constructing sparse approximate nearest neighbor (ANN) graphs. It combines the iterative refinement of NN-Descent with the edge-pruning rules of the Relative Neighborhood Graph (RNG), and it has proven effective in large-scale search tasks such as information retrieval. However, as the volume and dimensionality of data increase, the cost of graph construction in RNN-Descent rises sharply, making this stage increasingly time-consuming and, in some cases, a prohibitive bottleneck ahead of query processing. In this paper, we propose GRNND, the first GPU-parallel RNN-Descent algorithm, designed to fully exploit GPU architecture. GRNND introduces a disordered neighbor propagation strategy that mitigates synchronized update traps, enhances structural diversity, and avoids premature convergence during parallel execution. It also leverages warp-level cooperative operations and a fixed-capacity, double-buffered neighbor pool to access memory efficiently, eliminate contention, and enable highly parallel neighbor updates. Extensive experiments demonstrate that GRNND consistently outperforms existing CPU- and GPU-based methods, achieving 2.4 to 51.7x speedup over existing GPU methods and 17.8 to 49.8x speedup over CPU methods.
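
To make the RNG edge-pruning idea mentioned above concrete, the following is a minimal sequential sketch in Python. It is not GRNND's GPU kernel and not the authors' code. It assumes Euclidean distance and the common RNG-style rule in which a candidate v of vertex u is dropped when an already kept neighbor w is at least as close to v as u is. The function name `rng_prune` and the toy points are hypothetical, chosen only for illustration.

```python
import math


def rng_prune(u, candidates, points):
    """Return the candidates of u that survive an RNG-style pruning pass.

    Illustrative assumption: Euclidean distance over coordinate tuples.
    """
    def dist(a, b):
        return math.dist(points[a], points[b])

    kept = []
    # Visit candidates from nearest to farthest so closer neighbors are kept first.
    for v in sorted(candidates, key=lambda c: dist(u, c)):
        # v is redundant if some kept neighbor w is at least as close to v as u is.
        if all(dist(u, v) < dist(w, v) for w in kept):
            kept.append(v)
    return kept


# Toy data: vertex 2 lies behind vertex 1 as seen from vertex 0, so edge (0, 2) is pruned.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
print(rng_prune(0, [1, 2, 3], points))  # prints [1, 3]
```

In this toy case vertex 0 keeps its edges to vertices 1 and 3 and drops the longer edge to vertex 2, which remains reachable through vertex 1. This kind of pruning is what keeps the resulting graph sparse.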
