Fast and exact fixed-radius neighbor search based on sorting
Xinye Chen, Stefan Güttel
TL;DR
The paper tackles exact fixed-radius near neighbor search by introducing SNN, a sorting-based method that returns exact results and has no parameters to tune beyond the search radius itself. SNN centers the data, projects it onto the first principal component, and sorts the points by the resulting score. At query time, the sorted scores are used to prune points that cannot lie within the radius, and the remaining distance checks are carried out in a matrix-based formulation that maps onto BLAS routines. Theoretical analysis links pruning efficiency to the geometry of the data via the singular values of the data matrix. Experiments on synthetic and real-world datasets show substantial speedups over tree-based methods and brute force, including when SNN serves as the neighbor search inside DBSCAN clustering. The method also has potential for online and GPU-accelerated deployments.
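The index and query steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `snn_build` and `snn_query`, the dictionary layout of the index, and the use of a full SVD to obtain the first principal direction are all choices made here for clarity. The distance check uses the expanded form of the squared Euclidean distance so that it reduces to a single matrix-vector product, which NumPy dispatches to BLAS.

```python
import numpy as np

def snn_build(X):
    """Index phase (hypothetical helper): center, project, sort."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    Xc = X - mu                                # centered data
    # First right singular vector = first principal direction (unit norm).
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    v = vt[0]
    scores = Xc @ v                            # one score per point
    order = np.argsort(scores)                 # sort points by score
    Xs = Xc[order]
    return {
        "mu": mu,
        "v": v,
        "order": order,                        # maps sorted rows to original rows
        "scores": scores[order],
        "X": Xs,
        "half_sq_norms": 0.5 * np.einsum("ij,ij->i", Xs, Xs),
    }

def snn_query(index, q, radius):
    """Query phase (hypothetical helper): prune by score, then check distances."""
    qc = np.asarray(q, dtype=float) - index["mu"]
    sq = qc @ index["v"]
    # Only points whose score lies in [sq - radius, sq + radius] can be
    # neighbors, so binary search on the sorted scores bounds the candidates.
    lo = np.searchsorted(index["scores"], sq - radius, side="left")
    hi = np.searchsorted(index["scores"], sq + radius, side="right")
    cand = index["X"][lo:hi]
    # ||x - q||^2 <= R^2  is equivalent to
    # 0.5*||x||^2 - x.q <= 0.5*(R^2 - ||q||^2); cand @ qc is one BLAS call.
    lhs = index["half_sq_norms"][lo:hi] - cand @ qc
    mask = lhs <= 0.5 * (radius**2 - qc @ qc)
    return np.sort(index["order"][lo:hi][mask])  # indices into the original X
```

A typical use would be `idx = snn_build(X)` once, followed by `snn_query(idx, q, R)` for each query point `q`. In floating-point arithmetic, points lying almost exactly on the boundary of the radius may be classified differently from a direct distance computation; a production implementation would need to decide how to treat such ties.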
Abstract
Fixed-radius near neighbor search is a fundamental data operation that retrieves all data points within a user-specified distance to a query point. There are efficient algorithms that can provide fast approximate query responses, but they often have a very compute-intensive indexing phase and require careful parameter tuning. Therefore, exact brute force and tree-based search methods are still widely used. Here we propose a new fixed-radius near neighbor search method, called SNN, that significantly improves over brute force and tree-based methods in terms of index and query time, provably returns exact results, and requires no parameter tuning. SNN exploits a sorting of the data points by their first principal component to prune the query search space. Further speedup is gained from an efficient implementation using high-level Basic Linear Algebra Subprograms (BLAS). We provide theoretical analysis of our method and demonstrate its practical performance when used stand-alone and when applied within the DBSCAN clustering algorithm.
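The reason that pruning on the sorted principal component scores cannot discard a true neighbor is a one-line consequence of the Cauchy-Schwarz inequality. The derivation below is a sketch; the symbols are notation assumed here and may differ from the paper's: $x_i$ is a centered data point, $x_q$ the centered query, $v_1$ the unit-norm first principal direction, $\alpha_i = v_1^T x_i$ and $\alpha_q = v_1^T x_q$ the scores, and $R$ the search radius.

```latex
\[
  |\alpha_i - \alpha_q|
  = \bigl| v_1^T (x_i - x_q) \bigr|
  \le \|v_1\|_2 \, \|x_i - x_q\|_2
  = \|x_i - x_q\|_2 .
\]
% Hence \|x_i - x_q\|_2 \le R implies |\alpha_i - \alpha_q| \le R.
% Contrapositive: any point with |\alpha_i - \alpha_q| > R lies outside
% the radius and can be skipped without computing its distance.
```

Because the scores are sorted, the points satisfying $|\alpha_i - \alpha_q| \le R$ form one contiguous block, which can be located by binary search before any distances are computed.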
