Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing
Durga Mandarapu, Vani Nagarajan, Artem Pelenitsyn, Milind Kulkarni
TL;DR
Arkade introduces two general reductions, Filter-Refine (FR) and Monotone Transformation (MT), that allow kNN queries with non-Euclidean distances to be accelerated on GPU ray-tracing (RT) cores, whose hardware natively supports only the Euclidean distance. FR splits the search into a radius-based filtering step and a refinement step, mapping filtering to RT-core BVH traversal and refinement to shader-based distance calculations for a general distance $D$, with correctness guarantees. MT handles distance functions whose geometry is not readily represented by RT objects: it applies a monotone transformation that preserves distance order, so that an $L^2$-based RT search can run on the transformed data (e.g., cosine distance via normalization). Empirical results on an RTX 4060 Ti show substantial speedups (up to $200$x) over state-of-the-art baselines for several distances and datasets, while the analyses highlight BVH quality, ray-AABB intersections, and dataset distribution as key performance drivers. The work broadens the applicability of RT-core acceleration to low-dimensional, non-Euclidean kNN tasks and offers practical strategies for choosing distances and radii in real-world settings.
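The cosine-distance case mentioned above can be illustrated with a small CPU-only sketch. It is not the Arkade implementation and involves no RT cores; the function names (`normalize`, `knn_cosine_via_l2`, `knn_cosine_direct`) are hypothetical, and all vectors are assumed to be nonzero. It relies on the identity $\lVert \hat{a} - \hat{b} \rVert_2^2 = 2\,(1 - \cos(a, b))$ for unit vectors $\hat{a}, \hat{b}$, so ranking by Euclidean distance after normalization gives the same order as ranking by cosine distance.

```python
import math

def normalize(v):
    # Scale v to unit length (assumes v is nonzero).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_distance(a, b):
    # 1 - cos(angle between a and b), computed on the original vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_cosine_via_l2(points, query, k):
    # Monotone transformation: normalize everything, then rank by plain L2.
    # A brute-force sort stands in for whatever Euclidean kNN engine is used.
    q = normalize(query)
    unit = [normalize(p) for p in points]
    order = sorted(range(len(points)), key=lambda i: l2(unit[i], q))
    return order[:k]

def knn_cosine_direct(points, query, k):
    # Reference answer: rank directly by cosine distance.
    order = sorted(range(len(points)),
                   key=lambda i: cosine_distance(points[i], query))
    return order[:k]

points = [(1, 0, 0), (2, 2, 0), (0, 3, 0), (-1, 0, 0), (1, 1, 1)]
query = (1, 0.1, 0)
print(knn_cosine_via_l2(points, query, 3))
print(knn_cosine_direct(points, query, 3))
```

Both calls return the same neighbor indices, since the transformation changes the distance values but not their order.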
Abstract
High-performance implementations of $k$-Nearest Neighbor Search ($k$NN) in low dimensions use tree-based data structures. Tree algorithms are hard to parallelize on GPUs due to their irregularity. However, newer Nvidia GPUs offer hardware support for tree operations through ray-tracing (RT) cores. Recent works have proposed using RT cores to implement $k$NN search, but all of them share a hardware-imposed constraint on the distance metric: only the Euclidean distance is supported. We propose and implement two reductions that support $k$NN for a broad range of distances other than the Euclidean distance: Arkade Filter-Refine and Arkade Monotone Transformation, each of which allows non-Euclidean nearest neighbor queries to be expressed in terms of the Euclidean distance. With our reductions, we observe $k$NN search-time speedups of $1.6$x-$200$x over state-of-the-art GPU shader-core baselines and $1.3$x-$33.1$x over state-of-the-art RT-core baselines. In our evaluation, we analyze $k$NN search-time trends to provide several insights into how efficiently RT architectures build and traverse the tree.
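The filter-then-refine idea can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not Arkade's RT-core implementation: a brute-force Euclidean radius test stands in for BVH traversal, the Manhattan ($L^1$) distance is chosen as the example metric, and the names `filter_refine_knn` and `radius` are hypothetical. The filter is complete for this choice because $\lVert x \rVert_2 \le \lVert x \rVert_1$, so every point within $L^1$ distance `radius` of the query also lies within Euclidean distance `radius`.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def filter_refine_knn(points, query, k, radius, dist=l1):
    # Filter: keep points inside the Euclidean ball of the given radius.
    # Assumption: l2(p, q) <= dist(p, q), so no point with dist <= radius is lost.
    candidates = [i for i, p in enumerate(points) if l2(p, query) <= radius]
    # Refine: evaluate the target distance on the candidates only,
    # discard those beyond the radius, and keep the k closest.
    scored = sorted((dist(points[i], query), i) for i in candidates)
    return [i for d, i in scored if d <= radius][:k]

def brute_force_knn(points, query, k, radius, dist=l1):
    # Reference answer: evaluate the target distance on every point.
    scored = sorted((dist(p, query), i) for i, p in enumerate(points))
    return [i for d, i in scored if d <= radius][:k]

points = [(0, 0), (1, 1), (2, 0), (0, 0.5), (3, 3)]
query = (0, 0)
print(filter_refine_knn(points, query, k=2, radius=2.0))
print(brute_force_knn(points, query, k=2, radius=2.0))
```

If fewer than `k` points fall within `radius`, the sketch returns fewer than `k` neighbors, which is one reason the choice of radius matters in practice.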
