FastGraph: Optimized GPU-Enabled Algorithms for Fast Graph Building and Message Passing

Aarush Agarwal, Raymond He, Jan Kieseler, Matteo Cremonesi, Shah Rukh Qasim

TL;DR

FastGraph targets the graph-construction bottleneck in graph neural networks with a differentiable, GPU-resident $k$NN mechanism optimized for low-dimensional embeddings ($d$ between $2$ and $10$). It uses a bin-partitioned search with adaptive parameter tuning, static allocation, and gradient-flow support, and is packaged as an open-source PyTorch extension. The paper reports up to ~40x lower latency than the FAISS exact flat index and substantial speedups over other baselines in the typical GNN regime. It demonstrates strong empirical performance on NVIDIA GPUs, highlights integration with GravNet-based dynamic graphs and the object condensation framework, and argues that the approach enables deeper, end-to-end trainable graph construction with minimal memory overhead. Overall, FastGraph fills a gap for differentiable, low-dimensional, GPU-accelerated $k$NN in scientific GNN workflows, with practical applications in particle physics, object tracking, and graph clustering.

Abstract

We introduce FastGraph, a novel GPU-optimized k-nearest neighbor algorithm specifically designed to accelerate graph construction in low-dimensional spaces (2-10 dimensions), critical for high-performance graph neural networks. Our method employs a GPU-resident, bin-partitioned approach with full gradient-flow support and adaptive parameter tuning, significantly enhancing both computational and memory efficiency. Benchmarking demonstrates that FastGraph achieves a 20-40x speedup over state-of-the-art libraries such as FAISS, ANNOY, and SCANN in dimensions less than 10 with virtually no memory overhead. These improvements directly translate into substantial performance gains for GNN-based workflows, particularly benefiting computationally intensive applications in low dimensions such as particle clustering in high-energy physics, visual object tracking, and graph clustering.
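To make the idea of a bin-partitioned search concrete, the following is a minimal CPU sketch in pure Python. It is not FastGraph's kernel or API: the function names (`build_bins`, `binned_knn`, `brute_force_knn`), the uniform grid, and the fixed `bin_width` are all assumptions chosen for illustration, and it omits the GPU execution, gradient flow, and adaptive tuning described above. It only shows why binning helps in low dimensions: each query inspects nearby grid cells first and stops as soon as no unvisited cell can contain a closer point.

```python
import math
from collections import defaultdict
from itertools import product


def build_bins(points, bin_width):
    """Map each integer grid cell to the indices of the points inside it."""
    bins = defaultdict(list)
    for idx, p in enumerate(points):
        cell = tuple(math.floor(c / bin_width) for c in p)
        bins[cell].append(idx)
    return bins


def binned_knn(points, k, bin_width):
    """Exact kNN (each point counts as its own neighbor) via grid bins.

    Hypothetical illustration only; a uniform grid is an assumption here.
    """
    bins = build_bins(points, bin_width)
    dim = len(points[0])
    k = min(k, len(points))
    result = []
    for q in points:
        home = tuple(math.floor(c / bin_width) for c in q)
        found = []  # (squared distance, point index)
        radius = 0
        while True:
            # Visit only the cells on the shell at Chebyshev distance `radius`.
            for offset in product(range(-radius, radius + 1), repeat=dim):
                if max(abs(o) for o in offset) != radius:
                    continue
                cell = tuple(h + o for h, o in zip(home, offset))
                for j in bins.get(cell, ()):
                    d2 = sum((a - b) ** 2 for a, b in zip(q, points[j]))
                    found.append((d2, j))
            found.sort()
            # Every point in an unvisited cell is farther than
            # radius * bin_width, so the current top-k is final once the
            # k-th candidate is within that bound.
            if len(found) >= k and found[k - 1][0] <= (radius * bin_width) ** 2:
                break
            radius += 1
        result.append([j for _, j in found[:k]])
    return result


def brute_force_knn(points, k):
    """Reference O(n^2) search used to check the binned version."""
    out = []
    for q in points:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(q, p)), j)
            for j, p in enumerate(points)
        )
        out.append([j for _, j in dists[:k]])
    return out
```

Calling `binned_knn(points, k=5, bin_width=0.2)` on a list of coordinate tuples returns one list of neighbor indices per point, ordered by increasing distance. The number of cells in each shell grows as $(2r+1)^d$, which is one reason a scheme of this kind suits the low-dimensional regime the paper targets.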

Paper Structure

This paper contains 7 sections, 2 equations, 3 figures, 3 algorithms.

Figures (3)

  • Figure 1: View of performance across dimensions at $k=40$ and a dataset size of 1M.
  • Figure 2: Performance with dataset scaling for $d=3$ at $k=10$.
  • Figure 3: Performance with dataset scaling for $d=3$ at $k=10$.