NysX: An Accurate and Energy-Efficient FPGA Accelerator for Hyperdimensional Graph Classification at the Edge
Jebacyril Arockiaraj, Dhruv Parikh, Viktor Prasanna
TL;DR
The paper tackles real-time, energy-efficient graph classification on edge devices using Nyström-encoded Hyperdimensional Computing (HDC) in an end-to-end FPGA accelerator. The proposed accelerator, NysX, integrates four optimizations, each aimed at a specific bottleneck: hybrid uniform/DPP landmark sampling reduces landmark redundancy, a streaming Nyström projection copes with limited on-chip memory and external memory bandwidth, a minimal-perfect-hash key-to-index engine removes costly codebook lookups, and statically load-balanced SpMV handles irregular sparsity. On an AMD Zynq UltraScale+ (ZCU104) FPGA, the design achieves $6.85\times$ ($4.32\times$) speedup and $169\times$ ($314\times$) energy-efficiency gains over CPU (GPU) baselines while improving accuracy by $3.4\%$ on average across TUDataset benchmarks. These results show that Nyström-HDC graph classification is practical at the edge and suggest broader applicability to other modalities and graph tasks.
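The static load-balancing idea can be illustrated in software. The sketch below assumes the sparse operand is stored in CSR form (`row_ptr`, `col_idx`, `vals`) and that each SpMV engine owns a contiguous block of rows. The split is computed once, offline, so that every engine receives roughly the same number of nonzeros rather than the same number of rows. The function names and the contiguous-block rule are illustrative assumptions, not NysX's documented partitioning scheme.

```python
from bisect import bisect_left


def partition_rows_by_nnz(row_ptr, num_engines):
    """Split CSR rows into `num_engines` contiguous blocks with ~equal nonzeros.

    Hypothetical static scheme: computed once, before inference, from the
    sparsity pattern alone. row_ptr[r] is the number of nonzeros in rows [0, r).
    """
    n_rows = len(row_ptr) - 1
    total = row_ptr[-1]
    bounds = [0]
    for p in range(1, num_engines):
        target = total * p / num_engines
        # First row boundary whose nonzero prefix count reaches the target share.
        r = bisect_left(row_ptr, target, lo=bounds[-1])
        bounds.append(min(r, n_rows))
    bounds.append(n_rows)
    return [(bounds[p], bounds[p + 1]) for p in range(num_engines)]


def spmv_rows(row_ptr, col_idx, vals, x, start, end):
    """Compute y[start:end] = A[start:end, :] @ x for a CSR matrix A."""
    y = []
    for r in range(start, end):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y


def spmv_static_balanced(row_ptr, col_idx, vals, x, num_engines):
    """Run SpMV with a static, nonzero-balanced row split.

    In hardware each block would go to its own engine in parallel; here the
    blocks are processed one after another and concatenated in row order.
    """
    parts = partition_rows_by_nnz(row_ptr, num_engines)
    y = []
    for start, end in parts:
        y.extend(spmv_rows(row_ptr, col_idx, vals, x, start, end))
    return y, parts


if __name__ == "__main__":
    # 4x4 matrix with a heavy first row: nonzeros per row = 4, 1, 1, 2.
    row_ptr = [0, 4, 5, 6, 8]
    col_idx = [0, 1, 2, 3, 0, 1, 2, 3]
    vals = [1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    x = [1.0, 2.0, 3.0, 4.0]
    y, parts = spmv_static_balanced(row_ptr, col_idx, vals, x, num_engines=2)
    print(parts)  # [(0, 1), (1, 4)] -> 4 nonzeros per engine
    print(y)      # [10.0, 2.0, 6.0, 32.0]
```

Splitting the same matrix evenly by row count would give the two engines 5 and 3 nonzeros, whereas balancing on nonzeros gives 4 and 4. Because the split depends only on the sparsity pattern, it can be fixed ahead of time without runtime scheduling logic, which is one plausible reason a static scheme suits an FPGA datapath.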
Abstract
Real-time, energy-efficient inference on edge devices is essential for graph classification across a range of applications. Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that encodes input features into low-precision, high-dimensional vectors and manipulates them with simple element-wise operations, making it well-suited for resource-constrained edge platforms. Recent work improves the accuracy of HDC graph classification using Nyström kernel approximation. Accelerating such methods at the edge faces four challenges: (i) redundancy among landmark samples selected by uniform sampling, (ii) storing the Nyström projection matrix under limited on-chip memory, (iii) expensive, contention-prone codebook lookups, and (iv) load imbalance caused by irregular sparsity in sparse matrix-vector multiplication (SpMV). To address these challenges, we propose NysX, the first end-to-end FPGA accelerator for Nyström-based HDC graph classification at the edge. NysX integrates four key optimizations: (i) a hybrid landmark selection strategy that combines uniform sampling with determinantal point processes (DPPs) to reduce redundancy while improving accuracy; (ii) a streaming architecture for the Nyström projection matrix that maximizes external memory bandwidth utilization; (iii) a minimal-perfect-hash lookup engine enabling $O(1)$ key-to-index mapping with low on-chip memory overhead; and (iv) sparsity-aware SpMV engines with static load balancing. Together, these innovations enable real-time, energy-efficient inference on resource-constrained platforms. Implemented on an AMD Zynq UltraScale+ (ZCU104) FPGA, NysX achieves $6.85\times$ ($4.32\times$) speedup and $169\times$ ($314\times$) energy-efficiency gains over optimized CPU (GPU) baselines, while improving classification accuracy by $3.4\%$ on average across the TUDataset benchmarks, a widely used standard suite for graph classification.
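The hybrid landmark selection in (i) can be sketched as a two-stage procedure. This is an assumption-laden illustration. It first draws a uniform random candidate pool, then picks `k` diverse landmarks from that pool using greedy MAP inference for a DPP (the incremental-Cholesky greedy algorithm). Greedy MAP is a deterministic approximation, not exact DPP sampling. The RBF kernel on dense vectors, the pool size, and all function names are hypothetical stand-ins; the abstract does not specify NysX's kernel, pool size, or DPP inference method.

```python
import math
import random


def rbf_kernel(x, y, gamma=1.0):
    """Stand-in similarity: Gaussian (RBF) kernel on dense feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def greedy_dpp_map(items, k, kernel, eps=1e-12):
    """Greedy MAP selection for a DPP with L[i][j] = kernel(items[i], items[j]).

    Each step adds the item with the largest residual variance d2[i], i.e. the
    item that most increases log det(L_S) given the already selected set S.
    c[i] holds item i's partial Cholesky row, updated incrementally.
    """
    n = len(items)
    d2 = [kernel(x, x) for x in items]
    c = [[] for _ in range(n)]
    selected, chosen = [], set()
    while len(selected) < min(k, n):
        j = max((i for i in range(n) if i not in chosen), key=lambda i: d2[i])
        if d2[j] <= eps:
            break  # every remaining item is (numerically) redundant
        selected.append(j)
        chosen.add(j)
        dj = math.sqrt(d2[j])
        for i in range(n):
            if i in chosen:
                continue
            e = (kernel(items[j], items[i])
                 - sum(a * b for a, b in zip(c[j], c[i]))) / dj
            c[i].append(e)
            d2[i] -= e * e  # residual variance only shrinks
    return selected


def hybrid_landmarks(data, pool_size, k, kernel, seed=0):
    """Stage 1: uniform random pool. Stage 2: diverse subset via greedy DPP MAP."""
    rng = random.Random(seed)
    pool = rng.sample(range(len(data)), min(pool_size, len(data)))
    picked = greedy_dpp_map([data[i] for i in pool], k, kernel)
    return [pool[p] for p in picked]  # indices into `data`


if __name__ == "__main__":
    # Three clusters, two of them containing duplicate points.
    data = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0], [10.0, 0.0]]
    idx = hybrid_landmarks(data, pool_size=6, k=3, kernel=rbf_kernel)
    print(sorted(tuple(data[i]) for i in idx))  # [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
```

On this toy data, drawing three points uniformly could return two copies of the same point. The DPP stage instead keeps one landmark per cluster and stops early once the remaining candidates add no new direction. In the standard Nyström method, the chosen landmarks would then define a projection such as $K_{mm}^{-1/2} k_m(x)$, where $K_{mm}$ is the landmark kernel matrix; that step is omitted here.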
