FlexiWalker: Extensible GPU Framework for Efficient Dynamic Random Walks with Runtime Adaptation
Seongyeon Park, Jaeyong Song, Changmin Shin, Sukjin Kim, Junguk Hong, Jinho Lee
TL;DR
Dynamic random walks resist precomputation due to runtime-dependent transition probabilities, creating a need for workload-generic GPU frameworks. FlexiWalker delivers this through two optimized kernels (eRJS and eRVS), a lightweight per-node cost model for runtime kernel selection, and compile-time workload specialization via Flexi-Compiler. The framework demonstrates substantial performance gains over both CPU and GPU baselines across multiple dynamic workloads and graphs, while maintaining low overhead and multi-GPU scalability. The work is open-sourced to facilitate adoption and future extensions.
Abstract
Dynamic random walks are fundamental to many graph analysis applications, offering advantages by adapting to evolving graph properties. Their runtime-dependent transition probabilities defeat the pre-computation strategy that underpins most existing CPU and GPU optimizations for static random walks. This leaves practitioners with suboptimal frameworks or with hand-tuned kernels that do not adapt to workload diversity. To address this, we present FlexiWalker, the first GPU framework that delivers efficient, workload-generic support for dynamic random walks. Our design-space study shows that rejection sampling and reservoir sampling are more suitable than other sampling techniques under massive parallelism. We therefore devise (i) new high-performance kernels for both techniques that eliminate global reductions, redundant memory accesses, and random-number generation. Because the best-fitting sampling strategy can only be chosen at runtime, we adopt (ii) a lightweight first-order cost model that selects the faster kernel per node. To enhance usability, we introduce (iii) a compile-time component that automatically specializes user-supplied walk logic into optimized building blocks. On various dynamic random walk workloads with real-world graphs, FlexiWalker outperforms the best published CPU and GPU baselines by geometric means of 73.44x and 5.91x, respectively, while successfully executing workloads that prior systems cannot support. We open-source FlexiWalker at https://github.com/AIS-SNU/FlexiWalker.
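To make the two sampling techniques named in the abstract concrete, the following is a minimal sequential Python sketch of textbook rejection sampling and single-item weighted reservoir sampling for one walk step. It is not FlexiWalker's GPU implementation: the function names, the `weight` callback, and in particular the `pick_sampler` heuristic are hypothetical illustrations, and the framework's actual cost model is not described in this section.

```python
import random


def rejection_sample(neighbors, weight, w_max, rng):
    """Pick a neighbor with probability proportional to weight(v).

    Assumes w_max >= weight(v) for every neighbor and that at least one
    neighbor has a positive weight (otherwise the loop never terminates).
    """
    while True:
        v = neighbors[rng.randrange(len(neighbors))]  # uniform proposal
        if rng.random() * w_max < weight(v):          # accept w.p. weight / w_max
            return v


def reservoir_sample(neighbors, weight, rng):
    """Single-pass weighted sampling with no prefix sum or global total.

    Keeps one candidate and replaces it with probability w / running_total.
    Returns None if no neighbor has a positive weight.
    """
    total = 0.0
    chosen = None
    for v in neighbors:
        w = weight(v)
        if w <= 0.0:
            continue
        total += w
        if rng.random() * total < w:
            chosen = v
    return chosen


def pick_sampler(degree, w_max, w_mean_estimate):
    """Hypothetical first-order choice between the two samplers.

    Rejection sampling needs about w_max / mean(w) trials in expectation,
    while reservoir sampling always scans all `degree` neighbors.
    The mean-weight estimate is assumed to be supplied by the caller.
    """
    expected_trials = w_max / w_mean_estimate
    return "rejection" if expected_trials < degree else "reservoir"


# Usage: weights depend on walker state, so they are computed on demand.
rng = random.Random(0)
weights = {0: 1.0, 1: 0.0, 2: 3.0}   # illustrative per-step weights
step = reservoir_sample([0, 1, 2], weights.get, rng)
```

Both samplers draw from the same distribution. The trade-off is cost: rejection sampling is cheap when weights are close to their upper bound but degrades when they are skewed, whereas reservoir sampling always costs one full pass over the neighbor list. That difference is why a per-node choice at runtime can pay off.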
