FlexiWalker: Extensible GPU Framework for Efficient Dynamic Random Walks with Runtime Adaptation

Seongyeon Park, Jaeyong Song, Changmin Shin, Sukjin Kim, Junguk Hong, Jinho Lee

TL;DR

Dynamic random walks resist precomputation due to runtime-dependent transition probabilities, creating a need for workload-generic GPU frameworks. FlexiWalker delivers this through two optimized kernels (eRJS and eRVS), a lightweight per-node cost model for runtime kernel selection, and compile-time workload specialization via Flexi-Compiler. The framework demonstrates substantial performance gains over both CPU and GPU baselines across multiple dynamic workloads and graphs, while maintaining low overhead and multi-GPU scalability. The work is open-sourced to facilitate adoption and future extensions.

Abstract

Dynamic random walks are fundamental to various graph analysis applications, offering advantages by adapting to evolving graph properties. Their runtime-dependent transition probabilities break down the pre-computation strategy that underpins most existing CPU and GPU static random walk optimizations. This leaves practitioners with suboptimal frameworks, or forces them to write hand-tuned kernels that do not adapt to workload diversity. To address this issue, we present FlexiWalker, the first GPU framework that delivers efficient, workload-generic support for dynamic random walks. Our design-space study shows that rejection sampling and reservoir sampling are more suitable than other sampling techniques under massive parallelism. Thus, we devise (i) new high-performance kernels for them that eliminate global reductions, redundant memory accesses, and random-number generation. Because the best-fitting sampling strategy must be chosen at runtime, we adopt (ii) a lightweight first-order cost model that selects the faster kernel per node at runtime. To enhance usability, we introduce (iii) a compile-time component that automatically specializes user-supplied walk logic into optimized building blocks. On various dynamic random walk workloads with real-world graphs, FlexiWalker outperforms the best published CPU/GPU baselines by geometric means of 73.44x and 5.91x, respectively, while successfully executing workloads that prior systems cannot support. We open-source FlexiWalker at https://github.com/AIS-SNU/FlexiWalker.
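
To make the two sampling strategies named in the abstract concrete, here is a minimal sequential sketch in Python of textbook rejection sampling and single-item weighted reservoir sampling for one walk step. It is not FlexiWalker's eRJS or eRVS kernel code, which targets GPUs and is not reproduced here. The toy graph `ADJ`, the helper `node2vec_weight`, and all function names are assumptions chosen for illustration; the weight rule follows the standard Node2Vec definition, where the weight of a candidate depends on the previously visited node and therefore cannot be precomputed per edge.

```python
import random

# Hypothetical toy graph as adjacency lists (illustration only).
ADJ = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}


def node2vec_weight(adj, prev, v, p, q):
    """Standard Node2Vec bias for moving to v, given the previous node."""
    if v == prev:
        return 1.0 / p          # step back to the previous node
    if v in adj[prev]:
        return 1.0              # v is also a neighbor of the previous node
    return 1.0 / q              # v moves away from the previous node


def rejection_sample(neighbors, weight_fn, max_weight, rng):
    """Pick a neighbor with probability proportional to weight_fn(v).

    Proposes a neighbor uniformly and accepts it with probability
    weight / max_weight. Assumes 0 <= weight_fn(v) <= max_weight and
    at least one strictly positive weight, otherwise it never returns.
    """
    while True:
        v = neighbors[rng.randrange(len(neighbors))]
        if rng.random() * max_weight < weight_fn(v):
            return v


def reservoir_sample(neighbors, weight_fn, rng):
    """One-pass weighted reservoir sampling with a reservoir of size 1.

    Returns None when no neighbor has a positive weight.
    """
    chosen, total = None, 0.0
    for v in neighbors:
        w = weight_fn(v)
        if w <= 0.0:
            continue
        total += w
        # Replace the current choice with probability w / total.
        if rng.random() * total < w:
            chosen = v
    return chosen


# One dynamic step: the walker came from node 1 and now sits at node 0.
rng = random.Random(0)
p, q = 2.0, 0.5
weight = lambda v: node2vec_weight(ADJ, 1, v, p, q)
upper_bound = max(1.0 / p, 1.0, 1.0 / q)   # valid envelope for rejection

next_rjs = rejection_sample(ADJ[0], weight, upper_bound, rng)
next_rvs = reservoir_sample(ADJ[0], weight, rng)
```

Both functions draw from the same distribution but differ in cost. Rejection sampling evaluates one weight per trial and needs on average degree x max_weight / sum_of_weights trials, so it is cheap when the weights are close to the bound and expensive when they are skewed. Reservoir sampling always evaluates every neighbor exactly once and needs no bound. This is the kind of trade-off a per-node cost model has to weigh, although the sketch does not attempt to reproduce the paper's actual model.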

Paper Structure

This paper contains 30 sections, 12 equations, 16 figures, 3 tables, 1 algorithm.

Figures (16)

  • Figure 1: Single step of static and dynamic random walks.
  • Figure 2: An example of a random walk step and how different sampling methods select the next target node.
  • Figure 3: Performance comparison across various sampling methods. Execution time for (a) unweighted Node2Vec and (b) weighted Node2Vec is normalized to ITS (C-SAW).
  • Figure 4: Proposed optimizations of eRVS.
  • Figure 5: Optimizing rejection sampling with eRJS.
  • ...and 11 more figures