D3-GNN: Dynamic Distributed Dataflow for Streaming Graph Neural Networks
Rustam Guliyev, Aparajita Haldar, Hakan Ferhatosmanoglu
TL;DR
This work addresses the challenge of real-time GNN inference and training on streaming graphs under online-query settings. It introduces D3-GNN, a distributed, hybrid-parallel dataflow system built on Apache Flink that unrolls GNN computations into per-layer operators and uses streaming aggregators to keep node embeddings up to date in a fault-tolerant manner. Key contributions include incremental aggregation, windowed forward passes (intra- and inter-layer), an explosion factor for tunable parallelism, a Training Coordinator for stale-free distributed training, and streaming graph partitioning with scalable re-scaling. An extensive evaluation shows up to 76x throughput gains over DGL, and substantial reductions in runtime and communication with windowing. The approach enables near real-time inference on streaming graphs, supports synchronous distributed training without a separate environment, and offers a practical, scalable solution for latency-sensitive graph learning tasks.
Abstract
Graph Neural Network (GNN) models on streaming graphs entail algorithmic challenges in continuously capturing the graph's dynamic state, as well as systems challenges in optimizing latency, memory, and throughput during both inference and training. We present D3-GNN, the first distributed, hybrid-parallel, streaming GNN system designed to handle real-time graph updates under an online query setting. Our system addresses data management, algorithmic, and systems challenges: it continuously captures the dynamic state of the graph and updates node representations with fault tolerance and optimal latency, load balance, and throughput. D3-GNN uses streaming GNN aggregators and an unrolled, distributed computation graph architecture to handle cascading graph updates. To counteract data skew and neighborhood explosion, we introduce inter-layer and intra-layer windowed forward pass solutions. Experiments on large-scale graph streams demonstrate that D3-GNN achieves high efficiency and scalability. Compared to DGL, D3-GNN achieves a throughput improvement of about 76x for streaming workloads. The windowed enhancement further reduces running times by around 10x and message volumes by up to 15x at higher parallelism.
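To make the idea of a streaming GNN aggregator concrete, the following is a minimal sketch of an incremental mean aggregator, which keeps a running sum and count so that each graph update costs O(d) rather than a full recomputation over the neighborhood. The class name StreamingMeanAggregator and its add/replace/remove methods are hypothetical and chosen for illustration only; they are not taken from the D3-GNN codebase, and the system's actual aggregator interface may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StreamingMeanAggregator:
    """Hypothetical incremental mean over neighbor messages (illustration only)."""

    dim: int
    count: int = 0
    total: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start from a zero vector when no running sum is supplied.
        if not self.total:
            self.total = [0.0] * self.dim

    def add(self, msg: List[float]) -> None:
        """A new in-edge arrived: fold its message into the running sum."""
        self.total = [t + m for t, m in zip(self.total, msg)]
        self.count += 1

    def replace(self, old: List[float], new: List[float]) -> None:
        """A neighbor's embedding changed: swap its old message for the new one."""
        self.total = [t - o + n for t, o, n in zip(self.total, old, new)]

    def remove(self, msg: List[float]) -> None:
        """An in-edge was deleted: subtract its message from the running sum."""
        if self.count == 0:
            raise ValueError("cannot remove from an empty aggregator")
        self.total = [t - m for t, m in zip(self.total, msg)]
        self.count -= 1

    def value(self) -> List[float]:
        """Current mean; the zero vector when there are no neighbors."""
        if self.count == 0:
            return [0.0] * self.dim
        return [t / self.count for t in self.total]
```

For example, after add([1.0, 2.0]) and add([3.0, 4.0]) on a 2-dimensional aggregator, value() returns [2.0, 3.0]; a subsequent replace([1.0, 2.0], [5.0, 6.0]) moves the mean to [4.0, 5.0] without revisiting the other neighbor. In a cascading setting, each such change to a node's aggregate would in turn trigger a replace on the aggregators of its out-neighbors in the next layer, which is the kind of update fan-out that windowing is meant to batch.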
