ACE-GNN: Adaptive GNN Co-Inference with System-Aware Scheduling in Dynamic Edge Environments
Ao Zhou, Jianlei Yang, Tong Qiao, Yingjie Qi, Xinming Wei, Cenlin Duan, Weisheng Zhao, Chunming Hu
TL;DR
ACE-GNN addresses the deployment of Graph Neural Networks on edge devices, where resources are constrained and network conditions change at runtime. It introduces a three-phase workflow (Planning, Scheduling, Execution) that uses system-level abstraction and two predictors to choose adaptively between data parallelism and pipeline parallelism, supported by a batch inference strategy and lightweight communication middleware. The key contributions are a system performance predictor built on a system graph abstraction, a relative performance predictor for comparing schemes at runtime, and a hierarchical optimization approach that quickly identifies near-optimal co-inference schemes. Experiments show speedups of up to 12.7x and energy savings of up to 82.3% over the state-of-the-art framework GCoDE, along with better energy efficiency than Fograph. The framework also scales well and generalizes to unseen models and hardware, supporting robust edge GNN inference in heterogeneous, multi-device environments.
Abstract
The device-edge co-inference paradigm effectively bridges the gap between the high resource demands of Graph Neural Networks (GNNs) and limited device resources, making it a promising solution for advancing edge GNN applications. Existing research enhances GNN co-inference through offline model splitting and pipeline parallelism (PP), which enable more efficient computation and resource utilization during inference. However, the performance of these static deployment methods is significantly affected by environmental dynamics such as network fluctuations and multi-device access, which remain unaddressed. We present ACE-GNN, the first Adaptive GNN Co-inference framework tailored for dynamic Edge environments, to boost system performance and stability. ACE-GNN achieves performance awareness for complex multi-device access edge systems via system-level abstraction and two novel prediction methods, enabling rapid runtime scheme optimization. Moreover, we introduce a data parallelism (DP) mechanism into the runtime optimization space, enabling adaptive scheduling between PP and DP to leverage their distinct advantages and maintain stable system performance. In addition, an efficient batch inference strategy and specialized communication middleware are implemented to further improve performance. Extensive experiments across diverse applications and edge settings demonstrate that ACE-GNN achieves a speedup of up to 12.7x and energy savings of 82.3% compared to GCoDE, as well as 11.7x better energy efficiency than Fograph.
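The idea of scheduling adaptively between PP and DP can be illustrated with a minimal sketch. This is not the ACE-GNN implementation: the cost formulas, the constants, and the names `SystemState`, `pp_time_ms`, `dp_time_ms`, and `choose_scheme` are all hypothetical, and the hand-written cost model merely stands in for the learned predictors described in the abstract. The sketch assumes that PP splits the model into a device stage and an edge stage connected by a transfer of intermediate data, and that DP lets the device and the edge each run the full model on different samples.

```python
from dataclasses import dataclass

# Hypothetical constants for a toy cost model (not measured values).
DEVICE_STAGE_MS = 20.0    # device-side part of the split model (PP)
EDGE_STAGE_MS = 5.0       # edge-side part of the split model (PP)
INTERMEDIATE_MBIT = 2.0   # intermediate data sent per sample (PP)
DEVICE_FULL_MS = 60.0     # full model on the device (DP)
EDGE_FULL_MS = 10.0       # full model on the edge (DP)
RAW_MBIT = 8.0            # raw input sent per offloaded sample (DP)


@dataclass
class SystemState:
    bandwidth_mbps: float  # current device-edge link bandwidth
    num_devices: int       # devices sharing the edge server


def pp_time_ms(state: SystemState) -> float:
    """Steady-state time per sample for a pipeline: the slowest stage."""
    transfer_ms = INTERMEDIATE_MBIT / state.bandwidth_mbps * 1000.0
    edge_ms = EDGE_STAGE_MS * state.num_devices  # crude contention model
    return max(DEVICE_STAGE_MS, transfer_ms, edge_ms)


def dp_time_ms(state: SystemState) -> float:
    """Time per sample when device and edge process samples in parallel."""
    offload_ms = (RAW_MBIT / state.bandwidth_mbps * 1000.0
                  + EDGE_FULL_MS * state.num_devices)
    combined_rate = 1.0 / DEVICE_FULL_MS + 1.0 / offload_ms
    return 1.0 / combined_rate


def choose_scheme(state: SystemState) -> str:
    """Pick the scheme with the lower predicted time per sample."""
    return "PP" if pp_time_ms(state) <= dp_time_ms(state) else "DP"


if __name__ == "__main__":
    for state in (SystemState(200.0, 1), SystemState(10.0, 1)):
        print(state, choose_scheme(state))
```

Under these assumed constants, the sketch prefers PP when the link is fast and switches to DP when bandwidth drops, because the pipeline stalls on the transfer stage while DP can fall back on local execution. A real scheduler would replace the two cost functions with predictions that account for the actual models, hardware, and system load.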
