
ACE-GNN: Adaptive GNN Co-Inference with System-Aware Scheduling in Dynamic Edge Environments

Ao Zhou, Jianlei Yang, Tong Qiao, Yingjie Qi, Xinming Wei, Cenlin Duan, Weisheng Zhao, Chunming Hu

TL;DR

ACE-GNN tackles the challenge of deploying Graph Neural Networks on edge devices where resources are constrained and network conditions are dynamic. It introduces a three-phase workflow (Planning, Scheduling, Execution) that uses system-level abstraction and two predictors to adaptively choose between data parallelism and pipeline parallelism, plus a batch inference strategy and lightweight communication middleware. The key contributions are a system performance predictor built on a system graph abstraction, a relative performance predictor for runtime scheme comparison, and a hierarchical optimization approach that quickly identifies near-optimal co-inference schemes. Empirical results show substantial gains over state-of-the-art frameworks: up to 12.7x higher throughput and up to 82.3% energy savings compared to GCoDE, and 11.7x better energy efficiency than Fograph. ACE-GNN also shows strong scalability and generalizes to unseen models and hardware, enabling robust edge GNN inference in heterogeneous, multi-device environments.
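To illustrate what a *relative* performance predictor buys at runtime, here is a minimal sketch, assuming the predictor is exposed as a pairwise judgment "does scheme a outperform scheme b?" instead of an absolute latency estimate. The `Scheme` fields, the `select_best` helper, and the toy cost table are all hypothetical stand-ins, not ACE-GNN's actual interfaces; in the real system the comparator would be the learned predictor.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Scheme:
    """A candidate co-inference scheme (fields are illustrative, not ACE-GNN's)."""
    mode: str         # "PP" or "DP"
    split_layer: int  # hypothetical split point for PP; ignored for DP


# A relative predictor answers "does scheme a outperform scheme b?"
# rather than predicting an absolute latency for each scheme.
Comparator = Callable[[Scheme, Scheme], bool]


def select_best(candidates: Sequence[Scheme], is_better: Comparator) -> Scheme:
    """Linear scan that keeps the current champion; needs n - 1 comparisons."""
    if not candidates:
        raise ValueError("no candidate schemes")
    best = candidates[0]
    for cand in candidates[1:]:
        if is_better(cand, best):
            best = cand
    return best


if __name__ == "__main__":
    # Stand-in for a learned relative predictor: made-up costs, lower is better.
    toy_cost = {Scheme("PP", 1): 3.0, Scheme("PP", 2): 1.5, Scheme("DP", 0): 2.0}
    winner = select_best(list(toy_cost), lambda a, b: toy_cost[a] < toy_cost[b])
    print(winner)  # Scheme(mode='PP', split_layer=2)
```

With a consistent (transitive) comparator the scan returns the true best candidate; if the learned comparator is occasionally wrong, the result is only near-optimal, which matches the framing of quickly finding near-optimal schemes.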

Abstract

The device-edge co-inference paradigm effectively bridges the gap between the high resource demands of Graph Neural Networks (GNNs) and limited device resources, making it a promising solution for advancing edge GNN applications. Existing research enhances GNN co-inference by leveraging offline model splitting and pipeline parallelism (PP), which enables more efficient computation and resource utilization during inference. However, the performance of these static deployment methods is significantly affected by environmental dynamics such as network fluctuations and multi-device access, which remain unaddressed. We present ACE-GNN, the first Adaptive GNN Co-inference framework tailored for dynamic Edge environments, to boost system performance and stability. ACE-GNN achieves performance awareness for complex multi-device access edge systems via system-level abstraction and two novel prediction methods, enabling rapid runtime scheme optimization. Moreover, we introduce a data parallelism (DP) mechanism in the runtime optimization space, enabling adaptive scheduling between PP and DP to leverage their distinct advantages and maintain stable system performance. In addition, an efficient batch inference strategy and specialized communication middleware are implemented to further improve performance. Extensive experiments across diverse applications and edge settings demonstrate that ACE-GNN achieves a speedup of up to 12.7x and energy savings of 82.3% compared to GCoDE, as well as 11.7x better energy efficiency than Fograph.
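Why switching between PP and DP can stabilize performance under network fluctuations can be seen with a toy analytic model. This is a minimal sketch under loudly stated assumptions: two workers (one device, one edge server), PP throughput bounded by its slowest stage, DP running a full model replica on each side with the edge replica first receiving its raw input (transfer not overlapped with compute), and invented cost numbers. The `Profile` fields and `choose_mode` helper are hypothetical; ACE-GNN itself relies on learned predictors rather than this formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical per-input costs in seconds / megabytes (illustrative only)."""
    dev_head_s: float   # device runs the first part of the model (PP)
    edge_tail_s: float  # edge runs the remaining part (PP)
    dev_full_s: float   # device runs the whole model (DP replica)
    edge_full_s: float  # edge runs the whole model (DP replica)
    feature_mb: float   # intermediate features shipped device -> edge (PP)
    input_mb: float     # raw input shipped device -> edge (DP)


def transfer_s(size_mb: float, bandwidth_mbps: float) -> float:
    # Megabytes -> megabits, divided by the link rate in megabits per second.
    return size_mb * 8.0 / bandwidth_mbps


def pp_throughput(p: Profile, bw: float) -> float:
    # Steady-state pipeline throughput is bounded by the slowest stage.
    slowest = max(p.dev_head_s, transfer_s(p.feature_mb, bw), p.edge_tail_s)
    return 1.0 / slowest


def dp_throughput(p: Profile, bw: float) -> float:
    # Two replicas process different inputs concurrently; the edge replica
    # must first receive its raw input (assumed not overlapped with compute).
    device_rate = 1.0 / p.dev_full_s
    edge_rate = 1.0 / (transfer_s(p.input_mb, bw) + p.edge_full_s)
    return device_rate + edge_rate


def choose_mode(p: Profile, bw: float) -> str:
    # Pick whichever mode the model predicts has higher throughput.
    return "PP" if pp_throughput(p, bw) >= dp_throughput(p, bw) else "DP"


if __name__ == "__main__":
    profile = Profile(dev_head_s=0.02, edge_tail_s=0.02, dev_full_s=0.2,
                      edge_full_s=0.04, feature_mb=0.1, input_mb=2.0)
    for bw in (100.0, 10.0, 1.0):
        print(f"{bw:>6.1f} Mbps -> {choose_mode(profile, bw)}")
```

With these made-up numbers the sketch picks PP at 100 and 10 Mbps and DP at 1 Mbps: when bandwidth collapses, the PP transfer stage becomes the bottleneck, while the DP device replica keeps producing results independently of the network.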

Paper Structure

This paper contains 21 sections, 1 equation, 21 figures, 3 tables, and 1 algorithm.

Figures (21)

  • Figure 1: Device-Edge Co-Inference for GNNs.
  • Figure 2: Existing GNN co-inference designs from GCoDE, leveraging system heterogeneity.
  • Figure 3: Comparison of GNN inference performance under varying network conditions on ModelNet40 [wu20153d], including Device-Only, Edge-Only, and Co-Inference modes on Jetson TX2 and Intel CPU.
  • Figure 4: System efficiency under concurrent access from multiple edge devices for point cloud processing, with Raspberry Pi 4B boards serving as the edge devices.
  • Figure 5: Illustration of pipeline parallelism (PP) and data parallelism (DP) in GNN device-edge co-inference. PP splits model stages across devices for pipelined processing, while DP executes multiple inputs in parallel using replicated models.
  • ...and 16 more figures