Fully Distributed Online Training of Graph Neural Networks in Networked Systems

Rostyslav Olshevskyi, Zhongyuan Zhao, Kevin Chan, Gunjan Verma, Ananthram Swami, Santiago Segarra

TL;DR

This work introduces a fully distributed online training framework for graph neural networks tailored to networked systems, addressing the limitations of centralized training. It reformulates the training of graph convolutional neural networks (GCNNs) as a distributed optimization problem and provides a three-part solution: fully distributed backpropagation to estimate local gradients, distributed SGD with consensus-based gradient aggregation, and communication-efficient mini-batch strategies including piggybacking and information reuse. The approach supports training in supervised, unsupervised, and reinforcement learning pipelines, with numerical results showing near-centralized performance in node regression, UWMMSE power allocation, and wireless link scheduling. The work lays groundwork for scalable, adaptive intelligent networks, with future directions including convergence proofs and robustness to real-world communication constraints.
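As a rough illustration of the consensus-based gradient aggregation step mentioned above, the sketch below averages per-node values by repeated neighbor-to-neighbor mixing. It is a generic consensus averaging example, not the paper's algorithm: the Metropolis mixing weights, the scalar per-node "gradients", and the function names `metropolis_weights`, `consensus_round`, and `consensus_average` are all assumptions made for illustration.

```python
def metropolis_weights(neighbors):
    """Build symmetric, doubly stochastic mixing weights (assumed Metropolis rule)
    from an undirected adjacency list: neighbors[i] lists the neighbors of node i."""
    n = len(neighbors)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            W[i][j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
        # The diagonal is still 0 here, so sum(W[i]) is the neighbor weight total.
        W[i][i] = 1.0 - sum(W[i])
    return W


def consensus_round(values, W, neighbors):
    """One round of message passing: each node mixes its own value
    with the values received from its direct neighbors only."""
    return [
        W[i][i] * values[i] + sum(W[i][j] * values[j] for j in neighbors[i])
        for i in range(len(values))
    ]


def consensus_average(values, neighbors, rounds):
    """Run a fixed number of consensus rounds and return every node's estimate."""
    W = metropolis_weights(neighbors)
    for _ in range(rounds):
        values = consensus_round(values, W, neighbors)
    return values


# Hypothetical example: a ring of 5 nodes, each holding one local gradient value.
ring = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]
local_grads = [1.0, 2.0, 3.0, 4.0, 5.0]
estimates = consensus_average(local_grads, ring, rounds=200)
```

Because the mixing weights are doubly stochastic, the network-wide sum is preserved in every round, so on a connected graph each node's estimate approaches the global mean using only local exchanges. In practice each node would hold a vector of parameter gradients and the same update would apply per entry.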

Abstract

Graph neural networks (GNNs) are powerful tools for developing scalable, decentralized artificial intelligence in large-scale networked systems, such as wireless networks, power grids, and transportation networks. Currently, GNNs in networked systems mostly follow a paradigm of "centralized training, distributed execution", which limits their adaptability and slows down their development cycles. In this work, we fill this gap for the first time by developing a communication-efficient, fully distributed online training approach for GNNs applied to large networked systems. For a mini-batch of $B$ samples, training an $L$-layer GNN with our approach adds only $L$ rounds of message passing to the $LB$ rounds required by GNN inference, with doubled message sizes. Through numerical experiments in graph-based node regression, power allocation, and link scheduling in wireless networks, we demonstrate the effectiveness of our approach in training GNNs under supervised, unsupervised, and reinforcement learning paradigms.

Paper Structure

This paper contains 11 sections, 21 equations, 2 figures, 1 table, 1 algorithm.

Figures (2)

  • Figure 1: Timeline of fully distributed training of a GCNN in mini-batches. By piggybacking the messages of the backward pass of sample $b-1$ onto the messages of the forward pass of sample $b$, a mini-batch requires only $L(B+1)$ rounds of message passing (MP). Most of the communication and computation for the consensus step and local gradient aggregation (line 12 in Algorithm 1) can be piggybacked onto the messages of the $B$ forward passes and carried out in parallel with the processing of data samples (lines 5-11 in Algorithm 1).
  • Figure 2: The evolution of objective values over the course of training: (a) Node regression, where a marker is placed every 200 mini-batches. (b) Power allocation for a network of 25 transmitter-receiver pairs. (c) Distributed link scheduling in conflict graphs of 100 nodes (links).
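The round count quoted in the Figure 1 caption and in the abstract can be checked with a small scheduling sketch. It assumes one message-passing round per layer per pass, and that the backward messages of sample $b-1$ ride along with the forward messages of sample $b$; the layer ordering inside each slot and the function name `piggybacked_schedule` are hypothetical choices for illustration, not details taken from the paper.

```python
def piggybacked_schedule(num_layers, batch_size):
    """Return the list of message-passing rounds for one mini-batch.

    Each round is a list of (pass_type, sample_index, layer_index) payloads
    that share the same message. Slot b carries the forward pass of sample b
    and, piggybacked, the backward pass of sample b - 1.
    """
    L, B = num_layers, batch_size
    rounds = []
    for b in range(1, B + 2):  # B forward slots plus one trailing backward slot
        for layer in range(1, L + 1):
            payload = []
            if b <= B:
                payload.append(("forward", b, layer))
            if b >= 2:
                # Assumed ordering: backpropagation runs from the last layer to the first.
                payload.append(("backward", b - 1, L + 1 - layer))
            rounds.append(payload)
    return rounds


# Hypothetical sizes: a 3-layer GNN and a mini-batch of 4 samples.
L, B = 3, 4
schedule = piggybacked_schedule(L, B)
training_rounds = len(schedule)        # L * (B + 1)
inference_rounds = L * B               # forward passes only
extra_rounds = training_rounds - inference_rounds
max_payloads = max(len(r) for r in schedule)
```

Under these assumptions, training costs $L(B+1)$ rounds against $LB$ for inference alone, so the overhead is $L$ rounds, and no round carries more than two payloads, which matches the "doubled message sizes" noted in the abstract.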