Dynamic Graph Communication for Decentralised Multi-Agent Reinforcement Learning

Ben McClusky

TL;DR

This work addresses the challenge of decentralized multi-agent reinforcement learning in dynamic networks where topology changes and node failures complicate coordination. It extends the NetMon framework by integrating a Graph Attention Network (GAT) layer into recurrent message passing and introducing a multi-round, attention-guided Iteration Controller to selectively propagate information, trained end-to-end with reinforcement learning under centralized training with decentralized execution (CTDE). Empirically, the approach yields up to 9.5% higher rewards and 6.4% lower communication overhead in dynamic network packet routing compared with baselines, and a 4.8% improvement from the GAT-based aggregation in dynamic settings, alongside improved graph representations and resilience to failures. The results demonstrate the potential for scalable, efficient, and robust decentralized routing in real-world networks, while the work also discusses ethical considerations, limitations, and directions for future research.

Abstract

This work presents a novel communication framework for decentralized multi-agent systems operating in dynamic network environments. Integrated into a multi-agent reinforcement learning system, the framework is designed to enhance decision-making by optimizing the network's collective knowledge through efficient communication. Key contributions include adapting a static network packet-routing scenario to a dynamic setting with node failures, incorporating a graph attention network layer in a recurrent message-passing framework, and introducing a multi-round communication targeting mechanism. This approach enables an attention-based aggregation mechanism to be successfully trained within a sparse-reward, dynamic network packet-routing environment using only reinforcement learning. Experimental results show improvements in routing performance, including a 9.5 percent increase in average rewards and a 6.4 percent reduction in communication overhead compared to a baseline system. The study also examines the ethical and legal implications of deploying such systems in critical infrastructure and military contexts, identifies current limitations, and suggests potential directions for future research.
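
To make the attention-based aggregation mentioned above more concrete, the following is a minimal sketch of a standard single-head graph attention step in NumPy. It is not the paper's implementation: the function name `gat_aggregate`, the `alive` mask used to model node failures, and all shapes and random weights are assumptions chosen for illustration, and the recurrent state and Iteration Controller are omitted.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_aggregate(h, adj, W, a, alive=None):
    """Single-head graph attention aggregation (illustrative sketch).

    h:     (N, F) node features
    adj:   (N, N) boolean adjacency matrix
    W:     (F, D) shared linear projection
    a:     (2*D,) attention vector
    alive: optional (N,) boolean mask; False marks a failed node (assumption)
    """
    n = h.shape[0]
    if alive is None:
        alive = np.ones(n, dtype=bool)
    z = h @ W                      # projected features, shape (N, D)
    d = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into source and target parts
    e = leaky_relu((z @ a[:d])[:, None] + (z @ a[d:])[None, :])
    # Node i attends to itself and its neighbours; failed nodes are masked out
    mask = (adj | np.eye(n, dtype=bool)) & alive[None, :] & alive[:, None]
    e = np.where(mask, e, -np.inf)
    # Row-wise softmax that tolerates fully masked rows (failed nodes)
    row_max = np.max(e, axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    w = np.exp(e - row_max)        # masked entries become exactly 0
    denom = w.sum(axis=1, keepdims=True)
    alpha = np.divide(w, denom, out=np.zeros_like(w), where=denom > 0)
    return alpha @ z, alpha

# Hypothetical 4-node line graph 0-1-2-3 in which node 2 has failed
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=bool)
W = rng.normal(size=(3, 2))
a = rng.normal(size=4)
alive = np.array([True, True, False, True])
out, alpha = gat_aggregate(h, adj, W, a, alive)
```

In this toy graph, node 3's only neighbour has failed, so all of its attention weight falls on itself, while the failed node receives no attention from any other node. Masking before the softmax is one simple way to let a learned aggregation adapt to a changing topology.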

Paper Structure (108 sections, 9 equations, 64 figures, 18 tables, 9 algorithms)

Figures (64)

  • Figure 1: Agent-Environment interaction within a Markov Decision Process [sutton2018reinforcement]: The agent receives the current state $s_t$ from the environment and takes an action $a_t$. The environment then provides a reward $r_t$ and the next state $s_{t+1}$.
  • Figure 2: Deep Q-Network (DQN) Architecture [cong2021deep]: The agent utilises experience replay to store and sample experiences for training. The Q-network is trained by minimising the loss between the predicted Q-value and the target Q-value, with periodic updates to the target network's weights ($\theta \rightarrow \theta'$).
  • Figure 3: Actor-Critic Framework [sutton2018reinforcement]: The Actor (Policy) selects actions based on the state. The Critic (Value Function) evaluates these actions using the TD error to update both the Policy and Value Function, iteratively improving performance.
  • Figure 4: POMDP framework: The agent updates its belief state $b_t$ based on observation $o_t$ and selects action $a_{t+1}$ according to its policy. The environment transitions to a new state $s_{t+1}$, emits an observation, and provides a reward $r_t$.
  • Figure 5: A non-stationary environment in a multi-agent system [padakandla2020reinforcement], where each "model" represents an agent. Agents' actions $a_t$ cause transitions between states $S_1, S_2, S_3$, with evolving system dynamics as agents update their policies.
  • ...and 59 more figures

Theorems & Definitions (1)

  • Definition 1