Towards Generalizability of Multi-Agent Reinforcement Learning in Graphs with Recurrent Message Passing

Jannis Weil, Zhenghua Bao, Osama Abboud, Tobias Meuser

TL;DR

This work tackles generalization in decentralized multi-agent reinforcement learning over graphs by learning graph observations through recurrent message passing, enabling nodes to infer a global graph state without centralized control. The approach decouples graph representation learning from control and integrates end-to-end with deep RL, demonstrating generalization across 1000 diverse routing graphs and adaptation to graph changes without retraining. Across regression and routing tasks, recurrent graph observations confer improved generalization and competitive performance with lower communication overhead, though care is needed to mitigate routing loops via action masking. The results suggest a practical pathway toward scalable, generalizable graph-based MARL in dynamic networks, with future work on reducing communication and handling dynamic graphs.
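The summary mentions action masking as a guard against routing loops. The masking rule itself is not given in this section, so the following is only a minimal sketch of the general technique: disallowed actions get a Q-value of negative infinity before the greedy choice. The function name `masked_greedy_action` and the idea of marking the neighbor a packet just came from as disallowed are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def masked_greedy_action(q_values, allowed):
    """Pick the highest-valued action among those marked as allowed.

    q_values: 1-D array with one Q-value per outgoing edge (hypothetical layout).
    allowed:  1-D boolean array, False for actions that must not be taken.
    """
    q_values = np.asarray(q_values, dtype=float)
    allowed = np.asarray(allowed, dtype=bool)
    if not allowed.any():
        raise ValueError("at least one action must be allowed")
    # Disallowed actions can never win the argmax.
    masked = np.where(allowed, q_values, -np.inf)
    return int(np.argmax(masked))


# Illustrative values: edge 0 has the best Q-value but leads back to the
# previous hop, so it is masked out and the next-best edge is chosen.
action = masked_greedy_action([0.9, 0.2, 0.5], [False, True, True])
```

In this example the agent selects edge 2 rather than edge 0, which is how a mask can prevent a packet from bouncing straight back to the node it came from.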

Abstract

Graph-based environments pose unique challenges to multi-agent reinforcement learning. In decentralized approaches, agents operate within a given graph and make decisions based on partial or outdated observations. The size of the observed neighborhood limits the generalizability to different graphs and affects the reactivity of agents, the quality of the selected actions, and the communication overhead. This work focuses on generalizability and resolves the trade-off in observed neighborhood size with a continuous information flow in the whole graph. We propose a recurrent message-passing model that iterates with the environment's steps and allows nodes to create a global representation of the graph by exchanging messages with their neighbors. Agents receive the resulting learned graph observations based on their location in the graph. Our approach can be used in a decentralized manner at runtime and in combination with a reinforcement learning algorithm of choice. We evaluate our method across 1000 diverse graphs in the context of routing in communication networks and find that it enables agents to generalize and adapt to changes in the graph.
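The idea of a recurrent message-passing model that advances once per environment step can be sketched as follows. This is an illustrative simplification under stated assumptions, not the paper's architecture: it swaps the LSTM cells for a plain tanh recurrent update, uses random untrained weights, and keeps only the summation over neighbor hidden states. All names (`message_passing_step`, `run_message_passing`, `w_obs`, `w_self`, `w_nbr`) are hypothetical.

```python
import numpy as np


def message_passing_step(adj, obs, hidden, w_obs, w_self, w_nbr):
    """One iteration: each node sums its neighbors' hidden states and updates."""
    aggregated = adj @ hidden  # sum over direct neighbors only
    return np.tanh(obs @ w_obs + hidden @ w_self + aggregated @ w_nbr)


def run_message_passing(adj, obs, steps, dim=8, seed=0):
    """Iterate the update `steps` times, starting from zero node states."""
    rng = np.random.default_rng(seed)
    w_obs = rng.normal(scale=0.5, size=(obs.shape[1], dim))
    w_self = rng.normal(scale=0.5, size=(dim, dim))
    w_nbr = rng.normal(scale=0.5, size=(dim, dim))
    hidden = np.zeros((adj.shape[0], dim))
    for _ in range(steps):
        hidden = message_passing_step(adj, obs, hidden, w_obs, w_self, w_nbr)
    return hidden


# Path graph 0 - 1 - 2 - 3 as a symmetric adjacency matrix.
adj = np.eye(4, k=1) + np.eye(4, k=-1)

# Two scenarios that differ only in the local observation of node 0.
obs_a = np.zeros((4, 2))
obs_b = obs_a.copy()
obs_b[0] = 1.0

# Node 3 is three hops away from node 0.
same_after_3 = np.allclose(run_message_passing(adj, obs_a, 3)[3],
                           run_message_passing(adj, obs_b, 3)[3])
same_after_4 = np.allclose(run_message_passing(adj, obs_a, 4)[3],
                           run_message_passing(adj, obs_b, 4)[3])
```

In this sketch the first iteration only encodes each node's own observation, and every later iteration carries information one hop further. The state of node 3 is therefore unaffected by node 0 after three iterations and changes after the fourth, which illustrates how repeated local exchanges let distant information reach every node without a central controller.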

Paper Structure

This paper contains 38 sections, 3 equations, 7 figures, 6 tables, 2 algorithms.

Figures (7)

  • Figure 1: Our graph observation mechanism iteratively distributes node states via message passing. Agents in the graph receive local graph observations for decision making.
  • Figure 2: Our recurrent message passing model leverages LSTM cells to encode the node observation and update the node state. The hidden states of neighbor nodes are aggregated via summation, cell states remain local to each node.
  • Figure 3: Overview of the considered graphs with (a) three exemplary graphs from the test set and (b) the mean throughput of shortest paths routing with and without bandwidth limitation in all 1000 test graphs.
  • Figure 4: Validation loss of the selected GNN architectures in the shortest paths problem during training. The shaded area shows the standard deviation over 3 models.
  • Figure 5: Reward of DQN with graph observations during training without (left) and with (right) bandwidth limitations. The shaded area shows the standard deviation over 3 models.
  • ...and 2 more figures