Towards Generalizability of Multi-Agent Reinforcement Learning in Graphs with Recurrent Message Passing
Jannis Weil, Zhenghua Bao, Osama Abboud, Tobias Meuser
TL;DR
This work tackles generalization in decentralized multi-agent reinforcement learning over graphs by learning graph observations through recurrent message passing, which lets nodes infer a global graph state without centralized control. The approach decouples graph representation learning from control and integrates end-to-end with deep RL. It generalizes across 1000 diverse routing graphs and adapts to graph changes without retraining. Across regression and routing tasks, recurrent graph observations improve generalization and achieve competitive performance with lower communication overhead, though care is needed to mitigate routing loops via action masking. The results suggest a practical pathway toward scalable, generalizable graph-based MARL in dynamic networks; future work includes reducing communication and handling dynamic graphs.
Abstract
Graph-based environments pose unique challenges to multi-agent reinforcement learning. In decentralized approaches, agents operate within a given graph and make decisions based on partial or outdated observations. The size of the observed neighborhood limits the generalizability to different graphs and affects the reactivity of agents, the quality of the selected actions, and the communication overhead. This work focuses on generalizability and resolves the trade-off in observed neighborhood size with a continuous information flow in the whole graph. We propose a recurrent message-passing model that iterates with the environment's steps and allows nodes to create a global representation of the graph by exchanging messages with their neighbors. Agents receive the resulting learned graph observations based on their location in the graph. Our approach can be used in a decentralized manner at runtime and in combination with a reinforcement learning algorithm of choice. We evaluate our method across 1000 diverse graphs in the context of routing in communication networks and find that it enables agents to generalize and adapt to changes in the graph.
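
To make the core idea concrete, the following is a minimal sketch of recurrent message passing, not the model proposed in this work. It assumes scalar node states, mean aggregation over direct neighbors, and fixed hand-picked weights (`w_self`, `w_msg`, `w_in`), all of which are hypothetical simplifications; a learned model would use vector-valued states and trained update functions. One round is executed per environment step, so information from a node travels one additional hop per step.

```python
import math


def message_passing_step(hidden, features, neighbors,
                         w_self=0.5, w_msg=0.5, w_in=1.0):
    """Run one recurrent round of neighbor-to-neighbor message exchange.

    Each node reads only the previous states of its direct neighbors,
    so the round can be executed in a decentralized manner.
    The weights are illustrative constants, not learned parameters.
    """
    new_hidden = {}
    for node, state in hidden.items():
        messages = [hidden[n] for n in neighbors[node]]
        aggregated = sum(messages) / len(messages) if messages else 0.0
        new_hidden[node] = math.tanh(
            w_self * state + w_msg * aggregated + w_in * features[node]
        )
    return new_hidden


# Hypothetical line graph 0 - 1 - 2 - 3; only node 0 has a non-zero feature.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
hidden = {node: 0.0 for node in neighbors}

# One message-passing round per (simulated) environment step.
history = []
for _ in range(4):
    hidden = message_passing_step(hidden, features, neighbors)
    history.append(dict(hidden))

for step, states in enumerate(history, start=1):
    print(step, {node: round(value, 3) for node, value in states.items()})
```

In this toy setup, node 3 is three hops away from node 0 and its state stays at zero until the fourth round. This illustrates why iterating the exchange alongside the environment's steps lets every node accumulate information about the whole graph while communicating only with its neighbors.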
