Q-MARL: A quantum-inspired algorithm using neural message passing for large-scale multi-agent reinforcement learning
Kha Vo, Chin-Teng Lin
TL;DR
Q-MARL addresses the scalability bottleneck in multi-agent reinforcement learning by fully decentralising training through per-agent sub-graphs and neural message passing. Each agent is treated as the centre of a dynamic local neighbourhood, and its actions are ensembled across all sub-graphs that contain it, enabling efficient learning with thousands of agents without requiring common rewards or a fixed agent order. The framework provides theoretical convergence guarantees under time-varying graphs and demonstrates dramatic improvements in training speed and loss, with strong generalisation across the Jungle, Battle, and Deception scenarios compared to contemporary graph-based MARL methods. The approach offers a scalable, decentralised solution for large-scale cooperative-competitive MARL tasks, with potential impact on complex multi-agent systems and distributed decision-making.
Abstract
Inspired by a graph-based technique for predicting molecular properties in quantum chemistry (atoms' positions within molecules in three-dimensional space), we present Q-MARL, a completely decentralised learning architecture that supports very large-scale multi-agent reinforcement learning scenarios without the need for strong assumptions such as common rewards or agent order. The key is to treat each agent relative to its surrounding agents in an environment that is presumed to change dynamically. Hence, in each time step, an agent is the centre of its own neighbourhood and also a neighbour to many other agents. Each role is formulated as a sub-graph, and each sub-graph is used as a training sample. A message-passing neural network supports full-scale vertex and edge interaction within a local neighbourhood, while a parameter governing the depth of the sub-graphs eases the training burden. During testing, an agent's actions are locally ensembled across all the sub-graphs that contain it, resulting in robust decisions. Where other approaches struggle to manage 50 agents, Q-MARL can easily marshal thousands. A detailed theoretical analysis proves improvement and convergence, and simulations on typical collaborative and competitive scenarios show dramatically faster training speeds and reduced training losses.
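The sub-graph and ensembling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_subgraph` and `ensemble_actions`, the adjacency-dictionary representation, the use of a plain average as the ensembling rule, and the `toy_policy` stand-in for the message-passing network are all assumptions made purely for illustration.

```python
from collections import deque


def build_subgraph(adjacency, centre, depth):
    """Return the agents within `depth` hops of `centre` (breadth-first search)."""
    hops = {centre: 0}
    queue = deque([centre])
    while queue:
        node = queue.popleft()
        if hops[node] == depth:
            continue  # do not expand beyond the depth limit
        for neighbour in adjacency[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return set(hops)


def ensemble_actions(adjacency, depth, policy):
    """Average each agent's action distribution over every sub-graph containing it.

    `policy(centre, agent)` is a hypothetical stand-in for the trained network:
    it returns the action probabilities proposed for `agent` when evaluated
    inside the sub-graph centred on `centre`. Averaging is an assumed rule.
    """
    subgraphs = {c: build_subgraph(adjacency, c, depth) for c in adjacency}
    ensembled = {}
    for agent in adjacency:
        votes = [
            policy(centre, agent)
            for centre, members in subgraphs.items()
            if agent in members
        ]
        ensembled[agent] = [sum(col) / len(votes) for col in zip(*votes)]
    return subgraphs, ensembled


def toy_policy(centre, agent):
    # Hypothetical: confident when the agent is the centre, uniform otherwise.
    return [1.0, 0.0] if centre == agent else [0.5, 0.5]


# Four agents on a line: 0 - 1 - 2 - 3, with sub-graph depth 1.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
subgraphs, ensembled = ensemble_actions(adjacency, depth=1, policy=toy_policy)
print(subgraphs[1])   # agents in the sub-graph centred on agent 1
print(ensembled[0])   # agent 0's averaged action distribution
```

In this toy setting, agent 0 belongs to the sub-graphs centred on agents 0 and 1, so its final distribution is the mean of two proposals. Increasing `depth` enlarges every sub-graph and therefore the number of proposals per agent, which mirrors the trade-off the depth parameter is said to control.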
