MARL-LNS: Cooperative Multi-agent Reinforcement Learning via Large Neighborhoods Search
Weizhe Chen, Sven Koenig, Bistra Dilkina
TL;DR
MARL-LNS introduces a neighborhood-based training framework for cooperative MARL that reduces training time by updating only a subset of agents per iteration, using existing algorithms such as MAPPO as low-level trainers. It formalizes three variants (RLNS, BLNS, and ALNS) that differ in how neighborhoods are selected or sized, and proves that the framework preserves the convergence properties of the underlying MARL method under standard assumptions. Empirically, MARL-LNS reduces wall-clock training time by at least 10% on SMAC and GRF without sacrificing final policy performance, with ALNS often providing favorable speed–accuracy trade-offs. The approach offers a practical, general path to more efficient MARL training in settings with many agents, though it relies on random neighborhood selection and shared parameters across agents. The paper also discusses broader societal implications.
Abstract
Cooperative multi-agent reinforcement learning (MARL) has become an increasingly important research topic over the last half-decade because of its great potential for real-world applications. Because of the curse of dimensionality, the popular "centralized training decentralized execution" framework requires long training times, yet still cannot converge efficiently. In this paper, we propose a general training framework, MARL-LNS, to algorithmically address these issues by training on alternating subsets of agents using existing deep MARL algorithms as low-level trainers, while not involving any additional parameters to be trained. Based on this framework, we provide three algorithm variants: random large neighborhood search (RLNS), batch large neighborhood search (BLNS), and adaptive large neighborhood search (ALNS), which alternate the subsets of agents differently. We test our algorithms on both the StarCraft Multi-Agent Challenge and Google Research Football, showing that they reduce training time by at least 10% while reaching the same final skill level as the original algorithm.
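To make the idea of "training on alternating subsets of agents" concrete, the following is a minimal sketch of the outer loop, not the authors' implementation. The function names, the fixed neighborhood size, and the `low_level_update` callback (standing in for one update of a low-level trainer such as MAPPO) are all hypothetical. The random and batch selection rules shown are assumptions about how such subsets might be drawn; the adaptive variant is omitted because its sizing rule is not described here.

```python
import random
from typing import Callable, List


def random_neighborhood(n_agents: int, size: int, rng: random.Random) -> List[int]:
    """Assumed random rule: draw `size` distinct agent indices uniformly."""
    return sorted(rng.sample(range(n_agents), size))


def batch_neighborhoods(n_agents: int, size: int, rng: random.Random) -> List[List[int]]:
    """Assumed batch rule: shuffle the agents once, then cut the order into chunks."""
    order = list(range(n_agents))
    rng.shuffle(order)
    return [sorted(order[i:i + size]) for i in range(0, n_agents, size)]


def train_with_lns(
    n_agents: int,
    size: int,
    iterations: int,
    low_level_update: Callable[[List[int]], None],
    seed: int = 0,
) -> List[List[int]]:
    """Outer loop: each iteration updates only the agents in one neighborhood.

    `low_level_update` is a hypothetical placeholder for one training step of
    an existing MARL algorithm restricted to the selected agents.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(iterations):
        neighborhood = random_neighborhood(n_agents, size, rng)
        low_level_update(neighborhood)  # only this subset is trained this iteration
        history.append(neighborhood)
    return history
```

In this sketch the outer loop adds no trainable parameters of its own; it only decides which agents the existing trainer sees in each iteration, which is the property the abstract emphasizes.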
