MARL-LNS: Cooperative Multi-agent Reinforcement Learning via Large Neighborhoods Search

Weizhe Chen; Sven Koenig; Bistra Dilkina

TL;DR

MARL-LNS is a neighborhood-based training framework for cooperative MARL that reduces training time by updating only a subset of agents in each iteration, using existing algorithms such as MAPPO as low-level trainers. It formalizes three variants (RLNS, BLNS, and ALNS) that differ in how neighborhoods are selected or sized, and proves that the framework preserves the convergence properties of the underlying MARL method under standard assumptions. Empirically, MARL-LNS achieves wall-clock speedups of at least 10% on SMAC and GRF without sacrificing final policy performance, with ALNS often providing favorable speed–accuracy trade-offs. The approach offers a practical, general path to more efficient MARL training in settings with many agents, though it relies on random neighborhood selection and on parameters shared across agents. The paper also discusses broader societal implications.

Abstract

Cooperative multi-agent reinforcement learning (MARL) has become an increasingly important research topic over the last half-decade because of its great potential for real-world applications. Because of the curse of dimensionality, the popular "centralized training decentralized execution" framework requires a long training time, yet still cannot converge efficiently. In this paper, we propose a general training framework, MARL-LNS, to algorithmically address these issues by training on alternating subsets of agents, using existing deep MARL algorithms as low-level trainers and introducing no additional parameters to be trained. Based on this framework, we provide three algorithm variants: random large neighborhood search (RLNS), batch large neighborhood search (BLNS), and adaptive large neighborhood search (ALNS), which alternate the subsets of agents differently. We test our algorithms on both the StarCraft Multi-Agent Challenge and Google Research Football, showing that they can automatically reduce training time by at least 10% while reaching the same final skill level as the original algorithm.
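The idea of training on alternating subsets of agents can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `update_fn` callback standing in for a low-level trainer such as MAPPO, and the exact selection rules (a fresh random sample for the RLNS-style selector, a shuffled agent order traversed in consecutive windows for the BLNS-style selector) are all hypothetical. ALNS, which varies the neighborhood size, is omitted.

```python
import random
from typing import Callable, Iterator, List


def random_neighborhoods(n_agents: int, m: int, seed: int = 0) -> Iterator[List[int]]:
    """RLNS-style (assumed): draw a fresh random subset of m agents every iteration."""
    rng = random.Random(seed)
    while True:
        yield sorted(rng.sample(range(n_agents), m))


def batch_neighborhoods(n_agents: int, m: int, seed: int = 0) -> Iterator[List[int]]:
    """BLNS-style (assumed): shuffle agents once, then cycle through windows of size m."""
    rng = random.Random(seed)
    order = list(range(n_agents))
    rng.shuffle(order)
    start = 0
    while True:
        # Take m consecutive agents from the shuffled order, wrapping around.
        yield sorted(order[(start + k) % n_agents] for k in range(m))
        start = (start + m) % n_agents


def train(
    n_agents: int,
    m: int,
    n_iterations: int,
    selector: Callable[[int, int], Iterator[List[int]]],
    update_fn: Callable[[List[int]], None],
) -> None:
    """Outer loop: each iteration updates only the agents in the current neighborhood."""
    neighborhoods = selector(n_agents, m)
    for _ in range(n_iterations):
        neighborhood = next(neighborhoods)
        # Hypothetical hook: a low-level MARL update restricted to these agents.
        update_fn(neighborhood)
```

For example, `train(10, 5, 100, batch_neighborhoods, my_update)` would call the hypothetical `my_update` 100 times, each time with the indices of 5 of the 10 agents.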

Paper Structure (24 sections, 2 theorems, 1 equation, 2 figures, 6 tables, 2 algorithms)

Introduction
Related Works
Preliminaries
Large Neighborhood Search for MARL
Large Neighborhood Search Framework
Random Large Neighborhood Search
Batch Large Neighborhood Search
Adaptive Large Neighborhood Search
Experiments
Experimental Settings
SMAC Testbed
Ablation Study on Neighborhood Size
GRF Testbed
Conclusion
Discussions
...and 9 more sections

Key Result

Theorem 1

(Adapted from lyu2020convergence) Assume the expected cumulative reward function $\mathcal{J}$ is continuously differentiable with Lipschitz gradient and convex in each neighborhood partition, and the training by the low-level algorithm guarantees that the training happening on the i-th neighborhood

Figures (2)

Figure 1: Median value and standard deviation of the RLNS, BLNS, and ALNS training curves compared to MAPPO on two SMAC scenarios. Although the neighborhood size is set as half of the total number of agents, the training curves are not much different.
Figure 2: Median value and standard deviation of the BLNS training curve on the 27m_vs_30m scenario on SMAC for different neighborhood sizes $m$.

Theorems & Definitions (2)

Theorem 1
Theorem 2
