MARL-LNS: Cooperative Multi-agent Reinforcement Learning via Large Neighborhoods Search

Weizhe Chen, Sven Koenig, Bistra Dilkina

TL;DR

MARL-LNS is a neighborhood-based training framework for cooperative MARL that reduces training time by updating only a subset of agents in each iteration, using an existing algorithm such as MAPPO as the low-level trainer. It defines three variants (RLNS, BLNS, and ALNS) that differ in how neighborhoods are selected or sized, and proves that the framework preserves the convergence properties of the underlying MARL method under standard assumptions. Empirically, MARL-LNS cuts wall-clock training time by at least 10% on SMAC and GRF without sacrificing final policy performance, with ALNS often providing a favorable speed–accuracy trade-off. The approach offers a practical, general path to more efficient MARL training with many agents, though it relies on random neighborhood selection and on parameters shared across agents. The paper also discusses broader societal implications.

Abstract

Cooperative multi-agent reinforcement learning (MARL) has become an increasingly important research topic over the last half-decade because of its great potential for real-world applications. Because of the curse of dimensionality, the popular "centralized training, decentralized execution" framework requires long training times and still struggles to converge efficiently. In this paper, we propose a general training framework, MARL-LNS, that addresses these issues algorithmically by training on alternating subsets of agents, using existing deep MARL algorithms as low-level trainers and introducing no additional trainable parameters. Based on this framework, we provide three algorithm variants: random large neighborhood search (RLNS), batch large neighborhood search (BLNS), and adaptive large neighborhood search (ALNS), which alternate the subsets of agents differently. We test our algorithms on both the StarCraft Multi-Agent Challenge and Google Research Football, showing that they reduce training time by at least 10% while reaching the same final skill level as the original algorithm.
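The idea of training on alternating subsets of agents can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a random variant draws a fresh subset of size m each iteration and that a batch variant walks through one shuffled ordering of the agents in windows of size m. The names `random_neighborhoods`, `batch_neighborhoods`, `train`, and `update_fn` are hypothetical, and `update_fn` merely stands in for a low-level trainer such as a MAPPO update. ALNS is left out because this section does not say how the neighborhood size is adapted.

```python
import random
from collections import Counter
from typing import Callable, Iterator, List


def random_neighborhoods(n_agents: int, m: int,
                         rng: random.Random) -> Iterator[List[int]]:
    """RLNS-style selection (assumed): a fresh random subset of m agents per iteration."""
    while True:
        yield sorted(rng.sample(range(n_agents), m))


def batch_neighborhoods(n_agents: int, m: int,
                        rng: random.Random) -> Iterator[List[int]]:
    """BLNS-style selection (assumed): shuffle the agents once, then take
    consecutive windows of m agents from that order, wrapping around."""
    order = list(range(n_agents))
    rng.shuffle(order)
    start = 0
    while True:
        yield sorted(order[(start + k) % n_agents] for k in range(m))
        start = (start + m) % n_agents


def train(selector: Iterator[List[int]], iterations: int,
          update_fn: Callable[[List[int]], None]) -> Counter:
    """Outer loop: each iteration hands only the selected agents to the
    low-level trainer (update_fn is a hypothetical placeholder)."""
    counts: Counter = Counter()
    for _ in range(iterations):
        neighborhood = next(selector)
        update_fn(neighborhood)      # e.g. one policy update restricted to these agents
        counts.update(neighborhood)  # track how often each agent was trained
    return counts


if __name__ == "__main__":
    # 10 agents, neighborhood size 5 (half of the agents), 4 iterations.
    counts = train(batch_neighborhoods(10, 5, random.Random(0)),
                   iterations=4, update_fn=lambda agents: None)
    print(dict(counts))
```

With 10 agents, a neighborhood of 5, and 4 iterations, the batch selector in this sketch picks every agent exactly twice, whereas the random selector gives no such per-agent guarantee over a short horizon.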

Paper Structure (24 sections, 2 theorems, 1 equation, 2 figures, 6 tables, 2 algorithms)

Key Result

Theorem 1

(Adapted from lyu2020convergence) Assume the expected cumulative reward function $\mathcal{J}$ is continuously differentiable with Lipschitz gradient and convex in each neighborhood partition, and the training by the low-level algorithm guarantees that the training happening on the $i$-th neighborhood

Figures (2)

  • Figure 1: Median value and standard deviation of the RLNS, BLNS, and ALNS training curves compared to MAPPO on two SMAC scenarios. Although the neighborhood size is set to half of the total number of agents, the training curves differ little.
  • Figure 2: Median value and standard deviation of the BLNS training curve on the 27m_vs_30m scenario on SMAC for different neighborhood sizes $m$.

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2