Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems

Miguel Suau, Jinke He, Mustafa Mert Çelikok, Matthijs T. J. Spaan, Frans A. Oliehoek

TL;DR

This work addresses the scalability of multi-agent reinforcement learning (MARL) in large networked systems by factorizing the environment into local regions and capturing inter-region influence with influence distributions $I_i(u_i^t \mid l_i^t)$. It introduces Distributed Influence-Augmented Local Simulators (DIALS), a framework in which independent local simulators run in parallel, each augmented by an Approximate Influence Predictor (AIP) trained on trajectories from a global simulator. Two key insights are that multiple joint policies can map to the same influence distribution, and that training AIPs less frequently can still yield competitive or better policies because the learning signal is more stable. Experiments on traffic and warehouse domains demonstrate substantial speedups and scaling, with up to 40x reductions in training time for 100 agents. The approach offers a practical route to training large MARL systems offline, using local simulation and periodic influence modeling to mitigate non-stationarity and to make training feasible for complex real-world networks.
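To make the influence distribution $I_i(u_i^t \mid l_i^t)$ concrete, the following is a minimal sketch of a count-based estimator fitted on (local history, influence source) pairs. It is a toy stand-in, not the paper's AIP: the class name `TabularAIP`, its methods, and the uniform fallback for unseen histories are all assumptions made for illustration, and a learned predictor would normally replace the lookup table.

```python
import random
from collections import Counter, defaultdict


class TabularAIP:
    """Toy count-based estimate of I(u | l). Hypothetical, for illustration only."""

    def __init__(self, influence_values):
        # Finite set of values the influence source u can take (assumed discrete).
        self.values = list(influence_values)
        # counts[l][u] = number of times u was observed after local history l.
        self.counts = defaultdict(Counter)

    def fit(self, pairs):
        # pairs: iterable of (local_history, influence_source) samples,
        # e.g. extracted from trajectories of a global simulator.
        for l, u in pairs:
            self.counts[l][u] += 1

    def prob(self, u, l):
        c = self.counts.get(l)
        if not c:
            # Unseen history: fall back to a uniform distribution (an assumption).
            return 1.0 / len(self.values)
        return c[u] / sum(c.values())

    def sample(self, l, rng):
        # Draw an influence source for the local simulator to consume.
        weights = [self.prob(u, l) for u in self.values]
        return rng.choices(self.values, weights=weights, k=1)[0]
```

A local simulator would call `sample` at every step with its current local history as the key, so that the region's boundary variables are drawn from the estimated distribution rather than computed by simulating the rest of the network.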

Abstract

Due to its high sample complexity, simulation is, as of today, critical for the successful application of reinforcement learning. Many real-world problems, however, exhibit overly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we show how to decompose large networked systems of many agents into multiple local components such that we can build separate simulators that run independently and in parallel. To monitor the influence that the different local components exert on one another, each of these simulators is equipped with a learned model that is periodically trained on real trajectories. Our empirical results reveal that distributing the simulation among different processes not only makes it possible to train large multi-agent systems in just a few hours but also helps mitigate the negative effects of simultaneous learning.
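The alternation described in the abstract, periodic influence-model training interleaved with independent local simulation, can be sketched as a simple schedule. This is an illustrative outline under stated assumptions, not the authors' algorithm: `run_dials_schedule` and its three callables are hypothetical names, `aip_period` is assumed to be the number of timesteps between influence-model updates and a multiple of `chunk`, and threads stand in for the separate processes the abstract mentions.

```python
from concurrent.futures import ThreadPoolExecutor


def run_dials_schedule(n_regions, total_steps, aip_period, chunk,
                       collect_global, fit_aip, train_local):
    """Alternate periodic influence-model refits with parallel local training.

    All callables are hypothetical placeholders supplied by the caller:
      collect_global() -> mapping region index -> trajectory data
      fit_aip(i, data) -> refit region i's influence model
      train_local(i, n) -> train region i's agent for n local-simulator steps
    Assumes aip_period is a multiple of chunk.
    """
    refits = 0
    steps = 0
    with ThreadPoolExecutor(max_workers=n_regions) as pool:
        while steps < total_steps:
            if steps % aip_period == 0:
                # Gather fresh trajectories once, then refit every influence model.
                data = collect_global()
                for i in range(n_regions):
                    fit_aip(i, data[i])
                refits += 1
            # Local simulators are independent, so they can run concurrently.
            list(pool.map(lambda i: train_local(i, chunk), range(n_regions)))
            steps += chunk
    return refits
```

With this layout the expensive global data collection happens only once per `aip_period` steps, while all remaining steps are spent in the cheap local simulators.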

Paper Structure

This paper contains 38 sections, 10 theorems, 27 equations, 11 figures, 6 tables, 3 algorithms.

Key Result

Lemma 1

Let $\Pi = \times_{i \in N} \Pi_i$ be the product space of joint policies with $\Pi_i$ being the set of policies for agent $i$. Moreover, let $\Psi = \times_{i\in N} \Psi_i$ be the product space of joint influences, with $\Psi_i$ being the set of influence distributions for agent $i$. Every joint po

Figures (11)

  • Figure 1: Left: A Dynamic Bayesian Network showing agent $i$'s transition dynamics in a local-form fPOSG prototype. Right: A conceptual diagram of the IALS.
  • Figure 2: A conceptual diagram of the DIALS
  • Figure 3: (1a) and (1b): Learning curves with the three simulators on the $4$-intersection traffic and $4$-robot warehouse environments. (2a) and (2b): Final average return of agents trained with the three simulators for 4M timesteps. (3a) and (3b): Total runtime of training with the three simulators for 4M timesteps. The $y$-axis is in $\log_2$ scale.
  • Figure 4: Left (a) and (b): Learning curves with DIALS for different values of $F$ on the $25$-agent versions of the two environments. Right (a) and (b): Influence CE loss as a function of runtime averaged over the $25$ AIPs.
  • Figure 5: Left (a), (b), (c), and (d): Average return as a function of the number of timesteps with GS, DIALS $F=1$M, and untrained-DIALS on the traffic environment. Right (a), (b), (c), and (d): Total runtime of training for 4M timesteps, $y$-axis is in $\log_2$ scale.
  • ...and 6 more figures

Theorems & Definitions (18)

  • Definition 1: fPOSG
  • Definition 2: Local-form fPOSG
  • Definition 3: IALM
  • Lemma 1
  • Proposition 1
  • Corollary 1
  • Lemma 2
  • Theorem 1
  • Lemma 2
  • proof
  • ...and 8 more