Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems
Miguel Suau, Jinke He, Mustafa Mert Çelikok, Matthijs T. J. Spaan, Frans A. Oliehoek
TL;DR
This work addresses the scalability of multi-agent reinforcement learning (MARL) in large networked systems by factorizing the environment into local regions and tracking inter-region influence with influence distributions $I_i(u_i^t \mid l_i^t)$, i.e. distributions over the influence sources $u_i^t$ given the local history $l_i^t$. It introduces Distributed Influence-Augmented Local Simulators (DIALS), a parallel framework in which independent local simulators run in tandem, each augmented by Approximate Influence Predictors (AIPs) trained on trajectories from a global simulator. Its key insights are that multiple joint policies can map to the same influence distribution, and that training AIPs less frequently can still yield competitive or even better policies because the learning signal becomes more stable. Experiments on traffic and warehouse domains demonstrate substantial speedups and good scaling, with up to 40x reductions in training time for 100 agents. The approach offers a practical route to training large MARL systems offline, using local simulations and periodic influence modeling to mitigate non-stationarity and enable real-world applicability in complex networks.
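To make the claim that multiple joint policies can map to the same influence distribution concrete, here is a hypothetical toy case that is not taken from the paper. Assume the influence source $u$ entering a region is the parity (XOR) of two neighbouring agents' binary actions, each drawn independently from its own Bernoulli policy, and ignore the conditioning on local history for brevity. The function name `influence_prob` and the policy values are illustrative only.

```python
def influence_prob(p_j: float, p_k: float) -> float:
    """P(u = 1) for the hypothetical influence source u = a_j XOR a_k,
    where a_j ~ Bernoulli(p_j) and a_k ~ Bernoulli(p_k) independently."""
    return p_j * (1 - p_k) + (1 - p_j) * p_k

# Two different joint policies of the neighbours (hypothetical values) ...
policy_a = (0.5, 0.2)
policy_b = (0.5, 0.9)

# ... induce the same distribution over u, so a predictor of u fitted
# under policy_a would remain valid after the neighbours switch to policy_b.
print(influence_prob(*policy_a), influence_prob(*policy_b))
```

In this toy setting, any joint policy with $p_j = 0.5$ gives $P(u = 1) = 0.5$ regardless of $p_k$, which is the kind of many-to-one mapping that lets an influence predictor stay accurate while the other agents' policies keep changing.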
Abstract
Due to its high sample complexity, simulation is, as of today, critical for the successful application of reinforcement learning. Many real-world problems, however, exhibit overly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we show how to decompose large networked systems of many agents into multiple local components such that we can build separate simulators that run independently and in parallel. To monitor the influence that the different local components exert on one another, each of these simulators is equipped with a learned model that is periodically trained on real trajectories. Our empirical results reveal that distributing the simulation among different processes not only makes it possible to train large multi-agent systems in just a few hours but also helps mitigate the negative effects of simultaneous learning.
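The decomposition and periodic retraining described above can be sketched as a training loop. This is a minimal illustration under heavy assumptions, not the authors' implementation: the environment is a hypothetical ring of queues in which each region's only influence source is the inflow from its upstream neighbour; the "local state" $l$ is simplified to the current queue length rather than a full history; a count-based estimator (`FrequencyAIP`) stands in for the learned influence model; and a crude grid search over release probabilities stands in for the RL algorithm. All names (`N_REGIONS`, `run_global`, `local_return`, `train_local`, `dials`, etc.) are hypothetical.

```python
import random
from collections import defaultdict

N_REGIONS = 4    # hypothetical: regions/agents arranged in a ring
CAP = 5          # hypothetical queue capacity per region
ARRIVAL_P = 0.3  # hypothetical probability of an external arrival per step
GRID = [0.0, 0.25, 0.5, 0.75, 1.0]  # candidate release probabilities (toy policies)

def sample_outflow(q, release_p, rng):
    """A region releases one item downstream with probability release_p (if non-empty)."""
    return 1 if (q > 0 and rng.random() < release_p) else 0

def update_queue(q, outflow, inflow, rng):
    arrival = 1 if rng.random() < ARRIVAL_P else 0
    return min(CAP, q - outflow + inflow + arrival)

def run_global(policies, steps, rng):
    """Global simulator: all regions coupled; records (l_i, u_i) pairs per region."""
    qs = [0] * N_REGIONS
    data = [[] for _ in range(N_REGIONS)]
    for _ in range(steps):
        outs = [sample_outflow(q, p, rng) for q, p in zip(qs, policies)]
        for i in range(N_REGIONS):
            u = outs[i - 1]              # inflow into region i comes from region i-1 (ring)
            data[i].append((qs[i], u))   # local state before the update, influence source
            qs[i] = update_queue(qs[i], outs[i], u, rng)
    return data

class FrequencyAIP:
    """Count-based stand-in for a learned AIP: estimates P(u = 1 | l)."""
    def __init__(self):
        self.counts = defaultdict(lambda: [1, 1])  # Laplace prior: [n(u=0), n(u=1)]

    def fit(self, pairs):
        self.counts.clear()
        for l, u in pairs:
            self.counts[l][u] += 1

    def prob(self, l):
        c0, c1 = self.counts[l]
        return c1 / (c0 + c1)

    def sample(self, l, rng):
        return 1 if rng.random() < self.prob(l) else 0

def local_return(release_p, aip, steps, rng):
    """Local simulator: a single region whose inflow is sampled from its AIP."""
    q, total = 0, 0.0
    for _ in range(steps):
        u = aip.sample(q, rng)
        q = update_queue(q, sample_outflow(q, release_p, rng), u, rng)
        total += -q  # reward: negative local congestion
    return total / steps

def train_local(aip, seed, steps=200):
    """Crude policy search standing in for RL inside one local simulator."""
    rng = random.Random(seed)
    return max(GRID, key=lambda p: local_return(p, aip, steps, rng))

def dials(iterations=6, aip_every=2, seed=0):
    rng = random.Random(seed)
    policies = [0.5] * N_REGIONS
    aips = [FrequencyAIP() for _ in range(N_REGIONS)]
    for it in range(iterations):
        if it % aip_every == 0:
            # Retrain the AIPs only every `aip_every` iterations, on trajectories
            # collected from the global simulator under the current joint policy.
            for aip, pairs in zip(aips, run_global(policies, 500, rng)):
                aip.fit(pairs)
        # Local simulators are independent of each other, so this loop could be
        # distributed across processes (e.g. multiprocessing.Pool); it is kept
        # sequential here for clarity.
        policies = [train_local(aips[i], seed + 1000 * it + i) for i in range(N_REGIONS)]
    return policies, aips

policies, aips = dials()
print(policies)
```

The design choice the sketch is meant to highlight is that the expensive, fully coupled `run_global` is called only once every `aip_every` iterations, while all policy training happens in cheap, independent single-region simulators.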
