Multi-Agent Reinforcement Learning via Distributed MPC as a Function Approximator
Samuel Mallick, Filippo Airaldi, Azita Dabiri, Bart De Schutter
TL;DR
The paper tackles multi-agent reinforcement learning (MARL) for linear systems with convex polytopic constraints by using a structured distributed MPC scheme as the function approximator for the policy and value functions. It develops a distributed Q-learning framework in which the alternating direction method of multipliers (ADMM) and global average consensus (GAC) allow the MPC scheme to be evaluated, and the learning update to be computed, in a fully decentralized way. A key theoretical result relates the dual variables of the local ADMM problems to those of the centralized problem, so that per-agent updates reproduce the centralized learning update while preserving privacy. Empirical results on an academic chain example and a power-system network show performance comparable to centralized methods and robust constraint satisfaction under model uncertainty. The work highlights the potential of combining distributed optimization with MPC-based RL to achieve safe, interpretable, and scalable MARL in networked systems.
Abstract
This paper presents a novel approach to multi-agent reinforcement learning (RL) for linear systems with convex polytopic constraints. Existing work on RL has demonstrated the use of model predictive control (MPC) as a function approximator for the policy and value functions. The current paper is the first work to extend this idea to the multi-agent setting. We propose the use of a distributed MPC scheme as a function approximator, with a structure allowing for distributed learning and deployment. We then show that Q-learning updates can be performed distributively without introducing nonstationarity, by reconstructing a centralized learning update. The effectiveness of the approach is demonstrated on two numerical examples.
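As a rough illustration of how a centralized quantity can be rebuilt from local ones, the sketch below runs average consensus on a three-agent chain. Each agent starts with a local scalar (standing in for a local cost or a local piece of a learning update) and repeatedly averages with its neighbors; multiplying the agreed average by the number of agents gives every agent the global sum. This is not the authors' implementation: the function name `average_consensus`, the Metropolis weighting, the graph, and the numbers are all assumptions chosen for illustration.

```python
def average_consensus(values, neighbors, iterations=200):
    """Iterate x_i <- x_i + sum_j w_ij (x_j - x_i) with Metropolis weights.

    `values` holds one local scalar per agent; `neighbors` maps each agent
    index to the indices it can communicate with (assumed symmetric and
    connected). Both are hypothetical inputs for this sketch.
    """
    x = list(values)
    n = len(x)
    for _ in range(iterations):
        new_x = []
        for i in range(n):
            xi = x[i]
            for j in neighbors[i]:
                # Metropolis weight: depends only on the two agents' degrees.
                w = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
                xi += w * (x[j] - x[i])
            new_x.append(xi)
        x = new_x  # synchronous update: all agents step together
    return x


# Chain of three agents: 0 - 1 - 2 (illustrative topology).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
local_values = [1.0, 2.0, 6.0]  # made-up local quantities

estimates = average_consensus(local_values, neighbors)
# Each agent scales its estimate of the average by the number of agents
# to recover the global sum without any central coordinator.
global_sum_per_agent = [len(local_values) * e for e in estimates]
```

In this sketch every agent ends up holding (numerically) the same global sum, which is the sense in which a centralized update can be reconstructed locally. The actual scheme in the paper combines this kind of consensus step with ADMM on the distributed MPC problem.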
