Learning Closed-Loop Parametric Nash Equilibria of Multi-Agent Collaborative Field Coverage
Jushan Chen, Santiago Paternain
TL;DR
This work reframes multi-agent collaborative field coverage as a Markov Potential Game (MPG), which allows a parameterized closed-loop Nash equilibrium to be learned by solving an equivalent single-objective optimal control problem with a shared potential function $J(s_t,\pi(s_t,w))$. By decomposing each agent's reward into a common potential and a per-agent residual $\Theta_i$ that is independent of that agent's own state and policy, the authors reduce the coupled multi-agent optimization to a single tractable objective and implement a Q-learning-based method to learn the equilibrium. Empirical results show that the MPG-based approach trains 10x faster than a game-theoretic baseline, scales to more agents, and converges faster during policy execution. The result is a scalable, data-driven framework for coordinated multi-UAV coverage, with practical relevance to autonomous sensing and surveillance tasks.
Abstract
Multi-agent reinforcement learning (RL) is a challenging and active field of research because of its inherent nonstationarity and the coupling between agents. A popular approach to modeling the interactions underlying the multi-agent RL problem is the Markov Game. A special class of Markov Games, termed Markov Potential Games, allows the game to be reduced to a single-objective optimal control problem whose objective function is a potential function. In this work, we prove that a multi-agent collaborative field coverage problem, which arises in many engineering applications, can be formulated as a Markov Potential Game, and that a parameterized closed-loop Nash Equilibrium can be learned by solving an equivalent single-objective optimal control problem. As a result, our algorithm is 10x faster during training than a game-theoretic baseline and converges faster during policy execution.
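
The reduction from a game to a single objective can be illustrated with a minimal sketch. It is a one-shot (static) toy, not the Markov setting or the Q-learning method of this work: the field values `PHI`, the number of agents, and the specific `potential` and `residual` functions are all assumptions chosen for illustration. Each agent's reward is a shared potential plus a residual that depends only on the other agents' choices, so any unilateral deviation changes the agent's reward by exactly the change in the potential, and a maximizer of the potential is a Nash equilibrium.

```python
import itertools

# Hypothetical toy setup: 3 agents each pick one of 5 cells.
# PHI is an assumed field density, not data from the paper.
PHI = [1.0, 3.0, 2.0, 5.0, 4.0]
N_AGENTS = 3
CELLS = range(len(PHI))


def potential(joint):
    """Shared potential: total field value over the distinct covered cells."""
    return sum(PHI[c] for c in set(joint))


def residual(i, joint):
    """Residual for agent i: depends only on the other agents' cells."""
    others = joint[:i] + joint[i + 1:]
    return -sum(PHI[c] for c in set(others))


def reward(i, joint):
    """Agent i's reward = shared potential + residual independent of agent i."""
    return potential(joint) + residual(i, joint)


def deviate(joint, i, alt):
    """Joint action in which only agent i switches to cell alt."""
    return joint[:i] + (alt,) + joint[i + 1:]


def is_exact_potential_game():
    """Check that every unilateral deviation changes reward and potential equally."""
    for joint in itertools.product(CELLS, repeat=N_AGENTS):
        for i in range(N_AGENTS):
            for alt in CELLS:
                dev = deviate(joint, i, alt)
                d_reward = reward(i, dev) - reward(i, joint)
                d_pot = potential(dev) - potential(joint)
                if abs(d_reward - d_pot) > 1e-9:
                    return False
    return True


def is_nash(joint):
    """Check that no agent can improve its own reward by deviating alone."""
    return all(
        reward(i, deviate(joint, i, alt)) <= reward(i, joint) + 1e-9
        for i in range(N_AGENTS)
        for alt in CELLS
    )


# Single-objective problem: maximize the potential over joint actions.
best = max(itertools.product(CELLS, repeat=N_AGENTS), key=potential)
```

In this toy, `best` places the three agents on the three highest-valued cells, and `is_nash(best)` confirms that solving the single-objective problem yields an equilibrium of the original game. Brute-force enumeration is used only because the example is tiny.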
