Learning Closed-Loop Parametric Nash Equilibria of Multi-Agent Collaborative Field Coverage

Jushan Chen, Santiago Paternain

TL;DR

This work reframes multi-agent collaborative field coverage as a Markov Potential Game (MPG), enabling the learning of a parameterized closed-loop Nash equilibrium by solving an equivalent single-objective optimal control problem with a shared potential function $J(s_t,\pi(s_t,w))$. By decomposing each agent's reward into a common potential and a per-agent residual $\Theta_i$ that is independent of that agent's own state and policy, the authors reduce the coupled multi-agent optimization to a tractable single objective and implement a Q-learning-based method to learn the equilibrium. Empirically, the MPG-based approach trains about 10x faster than a game-theoretic baseline, scales to more agents, and converges faster during policy execution. The result is a scalable, data-driven framework for coordinated multi-UAV coverage with practical significance for autonomous sensing and surveillance tasks.
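
To make the decomposition concrete, here is a minimal sketch on a hypothetical one-shot toy problem; the 1-D cells, the target locations, and the particular form of $\Theta_i$ are assumptions for illustration, not the paper's model. It checks numerically that when $r_i = J + \Theta_i$ and $\Theta_i$ ignores agent $i$'s own position, any unilateral change by agent $i$ moves $r_i$ and $J$ by the same amount, which is the defining property of a potential game.

```python
import itertools

# Hypothetical toy setup (not from the paper): agents sit on integer cells of a
# 1-D line and a target counts as covered when some agent occupies its cell.
TARGETS = (1, 3)


def potential(positions):
    """Shared potential J: number of distinct targets covered by the team."""
    return sum(1 for t in TARGETS if t in positions)


def residual(i, positions):
    """Hypothetical Theta_i: depends only on the OTHER agents' positions."""
    others = [p for j, p in enumerate(positions) if j != i]
    return -0.1 * sum(abs(p) for p in others)


def reward(i, positions):
    """Per-agent reward r_i = J + Theta_i."""
    return potential(positions) + residual(i, positions)


def unilateral_gap(i, positions, new_pos):
    """Return (change in r_i, change in J) when only agent i moves to new_pos."""
    deviated = list(positions)
    deviated[i] = new_pos
    deviated = tuple(deviated)
    d_r = reward(i, deviated) - reward(i, positions)
    d_j = potential(deviated) - potential(positions)
    return d_r, d_j


# Check the potential-game identity over every joint configuration and deviation.
cells = range(5)
max_gap = 0.0
for positions in itertools.product(cells, repeat=2):
    for i in range(2):
        for new_pos in cells:
            d_r, d_j = unilateral_gap(i, positions, new_pos)
            max_gap = max(max_gap, abs(d_r - d_j))
print(f"largest |delta r_i - delta J| = {max_gap}")
```

The gap is zero up to floating-point rounding; letting `residual` depend on agent `i`'s own cell would break the identity.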

Abstract

Multi-agent reinforcement learning is a challenging and active field of research due to its inherent nonstationarity and the coupling between agents. A popular approach to modeling the interactions underlying the multi-agent RL problem is the Markov Game. A special type of Markov Game, termed a Markov Potential Game, allows us to reduce the Markov Game to a single-objective optimal control problem whose objective function is a potential function. In this work, we prove that a multi-agent collaborative field coverage problem, which arises in many engineering applications, can be formulated as a Markov Potential Game, and that we can learn a parameterized closed-loop Nash Equilibrium by solving an equivalent single-objective optimal control problem. As a result, our algorithm trains 10x faster than a game-theoretic baseline and converges faster during policy execution.
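
The reduction can be illustrated with a minimal sketch. Assuming a hypothetical toy environment (two agents on a five-cell line, targets at cells 1 and 3, moves of -1, 0 or +1, discount 0.9), the code runs tabular Q-learning on the single shared objective $J$ over joint actions, then executes the greedy joint policy, with each agent applying its own component. This is a tabular stand-in for the paper's Q-learning-based algorithm, not a reproduction of it; every name and hyperparameter here is illustrative.

```python
import numpy as np

# Hypothetical toy instance (not the paper's environment): two agents on cells
# 0..4 of a line, targets at cells 1 and 3, each agent moves -1, 0 or +1.
N_CELLS, TARGETS, MOVES = 5, (1, 3), (-1, 0, 1)
JOINT_ACTIONS = [(a, b) for a in MOVES for b in MOVES]


def potential(state):
    """Shared potential J(s): number of distinct targets covered."""
    return sum(1 for t in TARGETS if t in state)


def step(state, joint_action):
    """Deterministic transition; agents are clipped to the line."""
    return tuple(min(max(p + m, 0), N_CELLS - 1) for p, m in zip(state, joint_action))


def index(state):
    """Flatten a joint state (p0, p1) into a Q-table row."""
    return state[0] * N_CELLS + state[1]


def train(episodes=3000, horizon=20, alpha=0.5, gamma=0.9, seed=0):
    """Q-learning on the single objective J (tabular stand-in for a DQN)."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_CELLS * N_CELLS, len(JOINT_ACTIONS)))
    for _episode in range(episodes):
        state = tuple(int(x) for x in rng.integers(0, N_CELLS, size=2))
        for _t in range(horizon):
            s = index(state)
            # Uniform exploration is valid because Q-learning is off-policy.
            a = int(rng.integers(len(JOINT_ACTIONS)))
            nxt = step(state, JOINT_ACTIONS[a])
            td_target = potential(nxt) + gamma * q[index(nxt)].max()
            q[s, a] += alpha * (td_target - q[s, a])
            state = nxt
    return q


def rollout(q, state, steps=10):
    """Execute the greedy joint policy; every agent applies its own component."""
    for _ in range(steps):
        state = step(state, JOINT_ACTIONS[int(np.argmax(q[index(state)]))])
    return state


q_table = train()
final = rollout(q_table, (0, 4))
print(final, potential(final))
```

Exploring uniformly keeps every joint state-action pair visited in this small deterministic instance, so the greedy rollouts from every start state end with both targets covered. A learned joint policy that maximizes $J$ is the kind of object the MPG argument turns into an equilibrium; in the paper the $Q$ function is parameterized rather than tabular.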

Paper Structure

This paper contains 9 sections, 2 theorems, 15 equations, 6 figures, and 1 algorithm.

Key Result

Theorem 1

Let us consider the parametric Markov Game $\mathcal{G}$ defined in the paper's parametric Markov Game equation, and let the assumptions from the convexity assumption through the boundedness assumption hold. In addition, assume that the reward $r_i$ of each agent $i$ is twice continuously differentiable on $\mathcal{S} \times \mathcal{A}$. Then, $\mathcal{G}$ is … and 2) the following condition on the non-common term $\Theta_i$ holds: …
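
The exact condition is truncated in this summary, so the sketch below checks only the property stated in the TL;DR: the residual $\Theta_i = r_i - J$ should not depend on agent $i$'s own state. Assuming a hypothetical smooth coverage potential (Gaussian bumps around two targets in 2-D) and two hypothetical candidate rewards, it estimates $\partial \Theta_i / \partial s_i$ by central finite differences at random states. A passing check is numerical evidence on sampled states, not a proof.

```python
import numpy as np

# Hypothetical smooth coverage potential (not the paper's): each target k at
# p_k contributes exp(-||x_i - p_k||^2) for every agent i; states are 2-D.
TARGETS = np.array([[0.0, 0.0], [2.0, 1.0]])


def potential(states):
    """Shared potential J for an (n_agents, 2) array of agent states."""
    d2 = ((states[:, None, :] - TARGETS[None, :, :]) ** 2).sum(axis=-1)
    return float(np.exp(-d2).sum())


def reward_ok(i, states):
    """r_i = J + Theta_i with Theta_i depending only on the other agents."""
    others = np.delete(states, i, axis=0)
    return potential(states) - 0.1 * float((others ** 2).sum())


def reward_bad(i, states):
    """Counterexample: the extra term depends on agent i's own state."""
    return potential(states) - 0.1 * float((states[i] ** 2).sum())


def own_gradient_of_residual(reward, i, states, h=1e-5):
    """Central-difference gradient of Theta_i = r_i - J w.r.t. agent i's state."""
    grad = np.zeros(states.shape[1])
    for d in range(states.shape[1]):
        plus, minus = states.copy(), states.copy()
        plus[i, d] += h
        minus[i, d] -= h
        theta_plus = reward(i, plus) - potential(plus)
        theta_minus = reward(i, minus) - potential(minus)
        grad[d] = (theta_plus - theta_minus) / (2 * h)
    return grad


def satisfies_condition(reward, n_agents=3, n_samples=50, tol=1e-6, seed=0):
    """True if dTheta_i/ds_i is ~0 for every agent at all sampled states."""
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        states = rng.normal(size=(n_agents, 2))
        for i in range(n_agents):
            if np.abs(own_gradient_of_residual(reward, i, states)).max() > tol:
                return False
    return True


print(satisfies_condition(reward_ok), satisfies_condition(reward_bad))
```

The first candidate passes because its extra term involves only the other agents; the second fails because its extra term has gradient $-0.2\,s_i$ with respect to agent $i$'s own state.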

Figures (6)

  • Figure 1: Illustration of the field of view (FOV) of a single agent. We assume that the agent is a UAV whose field of view is limited by small half-angles. In this example the FOV is a square; in general, it is a rectangle.
  • Figure 2: A simplified visualization of multiple UAVs attempting to cover a set of targets. Each UAV has a limited FOV, shown as a bounding box. The static targets are shown as black stars.
  • Figure 3: The left panel compares the training time of our algorithm with that of a game-theoretic baseline [cooperative_cover]; the right panel compares cumulative returns versus training time for the two methods. We run each algorithm for 400 episodes with 200 steps per episode. For a fair comparison, we add a further baseline (orange) that parameterizes the $Q$ function with the FSR used by the game-theoretic baseline.
  • Figure 4: Policy execution to recover a parametric closed-loop Nash equilibrium (PCL-NE) in the 2-agent scenario. We set the maximum number of transition steps to 20. Algorithm 1 converges to an equilibrium much faster than the baseline.
  • Figure 5: Policy execution for the 4-agent scenario with Algorithm 1.
  • ...and 1 more figure
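
Figures 1 and 2 describe each UAV's footprint as a rectangle set by small half-angles. A minimal sketch of that geometry, assuming a downward-pointing sensor at altitude $h$ (an assumption; the summary does not give the sensing model), computes the footprint half-widths $h \tan\alpha_x$ and $h \tan\alpha_y$ and reports which static targets fall inside at least one footprint. The `UAV` class and its fields are hypothetical.

```python
import math
from dataclasses import dataclass

# Assumption (not stated in the summary): a nadir-pointing sensor at altitude h
# with half-angles (alpha_x, alpha_y) sees a ground rectangle of half-widths
# h * tan(alpha_x) and h * tan(alpha_y), centred directly below the UAV.


@dataclass
class UAV:
    x: float
    y: float
    altitude: float
    half_angle_x: float  # radians
    half_angle_y: float  # radians

    def footprint_half_widths(self):
        """Half-widths of the rectangular ground footprint along x and y."""
        return (self.altitude * math.tan(self.half_angle_x),
                self.altitude * math.tan(self.half_angle_y))

    def sees(self, target):
        """True if the ground point (tx, ty) lies inside this UAV's footprint."""
        hx, hy = self.footprint_half_widths()
        tx, ty = target
        return abs(tx - self.x) <= hx and abs(ty - self.y) <= hy


def covered_targets(uavs, targets):
    """Indices of targets inside at least one UAV's footprint (cf. Figure 2)."""
    return {k for k, t in enumerate(targets) if any(u.sees(t) for u in uavs)}


team = [UAV(0.0, 0.0, 10.0, 0.1, 0.1), UAV(5.0, 0.0, 10.0, 0.1, 0.05)]
stars = [(1.0, 0.0), (1.1, 0.0), (5.0, 0.4), (5.0, 0.6)]
print(covered_targets(team, stars))  # {0, 2}
```

Equal half-angles give the square FOV of Figure 1, unequal ones give a rectangle. For small half-angles, $\tan\alpha \approx \alpha$, so the footprint grows roughly linearly with altitude.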

Theorems & Definitions (4)

  • Definition 1
  • Theorem 1
  • Proposition 1
  • Proof