
Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players

Pragnya Alatur, Anas Barakat, Niao He

TL;DR

The paper addresses scalable Nash equilibrium learning in Markov Potential Games (MPGs) by studying Independent Policy Mirror Descent (PMD), a decentralized algorithm that unifies projected Q-ascent and Natural Policy Gradient through Euclidean and KL regularization, respectively. It proves non-asymptotic Nash regret bounds showing that, with KL regularization, the iteration complexity has an $O(\sqrt{N})$ dependence on the number of agents $N$ and does not depend on the sizes of the action spaces. This is a significant improvement over prior results, which scale linearly in $N$. The analysis uses the potential function and distribution-mismatch coefficients to bound the progress made at each iteration; the KL-based update admits larger step sizes and yields tighter guarantees. These results advance scalable, full-information PMD for MPGs and may be relevant to large-scale MARL in domains such as energy markets and networked systems.
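For reference, one standard way to write a single independent PMD step for agent $i$ at state $s$ is given below. The notation (step size $\eta$, Bregman divergence $D$, marginal Q-function $\bar{Q}_i^{(t)}$) is assumed here for illustration, and the paper's exact step-size scaling may differ. The two closed forms follow directly from the choice of $D$.

```latex
\pi_i^{(t+1)}(\cdot \mid s) \in \arg\max_{p \in \Delta(\mathcal{A}_i)}
  \Big\{ \eta \, \langle p, \bar{Q}_i^{(t)}(s,\cdot) \rangle - D\big(p, \pi_i^{(t)}(\cdot \mid s)\big) \Big\},
\qquad
\bar{Q}_i^{(t)}(s,a_i) = \mathbb{E}_{a_{-i} \sim \pi_{-i}^{(t)}(\cdot \mid s)}\big[ Q_i^{\pi^{(t)}}(s, a_i, a_{-i}) \big]

% Euclidean regularization, D(p,q) = \tfrac{1}{2}\|p - q\|_2^2  (projected Q-ascent):
\pi_i^{(t+1)}(\cdot \mid s) = \mathrm{Proj}_{\Delta(\mathcal{A}_i)}\big( \pi_i^{(t)}(\cdot \mid s) + \eta \, \bar{Q}_i^{(t)}(s,\cdot) \big)

% KL regularization, D(p,q) = \mathrm{KL}(p \,\|\, q)  (natural policy gradient):
\pi_i^{(t+1)}(a_i \mid s) = \frac{\pi_i^{(t)}(a_i \mid s) \, \exp\big(\eta \, \bar{Q}_i^{(t)}(s,a_i)\big)}
  {\sum_{a' \in \mathcal{A}_i} \pi_i^{(t)}(a' \mid s) \, \exp\big(\eta \, \bar{Q}_i^{(t)}(s,a')\big)}
```

In the Euclidean case the objective equals $-\tfrac{1}{2}\|p - (\pi_i^{(t)} + \eta \bar{Q}_i^{(t)})\|_2^2$ up to a constant, hence the projection; in the KL case the maximizer is the exponentially reweighted policy.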

Abstract

Markov Potential Games (MPGs) form an important sub-class of Markov games, which are a common framework to model multi-agent reinforcement learning problems. In particular, MPGs include as a special case the identical-interest setting, where all agents share the same reward function. Scaling Nash equilibrium learning algorithms to a large number of agents is crucial for multi-agent systems. To address this challenge, we focus on the independent learning setting, where each agent only has access to its local information to update its own policy. In prior work on MPGs, the iteration complexity for obtaining $\epsilon$-Nash regret scales linearly with the number of agents $N$. In this work, we investigate the iteration complexity of an independent policy mirror descent (PMD) algorithm for MPGs. We show that PMD with KL regularization, also known as natural policy gradient, enjoys a better $\sqrt{N}$ dependence on the number of agents, improving over PMD with Euclidean regularization and prior work. Furthermore, the iteration complexity is also independent of the sizes of the agents' action spaces.
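To make the independent, identical-interest setting concrete, here is a minimal Python/NumPy sketch. It assumes a single-state game (so each agent's Q-function reduces to the shared expected reward, up to a constant factor), exact oracle access to each agent's marginal Q-vector, and a fixed step size `eta`. All function names (`marginal_q`, `kl_step`, `euclid_step`, `run_independent_pmd`) and the example reward tensor are hypothetical illustrations, not the authors' code or experimental setup.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex (sort-based method)."""
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    cond = u - (css - 1.0) / k > 0            # true on a prefix of indices
    rho = k[cond][-1]
    theta = (css[cond][-1] - 1.0) / rho       # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)


def euclid_step(pi, q, eta):
    """Euclidean-regularized PMD step (projected Q-ascent)."""
    return project_to_simplex(pi + eta * q)


def kl_step(pi, q, eta):
    """KL-regularized PMD step (NPG / multiplicative weights): pi' proportional to pi * exp(eta * q)."""
    w = pi * np.exp(eta * (q - q.max()))      # subtracting max(q) avoids overflow
    return w / w.sum()


def marginal_q(R, policies, i):
    """Agent i's marginal Q-vector: E_{a_-i ~ pi_-i}[R(a_i, a_-i)] in a single-state game."""
    T = np.moveaxis(R, i, 0)                  # agent i's axis first, others keep their order
    for j, p in enumerate(policies):
        if j != i:
            T = np.tensordot(T, p, axes=([1], [0]))  # average out the next other agent
    return T


def potential(R, policies):
    """Expected shared reward under the product policy (the potential in this example)."""
    return float(marginal_q(R, policies, 0) @ policies[0])


def run_independent_pmd(R, step, eta, iters):
    """All agents update simultaneously, each from its own marginal Q-vector only."""
    policies = [np.full(n, 1.0 / n) for n in R.shape]   # uniform initial policies
    history = [potential(R, policies)]
    for _ in range(iters):
        qs = [marginal_q(R, policies, i) for i in range(R.ndim)]
        policies = [step(policies[i], qs[i], eta) for i in range(R.ndim)]
        history.append(potential(R, policies))
    return policies, history


if __name__ == "__main__":
    # Hypothetical 3-agent coordination game: all pick 0 -> 1.0, all pick 1 -> 0.5, else 0.
    R = np.zeros((2, 2, 2))
    R[0, 0, 0] = 1.0
    R[1, 1, 1] = 0.5
    for name, step in [("KL", kl_step), ("Euclidean", euclid_step)]:
        _, hist = run_independent_pmd(R, step, eta=1.0, iters=200)
        print(f"{name}: potential {hist[0]:.4f} -> {hist[-1]:.4f}")
```

Each agent's update reads only its own vector `qs[i]`, which is what makes the scheme independent; the coupling between agents enters only through how `marginal_q` averages over the other agents' current policies. The sort-based simplex projection is one standard choice for the Euclidean step, not necessarily the one used in the paper.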

Paper Structure

This paper contains 12 sections, 10 theorems, 46 equations, and 1 table.

Key Result

Proposition V.2 (Potential Improvement - Euclidean PMD)

Under the positive discounted state-visitation distribution assumption, for any $\mu \in \Delta(\mathcal{S})$ and $t \geq 1$, we have

Theorems & Definitions (21)

  • Remark III.1
  • Remark IV.1
  • Proposition V.2: Potential Improvement - Euclidean PMD
  • Theorem V.3: PMD with Euclidean Regularization
  • Proposition V.5: Potential Improvement - KL PMD
  • Theorem V.6: PMD with KL Regularization
  • Proposition VIII.1
  • Proof
  • Remark VIII.2
  • Lemma VIII.3
  • ...and 11 more