Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players

Pragnya Alatur; Anas Barakat; Niao He

TL;DR

The paper addresses scalable Nash equilibrium learning in Markov Potential Games by introducing Independent Policy Mirror Descent (PMD), a decentralized algorithm that unifies projected Q-ascent and Natural Policy Gradient via Euclidean and KL regularizations. It proves non-asymptotic Nash regret bounds, showing that KL regularization achieves a favorable $O(\sqrt{N})$ dependence on the number of agents and is independent of action-space sizes, a significant improvement over prior linear-$N$ results. The analysis leverages a potential function framework and distribution-mismatch coefficients to bound progress per iteration, with the KL-based approach enabling larger steps and tighter guarantees. These results advance scalable, full-information PMD for MPGs and have potential impact for large-scale MARL in domains like energy markets and networked systems.

Abstract

Markov Potential Games (MPGs) form an important sub-class of Markov games, which are a common framework to model multi-agent reinforcement learning problems. In particular, MPGs include as a special case the identical-interest setting where all the agents share the same reward function. Scaling the performance of Nash equilibrium learning algorithms to a large number of agents is crucial for multi-agent systems. To address this important challenge, we focus on the independent learning setting where agents can only have access to their local information to update their own policy. In prior work on MPGs, the iteration complexity for obtaining $\epsilon$-Nash regret scales linearly with the number of agents $N$. In this work, we investigate the iteration complexity of an independent policy mirror descent (PMD) algorithm for MPGs. We show that PMD with KL regularization, also known as natural policy gradient, enjoys a better $\sqrt{N}$ dependence on the number of agents, improving over PMD with Euclidean regularization and prior work. Furthermore, the iteration complexity is also independent of the sizes of the agents' action spaces.
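
To make the two regularizers concrete, here is a minimal sketch of a single agent's mirror descent step at one state. It is an illustration under stated assumptions, not the paper's exact algorithm: the function names (`kl_pmd_step`, `euclidean_pmd_step`, `project_to_simplex`) are hypothetical, the agent's Q-values are assumed to be already computed from local information, and any problem-dependent scaling of the step size is left out. With KL regularization the step reduces to a multiplicative (softmax-style) reweighting, the natural policy gradient form; with Euclidean regularization it reduces to a Q-ascent step followed by projection onto the probability simplex.

```python
import math
from typing import List


def kl_pmd_step(policy: List[float], q_values: List[float], step_size: float) -> List[float]:
    """KL-regularized step: new_policy(a) is proportional to policy(a) * exp(step_size * Q(a))."""
    q_max = max(q_values)  # shift by the max Q-value for numerical stability
    weights = [p * math.exp(step_size * (q - q_max)) for p, q in zip(policy, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    threshold = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            threshold = candidate  # keep the threshold of the largest valid k
    return [max(x - threshold, 0.0) for x in v]


def euclidean_pmd_step(policy: List[float], q_values: List[float], step_size: float) -> List[float]:
    """Euclidean-regularized step: projected Q-ascent."""
    ascent = [p + step_size * q for p, q in zip(policy, q_values)]
    return project_to_simplex(ascent)


# Hypothetical two-action example: action 0 has the higher Q-value.
uniform = [0.5, 0.5]
q = [1.0, 0.0]
kl_policy = kl_pmd_step(uniform, q, step_size=1.0)
euclidean_policy = euclidean_pmd_step(uniform, q, step_size=1.0)
```

In the independent setting each agent would run such a step for every state using only its own policy and Q-values. In this toy example both steps shift probability toward the better action, but the Euclidean step lands on the boundary of the simplex while the KL step keeps every action's probability strictly positive.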

Paper Structure

This paper contains 12 sections, 10 theorems, 46 equations, 1 table.

Introduction
Related Work
Preliminaries
Independent Policy Mirror Descent
Nash Regret Analysis
Analysis of PMD with Euclidean Regularization
Analysis of PMD with KL Regularization
Conclusion and Future Work
ACKNOWLEDGMENTS
Proof of Theorem V.3 (Euclidean Regularization)
Proof of Theorem V.6 (KL Regularization)
Auxiliary Lemmas

Key Result

Proposition V.2

Under the positive discounted visitation distribution assumption, for any $\mu \in \Delta(\mathcal{S})$ and $t \geq 1$, we have

Theorems & Definitions (21)

Remark III.1
Remark IV.1
Proposition V.2: Potential Improvement - Euclidean PMD
Theorem V.3: PMD with Euclidean Regularization
Proposition V.5: Potential Improvement - KL PMD
Theorem V.6: PMD with KL Regularization
Proposition VIII.1
proof
Remark VIII.2
Lemma VIII.3
...and 11 more
