Beyond Theorems: A Counterexample to Potential Markov Game Criteria

Fatemeh Fardno; Seyed Majid Zahedi

TL;DR

The paper questions whether a relaxed criterion for Markov potential games suffices to guarantee that a deterministic stationary Nash equilibrium can be found by solving a dual MDP. It constructs a continuous-space, infinite-horizon counterexample that satisfies the proposed OPSG and state-transitivity conditions yet yields a Nash equilibrium that differs from the dual-MDP optimum. This finding refutes the claimed equivalence and challenges the practical utility of the relaxed conditions for guiding independent-learning algorithms. The result highlights the need for stronger or alternative assumptions to ensure efficient computation of equilibria in multi-agent stochastic settings.

Abstract

There are only limited classes of multi-player stochastic games in which independent learning is guaranteed to converge to a Nash equilibrium. Markov potential games are a key example of such classes. Prior work has outlined sets of sufficient conditions for a stochastic game to qualify as a Markov potential game. However, these conditions often impose strict limitations on the game's structure and tend to be challenging to verify. To address these limitations, Mguni et al. [12] introduce a relaxed notion of Markov potential games and offer an alternative set of necessary conditions for categorizing stochastic games as potential games. Under these conditions, the authors claim that a deterministic Nash equilibrium can be computed efficiently by solving a dual Markov decision process. In this paper, we offer evidence refuting this claim by presenting a counterexample.

Paper Structure (8 sections, 1 theorem, 16 equations)

Introduction
Background and Related Work
Stochastic Games
Markov Potential Games
From Stochastic Games to MPG
Analysis
Counterexample
Conclusion

Key Result

Theorem 1

In an $n$-agent MPG, if all agents run independent policy gradient, then for any $\epsilon > 0$, the learning dynamics reaches an $\epsilon$-Nash equilibrium strategy after $O(1/\epsilon^2)$ iterations.
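As a rough illustration of the dynamics Theorem 1 refers to, the sketch below runs independent projected gradient ascent in a single-state, two-player, two-action identical-interest game, which is a trivial potential game because the shared payoff serves as the potential. The payoff matrix `R`, the step size, the starting point, and all function names are assumptions made for this example only; it is not the paper's counterexample, and it does not reproduce the exact algorithm or constants behind the $O(1/\epsilon^2)$ bound.

```python
# Minimal sketch (assumed setup): two agents, two actions each, one state.
# Both agents receive the same payoff R[a][b], so R itself is the potential.
# Each policy is a single number: the probability of playing action 0.

def expected_payoff(R, x, y):
    """Shared expected payoff when agent 1 plays action 0 w.p. x, agent 2 w.p. y."""
    p = (x, 1.0 - x)
    q = (y, 1.0 - y)
    return sum(p[a] * R[a][b] * q[b] for a in range(2) for b in range(2))

def nash_gap(R, x, y):
    """Largest gain any single agent can get by deviating unilaterally."""
    p = (x, 1.0 - x)
    q = (y, 1.0 - y)
    u = expected_payoff(R, x, y)
    best_for_1 = max(sum(R[a][b] * q[b] for b in range(2)) for a in range(2))
    best_for_2 = max(sum(p[a] * R[a][b] for a in range(2)) for b in range(2))
    return max(best_for_1 - u, best_for_2 - u)

def clip01(v):
    """Projection onto [0, 1], i.e. onto the two-action probability simplex."""
    return min(1.0, max(0.0, v))

def independent_pg(R, x, y, lr=0.1, steps=200):
    """Each agent ascends its own exact payoff gradient; updates are simultaneous."""
    for _ in range(steps):
        p = (x, 1.0 - x)
        q = (y, 1.0 - y)
        grad_x = sum((R[0][b] - R[1][b]) * q[b] for b in range(2))
        grad_y = sum(p[a] * (R[a][0] - R[a][1]) for a in range(2))
        x, y = clip01(x + lr * grad_x), clip01(y + lr * grad_y)
    return x, y

if __name__ == "__main__":
    R = [[2.0, 0.0], [0.0, 1.0]]  # hypothetical coordination payoffs
    x, y = independent_pg(R, 0.6, 0.6)
    print("policies:", x, y, "Nash gap:", nash_gap(R, x, y))
```

The Nash gap computed here is the quantity that an $\epsilon$-Nash equilibrium bounds by $\epsilon$. In this toy game the dynamics settle on a pure coordination profile; which one depends on the starting point.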

Theorems & Definitions (7)

Definition 1: MDP
Definition 2: Stochastic game
Definition 3: $\epsilon$-Nash equilibrium
Definition 4: MPG
Theorem 1: Leonardos et al. (2021)
Definition 5: OPSG
Claim 1: Mguni et al. [12]
