Convergence to Nash Equilibrium and No-regret Guarantee in (Markov) Potential Games

Jing Dong; Baoxiang Wang; Yaoliang Yu

TL;DR

The paper proposes a variant of the Frank-Wolfe algorithm with sufficient exploration and recursive gradient estimation that provably converges to the Nash equilibrium while attaining sublinear regret for each individual player.

Abstract

In this work, we study potential games and Markov potential games under stochastic cost and bandit feedback. We propose a variant of the Frank-Wolfe algorithm with sufficient exploration and recursive gradient estimation, which provably converges to the Nash equilibrium while attaining sublinear regret for each individual player. Our algorithm simultaneously achieves a Nash regret and a regret bound of $O(T^{4/5})$ for potential games, which matches the best available result, without using additional projection steps. Through carefully balancing the reuse of past samples and exploration of new samples, we then extend the results to Markov potential games and improve the best available Nash regret from $O(T^{5/6})$ to $O(T^{4/5})$. Moreover, our algorithm requires no knowledge of the game, such as the distribution mismatch coefficient, which provides more flexibility in its practical implementation. Experimental results corroborate our theoretical findings and underscore the practical effectiveness of our method.
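
The ingredients named in the abstract (a projection-free Frank-Wolfe update, forced exploration, and a recursively updated gradient estimate built from bandit feedback) can be illustrated with a minimal single-player sketch. This is not the paper's algorithm: the helper names, the uniform-mixing exploration, the importance-weighted cost estimate, the exponential-averaging recursion, and all step sizes (`gamma`, `rho`, `eta`) are assumptions chosen only for illustration.

```python
import random


def lmo_simplex(grad):
    """Linear minimization over the simplex: the vertex at the smallest gradient entry."""
    best = min(range(len(grad)), key=lambda a: grad[a])
    return [1.0 if a == best else 0.0 for a in range(len(grad))]


def mix_with_uniform(pi, gamma):
    """Exploration (assumed form): every action gets probability at least gamma / n."""
    n = len(pi)
    return [(1.0 - gamma) * p + gamma / n for p in pi]


def importance_weighted_cost(action, cost, probs):
    """Bandit feedback: unbiased estimate of the cost vector from one sampled action."""
    est = [0.0] * len(probs)
    est[action] = cost / probs[action]
    return est


def recursive_average(prev, new, rho):
    """Recursive estimate (assumed form): blend the old estimate with the new sample."""
    return [(1.0 - rho) * d + rho * g for d, g in zip(prev, new)]


def frank_wolfe_step(pi, grad_est, eta):
    """Move toward the best vertex; a convex combination needs no projection."""
    vertex = lmo_simplex(grad_est)
    return [(1.0 - eta) * p + eta * v for p, v in zip(pi, vertex)]


def run_single_player(mean_costs, rounds=2000, gamma=0.1, rho=0.1, eta=0.05, seed=0):
    """Hypothetical driver: one player facing noisy costs with fixed means."""
    rng = random.Random(seed)
    n = len(mean_costs)
    pi = [1.0 / n] * n
    d = [0.0] * n
    for _ in range(rounds):
        probs = mix_with_uniform(pi, gamma)
        action = rng.choices(range(n), weights=probs)[0]
        cost = mean_costs[action] + rng.uniform(-0.1, 0.1)  # stochastic cost
        d = recursive_average(d, importance_weighted_cost(action, cost, probs), rho)
        pi = frank_wolfe_step(pi, d, eta)
    return pi
```

Because each update is a convex combination of the current policy and a simplex vertex, the iterate stays a valid distribution without a projection step, which is the property the abstract highlights. In a potential game, each player would run such an update on its own policy using only its own observed cost.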

Paper Structure (29 sections, 17 theorems, 68 equations, 1 figure, 1 table, 2 algorithms)

Introduction
Related Works
Potential game and congestion game
Markov potential game
No-regret Learning for Potential Games
Potential Games
Learning protocol
Solution concepts
Algorithm and analysis for Potential Games
Extension to congestion game
No-regret Learning for Markov Potential Games
Learning protocol
Value function and potential function
Solution concepts
Algorithm for Markov potential game
...and 14 more sections

Key Result

Lemma 3.1

There exists a constant $L > 0$ such that, for any $\pi, \pi^\prime \in \Delta(\mathcal{A})$, $\|\nabla \Phi(\pi) - \nabla\Phi(\pi^\prime)\|_2 \leq L \| \pi - \pi^\prime\|_2$.
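
As a standard consequence of a Lipschitz gradient (stated here as background, not quoted from the paper), $\Phi$ admits a quadratic upper bound on the convex set $\Delta(\mathcal{A})$. Applied to a Frank-Wolfe step $\pi^\prime = \pi + \eta (v - \pi)$, where the step size $\eta \in [0, 1]$ and the target point $v \in \Delta(\mathcal{A})$ are illustrative symbols, it reads:

$$
\Phi(\pi^\prime) \leq \Phi(\pi) + \langle \nabla \Phi(\pi), \pi^\prime - \pi \rangle + \frac{L}{2} \| \pi^\prime - \pi \|_2^2
= \Phi(\pi) + \eta \langle \nabla \Phi(\pi), v - \pi \rangle + \frac{L \eta^2}{2} \| v - \pi \|_2^2 .
$$

The first-order term is what the linear minimization step makes as negative as possible, while the second-order term is controlled by $\eta^2$ and the diameter of $\Delta(\mathcal{A})$.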

Figures (1)

Figure 1: Panel `fig:facilities` shows the final converged policy in each state. Panel `fig:curve` shows the convergence of the algorithms, measured by $L_1$ distance to the final strategy.

Theorems & Definitions (28)

Lemma 3.1: Smoothness
Definition 3.1: Nash equilibrium
Definition 3.2: $\epsilon$-approximate Nash equilibrium
Definition 3.3: Nash regret
Definition 3.4: Regret of the $i$-th player
Theorem 3.1: Nash regret
Remark 3.1
Remark 3.2
Theorem 3.2: Regret for $i$-th player
Lemma 3.2
...and 18 more
