Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games

Yang Cai; Haipeng Luo; Chen-Yu Wei; Weiqiang Zheng

TL;DR

This paper provides an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium. The algorithm is constructed by combining the optimistic-follow-the-regularized-leader algorithm with the log-barrier regularizer.

Abstract

We study policy optimization algorithms for computing correlated equilibria in multi-player general-sum Markov Games. Previous results achieve an $O(T^{-1/2})$ convergence rate to a correlated equilibrium and an accelerated $O(T^{-3/4})$ convergence rate to the weaker notion of coarse correlated equilibrium. In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium. Our algorithm combines two main elements: (i) smooth value updates and (ii) the optimistic-follow-the-regularized-leader (OFTRL) algorithm with the log-barrier regularizer.
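The abstract names OFTRL with the log-barrier regularizer as one of the two ingredients. As a rough illustration of that update rule on a single probability simplex (roughly what one player would run at one state), the sketch below computes $x_t = \arg\min_{x \in \Delta} \eta \langle L_{t-1} + m_t, x \rangle - \sum_i \log x_i$, where $L_{t-1}$ is the cumulative past loss and the optimistic prediction $m_t$ is set to the previous loss. Several choices here are assumptions rather than the paper's method: the function names `log_barrier_argmin` and `oftrl_log_barrier` are hypothetical, the step size $\eta$ is held fixed (the title of Theorem 3 suggests the paper uses a decreasing step size), the inner solver is plain bisection on the Lagrange multiplier, and neither the smooth value updates nor any swap-regret construction needed for correlated equilibria is shown.

```python
import numpy as np


def log_barrier_argmin(g, eta, iters=100):
    """Solve argmin_{x in simplex} eta*<g, x> - sum_i log(x_i).

    Stationarity gives x_i = 1 / (eta*g_i + lam) for a multiplier lam;
    we bisect on lam until the entries sum to one.
    """
    s = eta * np.asarray(g, dtype=float)
    n = s.size
    # At lam = 1 - min(s) the largest entry equals 1 (sum >= 1);
    # at lam = n - min(s) every entry is <= 1/n (sum <= 1).
    lo, hi = 1.0 - s.min(), float(n) - s.min()
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.sum(1.0 / (s + lam)) > 1.0:
            lo = lam  # entries too large: multiplier must grow
        else:
            hi = lam
    x = 1.0 / (s + hi)
    return x / x.sum()  # remove residual bisection error


def oftrl_log_barrier(losses, eta):
    """Optimistic FTRL with a log-barrier regularizer (fixed step size).

    The optimistic prediction m_t is the most recently observed loss.
    Returns the list of played distributions x_1, ..., x_T.
    """
    losses = [np.asarray(l, dtype=float) for l in losses]
    cum = np.zeros_like(losses[0])         # L_{t-1}: sum of past losses
    prediction = np.zeros_like(losses[0])  # m_t: guess for the next loss
    iterates = []
    for loss in losses:
        # Play against cumulative past losses plus the optimistic guess.
        x = log_barrier_argmin(cum + prediction, eta)
        iterates.append(x)
        cum += loss
        prediction = loss
    return iterates
```

For example, with 50 rounds of the loss vector `[1.0, 0.0]` and `eta=0.1`, the first iterate is uniform and the last one puts more than 80% of its mass on the second action. Bisection suffices because the stationarity condition reduces to a one-dimensional equation in the multiplier whose left-hand side is strictly decreasing.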

Paper Structure

This paper contains 38 sections, 19 theorems, 71 equations, and 4 algorithms.

Introduction
Related Work
Learning in normal-form games
Learning in Markov games
Preliminaries
Multi-player General-Sum Markov Games
Policies and Value Functions
Strategy Modification and Correlated Equilibrium
Additional Notations
Online Learning and Regret
Algorithm and Main Results
Value Update
Bounding Correlated Equilibrium Gap by Per-State Regret
Proof Overview
Policy Update
...and 23 more sections

Key Result

Theorem 1

Suppose that the per-state regret has upper bounds $\mathrm{reg}_h^t \le \overline{\mathrm{reg}}_h^t$ for all $(h, t) \in [H] \times [T]$, where $\overline{\mathrm{reg}}_h^t$ is non-increasing in $t$, i.e., $\overline{\mathrm{reg}}_h^t \ge \overline{\mathrm{reg}}_h^{t+1}$. Then, for all $T \ge 2$, the output policy of the main algorithm is an approximate correlated equilibrium whose gap is bounded in terms of the per-state regret bounds $\overline{\mathrm{reg}}_h^t$.

Theorems & Definitions (36)

Definition 1: Correlated Equilibrium
Definition 2: Coarse Correlated Equilibrium
Remark 1
Theorem 1
Theorem 2
Theorem 3: RVU for Self-Concordant Barrier with decreasing step size
Lemma 1
Proof
Corollary 1
Lemma 2
...and 26 more
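Definitions 1 and 2 are stated in the paper for Markov games. In the much simpler single-state, two-player normal-form case (a simplification assumed here, not the paper's setting), both gaps can be computed directly from a joint distribution $\sigma$ over action profiles. A correlated-equilibrium gap lets a player remap each recommended action through an arbitrary swap function, while a coarse-correlated-equilibrium gap only allows switching to one fixed action, so the CE gap is always at least the CCE gap; this is why coarse correlated equilibrium is the weaker notion. The sketch below is illustrative only: the function names `ce_gap_player`, `cce_gap_player`, and `equilibrium_gaps` are hypothetical, payoffs are treated as utilities to maximize, and the CCE gap is clipped at zero.

```python
import numpy as np


def ce_gap_player(U, sigma):
    """Best gain of the row player over all swap functions phi: A -> A.

    U[a, b] is the row player's utility, sigma[a, b] the joint probability.
    The best swap decomposes into a separate best response per recommendation a.
    """
    gap = 0.0
    for a in range(U.shape[0]):
        dev_values = U @ sigma[a]  # value of playing a' whenever told a
        gap += dev_values.max() - sigma[a] @ U[a]
    return gap


def cce_gap_player(U, sigma):
    """Best gain of the row player from ignoring sigma and fixing one action."""
    col_marginal = sigma.sum(axis=0)
    best_fixed = (U @ col_marginal).max()
    return max(0.0, best_fixed - float(np.sum(sigma * U)))


def equilibrium_gaps(U1, U2, sigma):
    """(CE gap, CCE gap) of sigma: the maximum over both players."""
    # Transposing turns the column player into a row player.
    ce = max(ce_gap_player(U1, sigma), ce_gap_player(U2.T, sigma.T))
    cce = max(cce_gap_player(U1, sigma), cce_gap_player(U2.T, sigma.T))
    return ce, cce
```

As a usage example, take the game of Chicken with actions (Dare, Chicken), `U1 = [[0, 7], [2, 6]]`, and `U2 = U1.T`. The distribution placing probability 1/3 on each of (Dare, Chicken), (Chicken, Dare), and (Chicken, Chicken) has both gaps equal to 0 up to rounding, so it is a correlated equilibrium, whereas the pure profile (Dare, Dare) has both gaps equal to 2.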
