
Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games

Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng

TL;DR

This paper provides an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium in multi-player general-sum Markov games. The algorithm combines smooth value updates with the optimistic-follow-the-regularized-leader algorithm using the log barrier regularizer.

Abstract

We study policy optimization algorithms for computing correlated equilibria in multi-player general-sum Markov games. Previous results achieve an $O(T^{-1/2})$ convergence rate to a correlated equilibrium and an accelerated $O(T^{-3/4})$ convergence rate to the weaker notion of coarse correlated equilibrium. In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium. Our algorithm is constructed by combining two main elements: (i) smooth value updates and (ii) the optimistic-follow-the-regularized-leader algorithm with the log barrier regularizer.
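
To make element (ii) concrete, the following is a minimal Python sketch of one optimistic-follow-the-regularized-leader step with a log barrier regularizer on a probability simplex. It is an illustration under stated assumptions, not the paper's algorithm: it covers a single decision point with loss vectors, uses the most recent loss as the optimistic prediction, and leaves out smooth value updates and the machinery needed for correlated equilibria. The function name `log_barrier_oftrl_step`, the step size `eta`, and the bisection solver are hypothetical choices made for this sketch.

```python
import numpy as np

def log_barrier_oftrl_step(cum_loss, pred_loss, eta):
    """Return argmin over the simplex of
         eta * <x, cum_loss + pred_loss> - sum_i log(x_i).
    Hypothetical helper for illustration only.
    """
    g = eta * (np.asarray(cum_loss, dtype=float) + np.asarray(pred_loss, dtype=float))
    n = g.size
    # KKT conditions give x_i = 1 / (g_i + lam) with lam > -min(g),
    # where lam is chosen so that the coordinates sum to one.
    # sum_i 1 / (g_i + lam) is strictly decreasing in lam, so bisection works.
    lo = -g.min()       # the sum diverges as lam approaches this value from above
    hi = -g.min() + n   # here every term is at most 1/n, so the sum is at most 1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.sum(1.0 / (g + mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    x = 1.0 / (g + hi)
    return x / x.sum()  # remove the residual numerical error in the normalization

# Assumed usage: random losses, prediction equal to the previous loss vector.
rng = np.random.default_rng(0)
cum = np.zeros(3)
last = np.zeros(3)
for t in range(5):
    x = log_barrier_oftrl_step(cum, last, eta=0.5)
    loss = rng.uniform(size=3)
    cum += loss
    last = loss
```

Because the log barrier diverges at the boundary of the simplex, every iterate in this sketch keeps strictly positive probability on each action, and actions with smaller accumulated loss receive larger probability.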

Paper Structure (38 sections, 19 theorems, 71 equations, 4 algorithms)


Key Result

Theorem 1

Suppose that the per-state regret has upper bounds $\mathrm{reg}_h^t \le \overline{\mathrm{reg}}_h^t$ for all $(h, t) \in [H] \times [T]$ where $\overline{\mathrm{reg}}_h^t$ is non-increasing in $t$: $\overline{\mathrm{reg}}_h^t \ge \overline{\mathrm{reg}}_h^{t+1}$. Then the output policy of alg:mai for all $T \ge 2$.

Theorems & Definitions (36)

  • Definition 1: Correlated Equilibrium
  • Definition 2: Coarse Correlated Equilibrium
  • Remark 1
  • Theorem 1
  • Theorem 2
  • Theorem 3: RVU for Self-Concordant Barrier with decreasing step size
  • Lemma 1
  • proof
  • Corollary 1
  • Lemma 2
  • ...and 26 more