Near Optimal Convergence to Coarse Correlated Equilibrium in General-Sum Markov Games

Asrin Efe Yorulmaz, Tamer Başar

TL;DR

The paper addresses decentralized learning in multi-agent general-sum Markov games and targets fast convergence to Coarse Correlated Equilibrium (CCE). It introduces MG-DLRC-OMWU, an adaptive, stage-wise method that combines Optimistic Multiplicative Weights Update (OMWU) with value learning. The method achieves a convergence rate of $\mathcal{O}(\log T / T)$ to CCE, which matches the best-known rate for Correlated Equilibrium (CE) in $T$ while improving the dependence on the action-set size. The analysis rests on an RVU-type (Regret bounded by Variation in Utilities) inequality with time-varying learning rates and on an equivalent Optimistic FTRL view, yielding both theoretical guarantees and a practical, scalable learning rule. Empirically, the method exhibits the predicted rate in small general-sum Markov games, which suggests potential for high-dimensional MARL problems and motivates future work on constant-regret goals and sample-based extensions.
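To make the "stage-wise OMWU with value learning" idea concrete, here is a minimal sketch of generic stage-wise optimistic self-play in a two-player finite-horizon Markov game. It is not MG-DLRC-OMWU: it assumes a constant learning rate `eta` instead of the paper's adaptive schedule, stage-independent rewards and transitions, and uniform averaging of value estimates. All function and parameter names are hypothetical. At each stage `h` and state `s`, both players run optimistic MWU on Q-values that are built from the next stage's running-average values.

```python
import numpy as np

def oftrl_policy(cum_util, last_util, eta):
    # Optimistic MWU in OFTRL form: weights proportional to
    # exp(eta * (cumulative utility + last utility used as the prediction))
    logits = eta * (cum_util + last_util)
    p = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return p / p.sum()

def stagewise_selfplay(R1, R2, P, H, T, eta=0.1):
    """Generic stage-wise OMWU self-play (hypothetical sketch, constant eta).

    R1, R2: rewards in [0, 1], shape (S, A1, A2), assumed identical at every step h
    P:      transition probabilities, shape (S, A1, A2, S)
    Returns running-average value estimates V1, V2 of shape (H + 1, S).
    """
    S, A1, A2 = R1.shape
    cum1, last1 = np.zeros((H, S, A1)), np.zeros((H, S, A1))
    cum2, last2 = np.zeros((H, S, A2)), np.zeros((H, S, A2))
    V1, V2 = np.zeros((H + 1, S)), np.zeros((H + 1, S))  # V[H] = 0 is terminal
    for t in range(1, T + 1):
        for h in reversed(range(H)):  # sweep stages backward
            # Stage-h Q-values from the current next-stage value estimates
            Q1 = R1 + P @ V1[h + 1]  # shape (S, A1, A2)
            Q2 = R2 + P @ V2[h + 1]
            for s in range(S):
                x = oftrl_policy(cum1[h, s], last1[h, s], eta)
                y = oftrl_policy(cum2[h, s], last2[h, s], eta)
                u1 = Q1[s] @ y    # player 1's expected utility of each action against y
                u2 = Q2[s].T @ x  # player 2's expected utility of each action against x
                cum1[h, s] += u1
                last1[h, s] = u1
                cum2[h, s] += u2
                last2[h, s] = u2
                # Uniform running average of the realized stage value
                V1[h, s] += (x @ Q1[s] @ y - V1[h, s]) / t
                V2[h, s] += (x @ Q2[s] @ y - V2[h, s]) / t
    return V1, V2

# Usage on a small random game (all sizes are arbitrary)
rng = np.random.default_rng(0)
S, A1, A2, H = 3, 2, 3, 4
R1 = rng.random((S, A1, A2))
R2 = rng.random((S, A1, A2))
P = rng.random((S, A1, A2, S))
P /= P.sum(axis=-1, keepdims=True)
V1, V2 = stagewise_selfplay(R1, R2, P, H, T=200)
```

Because rewards lie in $[0, 1]$, each stage-$h$ estimate stays within $[0, H - h]$, which is a quick sanity check on the backward construction.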

Abstract

No-regret learning dynamics play a central role in game theory, enabling decentralized convergence to equilibrium concepts such as Coarse Correlated Equilibrium (CCE) or Correlated Equilibrium (CE). In this work, we improve the convergence rate to CCE in general-sum Markov games, reducing it from the previously best-known rate of $\mathcal{O}(\log^5 T / T)$ to a sharper $\mathcal{O}(\log T / T)$. This matches the best-known convergence rate for CE in terms of the number of iterations $T$, while also improving the dependence on the action-set size from polynomial to polylogarithmic, yielding exponential gains in high-dimensional settings. Our approach builds on recent advances in adaptive step-size techniques for no-regret algorithms in normal-form games and extends them to the Markovian setting via a stage-wise scheme that adjusts learning rates based on real-time feedback. We frame policy updates as an instance of Optimistic Follow-the-Regularized-Leader (OFTRL), customized for value-iteration-based learning. The resulting self-play algorithm achieves, to our knowledge, the fastest known convergence rate to CCE in Markov games.
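The OFTRL framing can be illustrated in the standard constant-step case. With a negative-entropy regularizer and the last observed utility as the optimistic prediction, OFTRL plays $x_t \propto \exp\big(\eta(\sum_{\tau<t} u_\tau + u_{t-1})\big)$, which unrolls to the multiplicative OMWU recursion $x_{t+1} \propto x_t \exp\big(\eta(2u_t - u_{t-1})\big)$. The sketch below checks this numerically. It assumes a constant step size $\eta$, so it does not cover the time-varying step sizes treated in the paper's Lemma 1; the function names are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

def oftrl_entropy(utils, eta):
    """OFTRL with entropy regularizer and prediction m_t = u_{t-1} (closed form)."""
    xs = []
    cum = np.zeros_like(utils[0])
    last = np.zeros_like(utils[0])
    for u in utils:
        xs.append(softmax(eta * (cum + last)))  # play before observing u
        cum += u
        last = u
    return xs

def omwu_recursive(utils, eta):
    """Multiplicative form: x_{t+1} proportional to x_t * exp(eta * (2 u_t - u_{t-1}))."""
    xs = []
    x = np.full(utils[0].shape, 1.0 / utils[0].size)  # start from the uniform policy
    prev = np.zeros_like(utils[0])
    for u in utils:
        xs.append(x)
        x = x * np.exp(eta * (2 * u - prev))
        x = x / x.sum()
        prev = u
    return xs

rng = np.random.default_rng(1)
utils = [rng.random(4) for _ in range(50)]  # arbitrary utility vectors in [0, 1]
a = oftrl_entropy(utils, eta=0.3)
b = omwu_recursive(utils, eta=0.3)
print(all(np.allclose(p, q) for p, q in zip(a, b)))  # True
```

The induction behind the check is short: substituting the closed form for $x_t$ into the recursion gives exponent $\eta(\sum_{\tau<t} u_\tau + u_{t-1} + 2u_t - u_{t-1}) = \eta(\sum_{\tau \le t} u_\tau + u_t)$, which is the closed form for $x_{t+1}$.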

Paper Structure

This paper contains 18 sections, 23 theorems, 105 equations, 1 figure, 1 table, 3 algorithms.

Key Result

lemma 1

Let $\{ \eta_t \}$ be a sequence of learning rate caps with $\eta_t \in (0,1]$. For each agent $i$, at each fixed $(s,h) \in \mathcal{S} \times [H]$, define $\mathcal{R}^{(t)} := \frac{\eta}{w_t}(U^{(t)} + \frac{w_t}{w_{t-1}}u^{(t-1)}), u^{(t)} := w_t(\nu^{(t)} - \langle \nu^{(t)}, \pi^{(t)}_{i} \ra
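The excerpt of Lemma 1 above is truncated, so the following is only a hedged illustration of one property of its construction. Assume (hypothetically) that the definition of $u^{(t)}$ subtracts the expected utility $\langle \nu^{(t)}, \pi_i^{(t)} \rangle$ from every coordinate, that $U^{(t)} = \sum_{\tau < t} u^{(\tau)}$, and that the policy is the softmax of $\mathcal{R}^{(t)}$. Under these assumptions, the subtracted terms shift $\mathcal{R}^{(t)}$ by a constant across actions, so building the logits from weighted regrets or from the raw weighted utilities $w_\tau \nu^{(\tau)}$ yields the same policy. Weights, policies, and helper names below are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

def weighted_regret(nu, pi, w):
    # Hypothetical reading of the truncated definition:
    # u^{(t)} = w_t * (nu^{(t)} - <nu^{(t)}, pi^{(t)}> * 1)
    return w * (nu - nu @ pi)

def logits_R(U, u_prev, w_t, w_prev, eta):
    # R^{(t)} = (eta / w_t) * (U^{(t)} + (w_t / w_{t-1}) * u^{(t-1)}), as stated in Lemma 1
    return (eta / w_t) * (U + (w_t / w_prev) * u_prev)

rng = np.random.default_rng(2)
A, T, eta = 5, 20, 0.5
w = 1.0 + rng.random(T)                  # hypothetical positive weights w_t
nus = rng.random((T, A))                 # utility vectors nu^{(t)}
pis = rng.dirichlet(np.ones(A), size=T)  # arbitrary policies pi^{(t)}

t = T - 1                                # evaluate R at the last round
us = np.array([weighted_regret(nus[k], pis[k], w[k]) for k in range(t)])
U = us.sum(axis=0)                       # assumed U^{(t)} = sum over tau < t of u^{(tau)}
R = logits_R(U, us[t - 1], w[t], w[t - 1], eta)

# Same construction with the raw weighted utilities w_tau * nu^{(tau)}
U_raw = (w[:t, None] * nus[:t]).sum(axis=0)
R_raw = logits_R(U_raw, w[t - 1] * nus[t - 1], w[t], w[t - 1], eta)

# R and R_raw differ by a constant vector, so they induce the same policy
print(np.allclose(softmax(R), softmax(R_raw)))  # True
```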

Theorems & Definitions (44)

  • definition 1: $\varepsilon$-Coarse Correlated Equilibrium
  • lemma 1: Equivalence of DLRC-OMWU Formulations with Time-Varying Step-Size
  • lemma 2
  • theorem 1: Regret Bounds for MG-DLRC-OMWU
  • lemma 3
  • theorem 2: RVU bound for MG-DLRC-OMWU with time-varying $\eta_t$
  • lemma 4: Per-state weighted regret bounds
  • proof
  • lemma 5: Equivalence of value functions
  • proof
  • ...and 34 more
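Definition 1 in the list above concerns the $\varepsilon$-Coarse Correlated Equilibrium. In the simpler two-player normal-form setting, a joint distribution $\sigma$ is an $\varepsilon$-CCE when no player can gain more than $\varepsilon$ by committing in advance to a fixed action while the other player keeps following $\sigma$. The sketch below computes that gap. In the paper's Markov game setting the same comparison is made between value functions rather than one-shot payoffs, which this sketch does not model; the function name is hypothetical.

```python
import numpy as np

def cce_gap(sigma, U1, U2):
    """Smallest eps such that sigma is an eps-CCE of a two-player normal-form game.

    sigma: joint distribution over action pairs, shape (A1, A2)
    U1, U2: payoff matrices of players 1 and 2, shape (A1, A2)
    """
    v1 = np.sum(sigma * U1)  # expected payoff of player 1 under sigma
    v2 = np.sum(sigma * U2)
    m2_for_1 = sigma.sum(axis=0)  # marginal over player 2's actions, shape (A2,)
    m1_for_2 = sigma.sum(axis=1)  # marginal over player 1's actions, shape (A1,)
    dev1 = (U1 @ m2_for_1).max()  # best fixed action a1' against the marginal of a2
    dev2 = (m1_for_2 @ U2).max()  # best fixed action a2' against the marginal of a1
    # eps is nonnegative, so report 0 when no fixed deviation helps
    return max(dev1 - v1, dev2 - v2, 0.0)

coord = np.array([[1.0, 0.0], [0.0, 1.0]])        # coordination game, same payoffs
sigma_good = np.array([[0.5, 0.0], [0.0, 0.5]])   # always coordinate
sigma_bad = np.array([[0.0, 0.5], [0.5, 0.0]])    # always miscoordinate
print(cce_gap(sigma_good, coord, coord))  # 0.0
print(cce_gap(sigma_bad, coord, coord))   # 0.5
```

In the second case each player earns 0 under $\sigma$ but could secure 0.5 in expectation by committing to either fixed action, so $\sigma$ is only a $0.5$-CCE.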