Satisficing Paths and Independent Multi-Agent Reinforcement Learning in Stochastic Games

Bora Yongacoglu, Gürdal Arslan, Serdar Yüksel

TL;DR

This paper investigates whether satisficing dynamics can guide independent learners to approximate equilibrium in stochastic games, and gives high-probability guarantees of convergence to $\epsilon$-equilibrium under self-play.

Abstract

In multi-agent reinforcement learning (MARL), independent learners are those that do not observe the actions of other agents in the system. Due to the decentralization of information, it is challenging to design independent learners that drive play to equilibrium. This paper investigates the feasibility of using satisficing dynamics to guide independent learners to approximate equilibrium in stochastic games. For $\epsilon \geq 0$, an $\epsilon$-satisficing policy update rule is any rule that instructs the agent to not change its policy when it is $\epsilon$-best-responding to the policies of the remaining players; $\epsilon$-satisficing paths are defined to be sequences of joint policies obtained when each agent uses some $\epsilon$-satisficing policy update rule to select its next policy. We establish structural results on the existence of $\epsilon$-satisficing paths into $\epsilon$-equilibrium in both symmetric $N$-player games and general stochastic games with two players. We then present an independent learning algorithm for $N$-player symmetric games and give high-probability guarantees of convergence to $\epsilon$-equilibrium under self-play. This guarantee is made using symmetry alone, leveraging the previously unexploited structure of $\epsilon$-satisficing paths.
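The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the names `satisficing_update`, `is_satisficing_path`, `best_response_gap`, and `candidates` are hypothetical, and the sketch assumes each agent is simply handed a number measuring how far its current policy is from a best response. An independent learner would have to estimate that quantity from its own observations. The random switch for an unsatisfied agent is one arbitrary choice among the many rules the definition permits.

```python
import random


def satisficing_update(policy, best_response_gap, epsilon, candidates, rng=random):
    """One possible epsilon-satisficing policy update rule (illustrative only).

    policy            -- the agent's current policy (any comparable object)
    best_response_gap -- assumed input: shortfall of `policy` relative to a
                         best response against the other agents' policies
    candidates        -- assumed finite set of policies the agent may switch to
    """
    if best_response_gap <= epsilon:
        # The agent is epsilon-best-responding, so it must keep its policy.
        return policy
    # Otherwise the definition leaves the choice free; here we pick uniformly.
    return rng.choice(candidates)


def is_satisficing_path(path, gaps, epsilon):
    """Check that a sequence of joint policies is an epsilon-satisficing path.

    path[k][i] is agent i's policy at step k; gaps[k][i] is agent i's
    best-response gap at step k (assumed to be given).
    """
    for k in range(len(path) - 1):
        for i, gap in enumerate(gaps[k]):
            # A satisfied agent may not change its policy at the next step.
            if gap <= epsilon and path[k + 1][i] != path[k][i]:
                return False
    return True
```

For example, with `epsilon = 0.1`, an agent whose gap is `0.05` keeps its policy, while an agent whose gap is `0.5` may move to any policy in `candidates`.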

Paper Structure

This paper contains 27 sections, 17 theorems, 59 equations, 2 figures, 1 table.

Key Result

Lemma 8

Let $\mathcal{G}$ be a symmetric game and let $\bm{\pi} \in \bm{\Gamma}_{S}$ be a stationary joint policy. For $i, j \in \mathcal{N}$, if $\pi^i = \pi^j$, then $J^i ( \pi^i, \bm{\pi}^{-i} , x ) = J^j (\pi^j, \bm{\pi}^{-j} , x )$, for any $x \in \mathbb{X}.$
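The statement of Lemma 8 can be checked numerically in a very simple special case. The sketch below assumes a two-player game with a single state, so that the value reduces to an expected stage reward, and assumes symmetry takes the form $r^2(a^1, a^2) = r^1(a^2, a^1)$. The payoff matrix `A` and the function name `expected_rewards` are hypothetical choices for illustration, and this is not the paper's proof.

```python
import math


def expected_rewards(A, p1, p2):
    """Expected stage rewards in a one-state, two-player symmetric game.

    A[a][b] is player 1's reward when player 1 plays a and player 2 plays b.
    By the assumed symmetry, player 2's reward for the same action pair
    is A[b][a].
    """
    j1 = 0.0
    j2 = 0.0
    for a, pa in enumerate(p1):
        for b, pb in enumerate(p2):
            j1 += pa * pb * A[a][b]
            j2 += pa * pb * A[b][a]
    return j1, j2


# Assumed payoffs for player 1 (rows: own action, columns: opponent's action).
A = [[3.0, 0.0],
     [5.0, 1.0]]

# Both players use the same mixed policy, so their values should coincide.
same = [0.3, 0.7]
j1, j2 = expected_rewards(A, same, same)
values_match = math.isclose(j1, j2)

# With different policies the values need not be equal.
d1, d2 = expected_rewards(A, [1.0, 0.0], [0.0, 1.0])
```

Here `values_match` holds because swapping the roles of the two players leaves the joint action distribution unchanged when both use the same policy.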

Figures (2)

  • Figure 1: The stage games for a two-state stochastic game. Player 1 (2) picks a row (column), and its reward, to be maximized, is the 1st (2nd) entry of the chosen cell.
  • Figure 2: Frequency of $\{ \bm{\pi}_k \in \bm{\Gamma}^{\epsilon{\rm \text{-}eq}} _{S} \cap \bm{\Pi} \}$, averaged over 250 trials.

Theorems & Definitions (37)

  • Definition 1: Policies
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7: Symmetric Game
  • Lemma 8
  • Proof 1
  • Corollary 9
  • ...and 27 more