Team-Fictitious Play for Reaching Team-Nash Equilibrium in Multi-team Games

Ahmed Said Donmez, Yuksel Arslantas, Muhammed O. Sayin

TL;DR

This work introduces Team-Fictitious Play (Team-FP), a new variant of fictitious play in which agents respond to the last actions of their team members and to the beliefs formed about other teams, with some inertia in action updates. It shows that Team-FP reaches a near team-Nash equilibrium (TNE) in zero-sum potential team games (ZSPTGs) with a quantifiable error bound.

Abstract

Multi-team games, prevalent in robotics and resource management, involve team members striving for a joint best response against other teams. Team-Nash equilibrium (TNE) predicts the outcomes of such coordinated interactions. However, can teams of self-interested agents reach TNE? We introduce Team-Fictitious Play (Team-FP), a new variant of fictitious play where agents respond to the last actions of team members and the beliefs formed about other teams with some inertia in action updates. This design is essential in team coordination beyond the classical fictitious play dynamics. We focus on zero-sum potential team games (ZSPTGs) where teams can interact pairwise while the team members do not necessarily have identical payoffs. We show that Team-FP reaches near TNE in ZSPTGs with a quantifiable error bound. We extend Team-FP dynamics to multi-team Markov games for model-based and model-free cases. The convergence analysis uses the optimal coupling lemma and stochastic differential inclusion approximation methods to tackle the non-stationarity induced by evolving opponent strategies. Our work strengthens the foundation for using TNE to predict the behavior of decentralized teams and offers a practical rule for team learning in multi-team environments. We provide extensive simulations of Team-FP dynamics and compare its performance with other widely studied dynamics such as smooth fictitious play and multiplicative weights update. We further explore how different parameters impact the speed of convergence.
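The update rule described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes two teams, identical payoffs within each team (a special case of a potential team game), a logit (softmax) response with temperature `tau`, a per-agent update probability `update_prob` standing in for inertia, and a belief step size of 1/(k+1). The names `team_fp`, `softmax_sample`, `update_prob`, and `parity_game` are hypothetical.

```python
import math
import random
from itertools import product


def softmax_sample(values, tau, rng):
    """Sample an index with probability proportional to exp(value / tau)."""
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / tau) for v in values]
    return rng.choices(range(len(values)), weights=weights)[0]


def team_fp(phi, n_agents, n_actions, steps=2000, tau=0.1,
            update_prob=0.5, seed=0):
    """Illustrative two-team dynamics (assumed form, not the paper's).

    phi(a0, a1) is team 0's payoff and team 1 receives -phi(a0, a1);
    a0 and a1 are tuples holding one action per team member.
    Returns the empirical beliefs about each team's joint actions.
    """
    rng = random.Random(seed)
    joint = list(product(range(n_actions), repeat=n_agents))
    last = [list(rng.choice(joint)) for _ in range(2)]
    # belief[m][a]: running frequency with which team m played joint action a
    belief = [{a: 1.0 / len(joint) for a in joint} for _ in range(2)]

    for k in range(1, steps + 1):
        new = [list(last[0]), list(last[1])]
        for m in range(2):
            opp = 1 - m
            sign = 1.0 if m == 0 else -1.0
            for i in range(n_agents):
                if rng.random() >= update_prob:
                    continue  # inertia: the agent repeats its last action
                values = []
                for x in range(n_actions):
                    # teammates are held at their LAST actions
                    mine = list(last[m])
                    mine[i] = x
                    mine = tuple(mine)
                    # expected payoff against the belief about the other team
                    ev = sum(
                        p * (phi(mine, b) if m == 0 else phi(b, mine))
                        for b, p in belief[opp].items()
                    )
                    values.append(sign * ev)
                new[m][i] = softmax_sample(values, tau, rng)
        last = new
        step = 1.0 / (k + 1)  # assumed step size
        for m in range(2):
            played = tuple(last[m])
            for a in joint:
                target = 1.0 if a == played else 0.0
                belief[m][a] += step * (target - belief[m][a])
    return belief


def parity_game(a0, a1):
    """Hypothetical test game: team 0 wins when the action parities match."""
    return 1.0 if sum(a0) % 2 == sum(a1) % 2 else -1.0


beliefs = team_fp(parity_game, n_agents=2, n_actions=2)
```

Each agent only needs its teammates' last actions and a running belief about the opposing team, which is what distinguishes this scheme from classical fictitious play, where every other agent is tracked through beliefs.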

Paper Structure

This paper contains 21 sections, 5 theorems, 67 equations, 7 figures, 3 algorithms.

Key Result

Theorem 4.2

Given a ZSPTG characterized by $\langle \mathcal{T}, (A^{i}, u^{i})_{i\in \mathcal{I}}\rangle$, let every agent follow either Team-FP or Independent Team-FP, as described in Algorithm \ref{tab:algcla}. If Assumption \ref{assm:stepsize} holds, then the team-Nash gap for $\pi_k := (\pi_k^m)_{m\in\mathcal{T}}$ satisfies ... almost surely, where $\overline{\phi}\coloneqq \max_{(m,l,a)}|\phi^{ml}(a)|$.
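To make the quantity bounded by the theorem concrete, here is a minimal sketch of a team-Nash gap for the two-team zero-sum case. It assumes the gap is the sum over teams of the gain available from the best joint team deviation against the other team's mixed strategy; the paper's Definition 2.4 is not reproduced in this summary and may differ. The function name `team_nash_gap` and the dictionary representation of strategies are hypothetical.

```python
def team_nash_gap(phi, pi0, pi1):
    """Assumed two-team gap: sum of each team's best joint-deviation gain.

    phi(a, b) is team 0's payoff and team 1 receives -phi(a, b).
    pi0 and pi1 map EVERY joint team action to its probability
    (zero-probability actions must be listed as keys too).
    """
    def expected(dist_a, dist_b):
        return sum(p * q * phi(a, b)
                   for a, p in dist_a.items()
                   for b, q in dist_b.items())

    value = expected(pi0, pi1)
    best0 = max(expected({a: 1.0}, pi1) for a in pi0)  # team 0 maximizes phi
    best1 = min(expected(pi0, {b: 1.0}) for b in pi1)  # team 1 minimizes phi
    return (best0 - value) + (value - best1)
```

Under this assumed definition the gap is nonnegative and equals zero exactly when neither team can gain from a joint deviation.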

Figures (7)

  • Figure 1: An illustration of networked interconnections among agents from different teams. Nodes in the bottom and top layers refer, resp., to team members and teams. Undirected edges represent the impact of actions on the payoff functions. We use different colors and shapes to represent agents from the same teams, and they are connected via dashed edges.
  • Figure 2: An illustration of Team-FP dynamics for two-team games on the left-hand side. Team actions change according to a transition kernel depending on the beliefs formed about the other teams. Dashed lines represent the time shift. On the right-hand side, we depict the key proof idea that we approximate the evolution of the team actions with a reference scenario where beliefs are stationary such that team actions form a homogeneous Markov chain whose unique stationary distribution corresponds to the best team response.
  • Figure 3: All panels show the variation of $\mathrm{TNG}$ over time. (a) Comparison of different levels of explicit coordination for Team-FP: independent agents (group size 1), pairs of cooperating agents (group size 2), and fully coordinated teams (group size 4). (b) Performance of Team-FP and Independent Team-FP compared to MWU and SFP algorithms in a 2-team ZSPTG. (c) Convergence of Team-FP against stationary and competitive opponents in a 3-team ZSPTG.
  • Figure 4: (a) An illustration of an airport security game: a security chief guarding the six gates of an airport against three different intruders making decisions autonomously. (b) The evolution of the Team-Nash gap in the airport security game, showing that Team-FP dynamics reach near team-minimax equilibrium.
  • Figure 5: The 3-team experiments are run on the randomly generated network structure in (a). Panels (b) and (c) show the variation of $\mathrm{TNG}$ over iterations. (a) The simulation network for a multi-team ZSPTG with 3 teams of 3 agents each. (b) The impact of varying the temperature parameter $\tau$ (0.1, 0.15, 0.2) in Algorithm \ref{tab:algcla} on the closeness to TNE. (c) The effect of different $\delta$ values (0.2, 0.5) in (Independent) Algorithm \ref{tab:algcla} on the convergence speed with $\tau$ fixed at 0.1.
  • ...and 2 more figures

Theorems & Definitions (12)

  • Definition 2.1: Zero-sum Potential Team Game
  • Example 2.2
  • Remark 2.3: General-sum ZSPTGs
  • Definition 2.4: Team-Nash Gap
  • Remark 3.1: Scalability
  • Theorem 4.2
  • Corollary 4.3
  • Lemma A.1
  • Proposition A.2
  • Proposition A.3
  • ...and 2 more