Generalized Individual Q-learning for Polymatrix Games with Partial Observations

Ahmed Said Donmez; Muhammed O. Sayin

TL;DR

This paper presents generalized individual Q-learning dynamics that combine belief-based and payoff-based learning for networked interconnections of more than two self-interested agents, and demonstrates faster convergence to quantal response equilibrium in multi-agent zero-sum and potential polymatrix games.

Abstract

This paper addresses the challenge of limited observations in non-cooperative multi-agent systems in which agents may have only partial access to other agents' actions. We present generalized individual Q-learning dynamics that combine belief-based and payoff-based learning for networked interconnections of more than two self-interested agents. The approach exploits access to opponents' actions whenever it is available, achieving faster, guaranteed convergence to quantal response equilibrium in multi-agent zero-sum and potential polymatrix games. Notably, the dynamics reduce to the well-studied smoothed fictitious play under full access to opponent actions and to individual Q-learning under no access. We further quantify, through numerical simulations, the improvement in convergence rate that comes from observing opponents' actions.
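The two limiting cases named in the abstract can be illustrated with a minimal sketch for a single agent. This is not the paper's algorithm: the function names, the logit form of the smoothed response, the temperature `tau`, the step size `step`, and the clipping of the normalized step are all assumptions made here for illustration. The belief-based step follows the usual smoothed fictitious play pattern (track the opponent's empirical play, then respond smoothly), and the payoff-based step follows the usual individual Q-learning pattern (update only the estimate of the action just played, from the realized payoff).

```python
import numpy as np


def logit_response(q, tau):
    """Smoothed (logit) response to a vector of action-value estimates q."""
    z = (q - q.max()) / tau  # shift by the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def belief_based_step(belief, opp_action, payoff, step, tau):
    """Full-observation case, in the style of smoothed fictitious play.

    belief:     current estimate of the opponent's mixed strategy
    opp_action: index of the opponent action just observed
    payoff:     this agent's payoff matrix (own actions x opponent actions)
    """
    onehot = np.zeros_like(belief)
    onehot[opp_action] = 1.0
    belief = belief + step * (onehot - belief)  # move belief toward observation
    q = payoff @ belief                         # expected payoff of each own action
    return belief, logit_response(q, tau)


def payoff_based_step(q, policy, own_action, reward, step):
    """No-observation case, in the style of individual Q-learning.

    Only the entry of the action actually played is updated. The step is
    divided by the probability of playing that action so that rarely played
    actions are not updated more slowly on average. Clipping the rate at 1
    is an assumption added here to keep the update a convex combination.
    """
    q = q.copy()
    rate = min(1.0, step / policy[own_action])
    q[own_action] += rate * (reward - q[own_action])
    return q


# Hypothetical usage on a 2x2 coordination payoff matrix.
payoff = np.array([[1.0, 0.0], [0.0, 1.0]])
belief, policy = belief_based_step(np.array([0.5, 0.5]), 0, payoff, 0.1, 1.0)
q = payoff_based_step(np.zeros(2), np.array([0.5, 0.5]), 1, 1.0, 0.1)
```

In a partially observed setting, an agent would presumably use the first kind of update for neighbors whose actions it can see and the second kind for the rest; how the paper combines the two is not specified in this summary, so the sketch stops at the two extremes.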

Paper Structure (8 sections, 46 equations, 4 figures, 1 algorithm)

Introduction
Preliminary: Polymatrix Games
Generalized Individual-Q Learning
Convergence Results
Illustrative Examples
Conclusion
Proof of Proposition \ref{prop:zerosum}
Proof of Proposition \ref{prop:V_p}

Figures (4)

Figure 1: An illustration of observability and interconnectedness over two layers. Nodes connected by dashed lines across layers represent an agent. In Layer 1, observability is depicted, where directed edges indicate that agents can observe the actions of others, forming the observability graph. In Layer 2, agents interact and can affect each other's payoffs, as represented by the directed edges, forming the interaction graph.
Figure 2: Convergence of the QRE-gap (left), and $q_{\mathrm{diff}}$ (right) for different values of edge-connection probability $p$, in the potential polymatrix games. The solid curves represent the mean over $50$ independent trials and the shaded areas are $\pm 0.5$ standard deviations from the mean values.
Figure 3: Convergence of the QRE-gap (left), and $q_{\mathrm{diff}}$ (right) for different values of edge-connection probability $p$, in the zero-sum polymatrix games. The solid curves represent the mean over $50$ independent trials and the shaded areas are $\pm 0.5$ standard deviations from the mean values.
Figure 4: An illustration of the $\pi^i_k$'s over the probability simplex $\Delta^i\subset\mathbb{R}^3$ for each agent $i$ at the end of $10^6$ iterations and over $50$ independent trials for $p=1$. Green and red dots correspond to the potential and zero-sum games, respectively. The shaded areas represent the standard deviations from the mean values.

Theorems & Definitions (1)

proof
