Rationality of Learning Algorithms in Repeated Normal-Form Games

Shivam Bajaj; Pranoy Das; Yevgeniy Vorobeychik; Vijay Gupta

TL;DR

This work asks whether learning algorithms used by self-interested agents in two-agent repeated normal-form games are robust to unilateral deviations. It formalizes $c$-rationality via the rationality ratio $s(\mathcal{A}',\mathcal{A})=\frac{U_1(\mathcal{A}',\mathcal{A})}{U_1(\mathcal{A},\mathcal{A})}$ and shows that classic learners such as fictitious play and regret matching are not $c$-rational for any fixed $c\ge1$. It then introduces two algorithms, Rational Generalized Fictitious Play (R-GFP) and Rational Regret Matching (R-RM), which are provably perfectly rational ($c=1$) under mild monitoring assumptions. Both couple self-play with a deviation-detection mechanism and a minimax-based punishment; exploration phases reveal payoffs, while exploitation follows GFP/RM. Numerical experiments corroborate the theoretical results and illustrate the necessity of perfect monitoring: under imperfect monitoring there are games for which no $c$-rational algorithm exists. The results have implications for reducing exploitability in strategic multi-agent settings.

Abstract

Many learning algorithms are known to converge to an equilibrium for specific classes of games if the same learning algorithm is adopted by all agents. However, when the agents are self-interested, a natural question is whether agents have a strong incentive to adopt an alternative learning algorithm that yields them greater individual utility. We capture such incentives as an algorithm's rationality ratio, which is the ratio of the highest payoff an agent can obtain by deviating from a learning algorithm to its payoff from following it. We define a learning algorithm to be $c$-rational if its rationality ratio is at most $c$ irrespective of the game. We first establish that popular learning algorithms such as fictitious play and regret matching are not $c$-rational for any constant $c\geq 1$. We then propose and analyze two algorithms that are provably $1$-rational under mild assumptions, and have the same properties as (a generalized version of) fictitious play and regret matching, respectively, if all agents follow them. Finally, we show that if an assumption of perfect monitoring is not satisfied, there are games for which $c$-rational algorithms do not exist, and illustrate our results with numerical case studies.
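As a rough illustration of the rationality ratio described above, the following Python sketch compares the best payoff among a finite set of candidate deviations with the payoff from following the algorithm. The function names, the restriction to a finite candidate set, and the requirement of a positive self-play payoff are assumptions made for this example and are not taken from the paper. Because only finitely many deviations are checked, the result is at best a lower bound on the true ratio.

```python
from typing import Sequence


def rationality_ratio(deviation_payoffs: Sequence[float],
                      self_play_payoff: float) -> float:
    """Best deviation payoff divided by the payoff from following the algorithm.

    deviation_payoffs: agent 1's payoffs under a finite set of candidate
        deviating algorithms (hypothetical inputs, e.g. from simulation).
    self_play_payoff: agent 1's payoff when both agents follow the algorithm.
    """
    if not deviation_payoffs:
        raise ValueError("need at least one candidate deviation")
    if self_play_payoff <= 0:
        # Assumption for this sketch: payoffs are positive so the ratio is meaningful.
        raise ValueError("self-play payoff must be positive")
    return max(deviation_payoffs) / self_play_payoff


def within_bound(ratio: float, c: float) -> bool:
    """True if the observed ratio does not exceed c on the tested deviations."""
    return ratio <= c


if __name__ == "__main__":
    ratio = rationality_ratio([1.0, 3.0, 2.0], self_play_payoff=2.0)
    print(ratio, within_bound(ratio, c=1.0))
```

A ratio above $c$ on any tested game and deviation is evidence against $c$-rationality; a ratio at or below $c$ on a finite sample proves nothing by itself, since the definition quantifies over all games and all deviations.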

Paper Structure (16 sections, 7 theorems, 27 equations, 12 figures, 2 tables, 3 algorithms)

Introduction
Model and Definitions
Irrationality of existing self-play algorithms
Rational Learning Algorithms
Rational Generalized Fictitious Play
Rational Regret Matching
Numerical Results
Conclusion
Impact Statement
Proof of Theorem \ref{thm:BR_not_secure}
Proof of Theorem \ref{thm:RM_not_secure}
Proof of Lemma \ref{lem:punishment}
Proof of Theorem \ref{thm:Sec_FM}
Proof of Theorem \ref{thm:Sec_RM}
Proof of Theorem \ref{thm:imperfect_games}
...and 1 more section

Key Result

Theorem 3.1

The fictitious play algorithm is not $c$-rational for any constant $c\geq 1$.
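For readers unfamiliar with the baseline, the sketch below shows textbook fictitious play in self-play for a two-agent game: each agent best-responds to the empirical frequencies of the other agent's past actions. It is not the paper's proof construction and does not by itself demonstrate the theorem. The payoff-matrix layout, the tie-breaking rule (lowest action index), and the example coordination game are assumptions chosen for this illustration.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def best_response(payoff: Matrix, opp_counts: Sequence[int]) -> int:
    """Action maximizing payoff against the opponent's empirical action counts.

    payoff[a][b] is this agent's payoff when it plays a and the opponent plays b.
    Ties are broken in favor of the lowest action index (an assumption).
    """
    expected = [sum(p * n for p, n in zip(row, opp_counts)) for row in payoff]
    return max(range(len(expected)), key=expected.__getitem__)


def fictitious_play(payoff1: Matrix, payoff2: Matrix, rounds: int,
                    init: Tuple[int, int] = (0, 0)) -> List[Tuple[int, int]]:
    """Run fictitious play in self-play and return the joint action history."""
    counts1 = [0] * len(payoff1)  # how often agent 1 has played each action
    counts2 = [0] * len(payoff2)  # how often agent 2 has played each action
    a1, a2 = init
    history = []
    for _ in range(rounds):
        counts1[a1] += 1
        counts2[a2] += 1
        history.append((a1, a2))
        # Both agents update simultaneously from the same observed history.
        a1, a2 = best_response(payoff1, counts2), best_response(payoff2, counts1)
    return history


if __name__ == "__main__":
    # Hypothetical 2x2 coordination game; both agents prefer matching on action 0.
    game = [[2, 0], [0, 1]]
    print(fictitious_play(game, game, rounds=4, init=(0, 1)))
```

In this example game, play that starts on a matching pair of actions stays there, while a mismatched start settles on action 0 for both agents after two rounds.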

Figures (12)

Figure 1: Illustration of two-player stage games.
Figure 2: Construction of matrix $E_2^t$ in Algorithm R-GFP at times $t=2$ and $t=3$.
Figure 3: Construction of $E_2^t$ at epoch $t=2$ and $t=4$ for Algorithm R-RM.
Figure 4: Numerical plot illustrating value of agent $1$ (adversary) over time.
Figure 5: Game used for the numerical results in Figure \ref{fig:RFP_plot}.
...and 7 more figures

Theorems & Definitions (18)

Definition 2.1: Stage Game
Definition 2.2: Nash Equilibrium
Definition 2.3: Rationality Ratio
Theorem 3.1
Theorem 3.2
Definition 4.1
Lemma 4.2
Definition 4.3: Generalized Fictitious Play (GFP)
Theorem 4.4
Theorem 4.5
...and 8 more
