Best-Response dynamics in two-person random games with correlated payoffs

Hlafo Alfie Mimun, Matteo Quattropani, Marco Scarsini

TL;DR

The paper introduces a parametric model of two-player random games in which the second player's payoff coincides with the first player's with probability $p$, yielding a continuum between i.i.d. payoffs ($p=0$) and random potential games ($p=1$). It studies the exact and asymptotic behavior of the number of pure Nash equilibria (PNE): a Poisson$(1)$ limit in the i.i.d. case and a diverging expected number for any $p>0$, with refined limits that depend on the action-set sizes. It then analyzes best-response dynamics (BRD), proving that BRD converges to a PNE with high probability when $p>0$, and providing detailed stopping-time distributions in the potential-game and i.i.d. regimes, along with bounds on convergence times in the general case. The study reveals a phase transition at $p=0$ and offers a framework for understanding learning dynamics in structured random games, with potential extensions to more players and alternative BRD variants.

Abstract

We consider finite two-player normal form games with random payoffs. Player A's payoffs are i.i.d. from a uniform distribution. Given $p \in [0, 1]$, for any action profile, player B's payoff coincides with player A's payoff with probability $p$ and is i.i.d. from the same uniform distribution with probability $1-p$. This model interpolates between the model of i.i.d. random payoffs used in most of the literature and the model of random potential games. First we study the number of pure Nash equilibria in the above class of games. Then we show that, for any positive $p$, asymptotically in the number of available actions, best-response dynamics reaches a pure Nash equilibrium with high probability.
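As a rough illustration of the model described in the abstract, the sketch below samples one game and counts its pure Nash equilibria by brute force. The function names (`sample_game`, `count_pne`, `mean_pne`), the choice of Unif([0,1]) payoffs, and the convention that player A picks the row and player B picks the column are assumptions made for this example; it is not code from the paper.

```python
import random


def sample_game(k_a, k_b, p, rng):
    """Sample payoff matrices (u_a, u_b) with k_a rows and k_b columns.

    Assumption: payoffs are Unif([0,1]); player A picks the row, B the column.
    """
    # Player A's payoffs are i.i.d. uniform.
    u_a = [[rng.random() for _ in range(k_b)] for _ in range(k_a)]
    # With probability p, B's payoff copies A's; otherwise it is a fresh draw.
    u_b = [
        [u_a[i][j] if rng.random() < p else rng.random() for j in range(k_b)]
        for i in range(k_a)
    ]
    return u_a, u_b


def count_pne(u_a, u_b):
    """Count profiles where neither player can gain by deviating alone."""
    k_a, k_b = len(u_a), len(u_a[0])
    # A deviates by changing the row, so compare within each column.
    col_max_a = [max(u_a[i][j] for i in range(k_a)) for j in range(k_b)]
    # B deviates by changing the column, so compare within each row.
    row_max_b = [max(u_b[i]) for i in range(k_a)]
    return sum(
        1
        for i in range(k_a)
        for j in range(k_b)
        if u_a[i][j] == col_max_a[j] and u_b[i][j] == row_max_b[i]
    )


def mean_pne(k, p, trials, seed=0):
    """Monte Carlo average of the number of PNE in k-by-k games."""
    rng = random.Random(seed)
    total = sum(count_pne(*sample_game(k, k, p, rng)) for _ in range(trials))
    return total / trials
```

For instance, `mean_pne(20, 0.0, 2000)` should land near 1, since with i.i.d. continuous payoffs each profile is a PNE with probability $1/(K^{\mathrm{A}}K^{\mathrm{B}})$, whereas values of $p$ close to 1 should give a noticeably larger average.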

Paper Structure (17 sections, 15 theorems, 140 equations, 3 figures)

Key Result

Proposition 3.1

If $W$ is the number of PNE in the game $\boldsymbol{U}(p)$, then

Figures (3)

  • Figure 1: Both figures show an instance of the first seven steps in the BRD. The figure on the left describes the case in which $\tau^{\mathsf{NE}}_n=6$. To compute $\mathsf{BRD}_n(7)$, the row player visits the action profiles on the red dashed lines and finds the maximum payoff at $\mathsf{BRD}_n(6)$; hence $\mathsf{BRD}_n(6)\in\mathsf{NE}_n$. In this case $R_n(5)$ consists of all the action profiles on the solid red and blue lines in the figure. Hence, $\tau^{R}_n=7$ and, consequently, $\tau^{R}_n-1=\tau^{\mathsf{NE}}_n$. The figure on the right describes the case in which the BRD discovers a trap. Since $\mathsf{BRD}_n(6)$ and $\mathsf{BRD}_n(3)$ are in the same column, we have $\mathsf{BRD}_n(7)=\mathsf{BRD}_n(3)$. In this case $\tau^{R}_n=6$ because $R_n(4)$ consists of all the action profiles on the solid lines in the figure, except for the blue line passing through the action profiles labeled $5$ and $6$. Hence, $\mathsf{BRD}_n(\tau^{R}_n+1)=\mathsf{BRD}_n(t)$ for $t=3\leq \tau^{R}_n-3$.
  • Figure 2: The figure on the left shows an instance of the first five steps of the BRD, whereas the one on the right considers an additional step. Given the position of $\mathsf{BRD}(t)$ for $t=1,\ldots,5$ in the figure on the left, $D_{n}(5)$ coincides with the event that the payoff of player $\mathrm{B}$ at $5$ is not the maximum of its row (the blue dashed line). Conditioning on $D_{n}(5)$, the probability of $D_{n}(6)$ is the product of the probability that $\mathsf{BRD}(6)$ is at neither of the action profiles $M_1$ and $M_2$, that is $(K_{n}^{\mathrm{B}}-3)/(K_{n}^{\mathrm{B}}-1)$, and, given this, the probability that the payoff of player $\mathrm{A}$ at $6$ is not the maximum of its column (the red dashed line), i.e., $(K_{n}^{\mathrm{A}}-1)/K_{n}^{\mathrm{A}}$. This explains \eqref{eq:P-Bt} when $t=6$. Similarly, conditioning on $D_{n}(5)$, $C_{n}(6)$ is the intersection of two events: the first one is that the position of $\mathsf{BRD}(6)$ coincides with neither $M_1$ nor $M_2$, which has probability $(K_{n}^{\mathrm{B}}-3)/(K_{n}^{\mathrm{B}}-1)$; the second one is that the payoff of player $\mathrm{A}$ at $6$ is the maximum of its column (the red dashed line), which, conditioning on the first event, has probability $1/K_{n}^{\mathrm{A}}$. This justifies \eqref{eq:P-Ct-Bt-1} when $t=6$.
  • Figure 3: The left figure shows an instance of the first four steps of the BRD. The right figure shows how the dynamics proceeds after time $4$. The numbered action profiles lying on the dashed lines, i.e., $5$, $6$, and $7$, give the same payoff to the row and column player. Note that in this case the event $J_{n}^{s_n,\ell_{n}}$ occurs with $\ell_{n}=7$ and $s_n=3$. Indeed, there exists $t\leq \ell_n-s_n$ (in this case $t=4$) such that the BRD visits only action profiles in $S_{n}$ for $s_n$ consecutive steps. As a consequence, the payoff of the row player at the action profile $7$ is the maximum of the payoffs of the row player in the action profiles lying on the red dashed lines and of the payoffs of the column player in the action profiles lying on the blue dashed lines. Such payoffs are all i.i.d. $\mathsf{Unif}([0,1])$ except for the payoffs associated with the action profiles $4,V_1,\ldots,V_5$, for which we have additional information. Since the number of such exceptional action profiles is at most $\left(\ell_n/2\right)^{2}$ and the total number of action profiles lying on the dashed lines is $r_n(s_n)$, we have that the payoff of the row player at the action profile $7$ is the maximum of at least $u$ i.i.d. $\mathsf{Unif}([0,1])$ random variables, where $u$ is defined as in \eqref{eq:def-u}.
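The captions above describe the BRD as a walk in which the row and column players alternately move to a best response, ending either at a PNE or in a trap. A minimal sketch of such a walk follows. The function name `best_response_dynamics`, the starting profile, the rule that the row player moves first, and the stopping rule (stop at a PNE, or when a state repeats) are assumptions made for illustration and may differ from the paper's exact definitions of $\mathsf{BRD}_n$ and its stopping times.

```python
def best_response_dynamics(u_a, u_b, start=(0, 0), max_steps=10_000):
    """Alternate best responses until a PNE is reached or a state repeats.

    Returns (outcome, profile, steps) with outcome in {"pne", "cycle", "timeout"}.
    Assumptions: player A picks the row and moves first; a player only moves
    when the best response is strictly better than the current payoff.
    """
    k_a, k_b = len(u_a), len(u_a[0])
    i, j = start
    row_moves = True
    seen = set()
    for step in range(max_steps):
        state = (i, j, row_moves)
        if state in seen:
            # The walk is deterministic, so a repeated state means a loop.
            return "cycle", (i, j), step
        seen.add(state)
        # Best row against column j, and best column against row i.
        a_best = max(range(k_a), key=lambda r: u_a[r][j])
        b_best = max(range(k_b), key=lambda c: u_b[i][c])
        if u_a[i][j] == u_a[a_best][j] and u_b[i][j] == u_b[i][b_best]:
            return "pne", (i, j), step
        if row_moves:
            if u_a[a_best][j] > u_a[i][j]:
                i = a_best
        else:
            if u_b[i][b_best] > u_b[i][j]:
                j = b_best
        row_moves = not row_moves
    return "timeout", (i, j), max_steps
```

On a common-payoff game such as `[[1, 0], [0, 1]]` for both players, the walk started at `(0, 1)` reaches the equilibrium `(1, 1)` after one move, while on a matching-pennies-style game it loops, which loosely corresponds to the trap shown in Figure 1.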

Theorems & Definitions (29)

  • Definition 2.1
  • Proposition 3.1
  • Proposition 3.2
  • Corollary 3.3
  • Lemma 4.1
  • Lemma 4.2
  • Theorem 4.3
  • Proposition 4.4
  • Theorem 4.5
  • Theorem 4.6
  • ...and 19 more