Expectation-enforcing strategies for repeated games

Nikos Dimou, Alex McAvoy

TL;DR

This work provides a complete characterization of the payoff relationships that a single player can enforce in discounted two-player repeated games. It proves that any enforceable relationship (linear or nonlinear) can be implemented by a simple two-point reactive learning strategy, even when the opponent uses highly sophisticated strategies; longer memory adds no extra enforcement power. A generalized next-round correction condition yields necessary and sufficient criteria for enforceability, and the minimum discount factor for enforcement is computable in polynomial time via linear programming. The results apply to classic settings like the iterated prisoner's dilemma and extend to nonlinear and asymmetric donation games and hawk-dove dynamics, clarifying when equality, fairness, or extortionate constraints can be unilaterally enforced. The findings also suggest practical tools for designing coercive or cooperative incentives in multi-agent learning and coalition settings, with implications for climate policy, algorithmic collusion, and evolutionary dynamics.

Abstract

Originating in evolutionary game theory, the class of "zero-determinant" strategies enables a player to unilaterally enforce linear payoff relationships in simple repeated games. An upshot of this kind of payoff constraint is that it can shape the incentives for the opponent in a predetermined way. An example is when a player ensures that the agents get equal payoffs. While extensively studied in infinite-horizon games, extensions to discounted games, nonlinear payoff relationships, richer strategic environments, and behaviors with long memory remain incompletely understood. In this paper, we provide necessary and sufficient conditions for a player to enforce arbitrary payoff relationships (linear or nonlinear), in expectation, in discounted games. These conditions characterize precisely which payoff relationships are enforceable using strategies of arbitrary complexity. Our main result establishes that any such enforceable relationship can actually be implemented using a simple two-point reactive learning strategy, which conditions on the opponent's most recent action and the player's own previous mixed action, using information from only one round into the past. For additive payoff constraints, we show that enforcement is possible using even simpler (reactive) strategies that depend solely on the opponent's last move. In other words, this tractable class is universal within expectation-enforcing strategies. As examples, we apply these results to characterize extortionate, generous, equalizer, and fair strategies in the iterated prisoner's dilemma, asymmetric donation game, nonlinear donation game, and the hawk-dove game, identifying precisely when each class of strategy is enforceable and with what minimum discount factor.
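A short worked derivation may make the claim about reactive strategies concrete. It is only a sketch under assumptions that the abstract does not state: the stage game is a donation game with benefit $b$ and cost $c$, the indicators $x_{t},y_{t}\in\left\{0,1\right\}$ record whether $X$ and $Y$ cooperate in round $t$, payoffs are normalized by $1-\lambda$, and $X$ cooperates with probability $p_{0}$ in the first round and with probability $p_{C}$ or $p_{D}$ afterwards, depending only on $Y$'s last move. All of this notation is introduced here for illustration.

$$
\begin{aligned}
\bar{x} &\coloneqq \left(1-\lambda\right)\sum_{t\geqslant 0}\lambda^{t}\,\mathbb{E}\left[x_{t}\right], \qquad \bar{y}\coloneqq \left(1-\lambda\right)\sum_{t\geqslant 0}\lambda^{t}\,\mathbb{E}\left[y_{t}\right], \\
\mathbb{E}\left[x_{t+1}\right] &= p_{D}+\left(p_{C}-p_{D}\right)\mathbb{E}\left[y_{t}\right]
\;\Longrightarrow\;
\bar{x} = a+s\,\bar{y}, \qquad a\coloneqq \left(1-\lambda\right)p_{0}+\lambda p_{D}, \quad s\coloneqq \lambda\left(p_{C}-p_{D}\right), \\
\pi_{X} &= b\,\bar{y}-c\,\bar{x}, \qquad \pi_{Y}=b\,\bar{x}-c\,\bar{y}, \\
0 &= \left(bs-c\right)\pi_{X}-\left(b-cs\right)\pi_{Y}+a\left(b^{2}-c^{2}\right).
\end{aligned}
$$

The last line follows by substituting $\bar{x}=a+s\bar{y}$ into both payoffs and eliminating $\bar{y}$. Nothing about $Y$'s strategy enters the second line, so under these assumptions the linear relationship holds in expectation against any opponent. As a check, ALLD ($a=s=0$) gives $c\pi_{X}=-b\pi_{Y}$, and ALLC ($a=1$, $s=0$) gives $c\pi_{X}=-b\pi_{Y}+b^{2}-c^{2}$.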

Paper Structure

This paper contains 28 sections, 26 theorems, 85 equations, 6 figures.

Key Result

Theorem 0

Suppose that $\left(\sigma_{X}^{0},\sigma_{X}\left[s_{X},s_{Y}\right]\right)$ is a memory-one strategy for $X$. If there exists a function $\psi:S_{X}\rightarrow\mathbb{R}$ such that holds for every $s_{X}\in S_{X}$ and $s_{Y}\in S_{Y}$, then $\left(\sigma_{X}^{0},\sigma_{X}\left[s_{X},s_{Y}\right]\right)$ enforces the linear payoff relationship against any behavioral strategy of player $Y$, inc

Figures (6)

  • Figure 1: Payoff regions enforced when $X$ plays weighted averages of ALLD (red) and TFT (green) in repeated prisoner's dilemmas. Each colored region shows the payoff region obtained from the strategy $\sigma_{X}\coloneqq\left(1-p\right)\textrm{ALLD}+p\textrm{TFT}$ played against $10^{4}$ randomly-chosen opposing strategies, for $p\in\left\{k/10\right\}_{k=0}^{10}$. (A,C) Additive prisoner's dilemma with $\left(R,S,T,P\right)=\left(1,-1,2,0\right)$: for $p\notin\left\{0,1\right\}$, the strategy enforces a linear payoff relationship. (B,D) Non-additive prisoner's dilemma with $\left(R,S,T,P\right)=\left(3,0,5,1\right)$: the strategy enforces a two-dimensional convex region. While line-enforcing strategies naturally arise in additive games through simple mixtures of well-known strategies, non-additive games require more sophisticated constructions. Panels A and B use $\lambda =0.9999$ (a game with $10{,}000$ rounds, on average, approximating an undiscounted game), and panels C and D use $\lambda =0.8$ (a game with $5$ rounds, on average).
  • Figure 2: Pencil of enforceable lines in the donation game. The boundary lines $cu_{X}=-bu_{Y}$ (enforced by ALLD) and $cu_{X}=-bu_{Y}+b^{2}-c^{2}$ (enforced by ALLC) are shown, along with intermediate parallel lines that can be enforced by convex combinations of these strategies. By Proposition \ref{prop:convexproperty2}, an agent can enforce any line in this family by appropriately mixing between punishment and forgiveness. The convex hull of all such enforceable lines equals the entire payoff region, demonstrating complete unilateral control over expected payoff outcomes in the repeated donation game.
  • Figure 3: Heat map of the enforceability of the linear payoff relationship $\varphi =\kappa -u_{X}\left(s_{X},s_{Y}\right) -\chi\left(\kappa -u_{Y}\left(s_{X},s_{Y}\right)\right)\equiv 0$ for all values $\kappa ,\chi\in\mathbb{R}$, for the repeated prisoner's dilemma with $\left(R,S,T,P\right) =\left(3,0,5,1\right)$. $\theta$ is the angle between the payoff relationship (green) and the reference line (dashed), and $r$ represents the fraction (red) of the reference line at which the intersection point lies. The region enclosed by the white dashed line is the set of $\left(r,\theta\right)$ for which at least one of $\tau_{X}^{+}$ and $\tau_{X}^{-}$ is non-pure in the optimizer for $\lambda_{\min}$ (Eq. \ref{eq:lambda_min}). For this particular game, we have $\kappa =P+r\left(R-P\right)$ and $\chi =\tan\left(\theta +\pi /4\right)$.
  • Figure 4: Enforceability of linear payoff relationships in the three-action nonlinear donation game with parameters $b_{1}=3$, $c_{1}=1$, $b_{2}=4$, $c_{2}=2.5$ (satisfying $b_{2}-c_{2}<b_{1}-c_{1}$). The heatmap shows the minimum discount factor $\lambda_{\textrm{min}}$ required to enforce $\varphi =\kappa -u_{X} -\chi\left(\kappa -u_{Y}\right)\equiv 0$ across different values of $\kappa$ and $\chi$. Due to the game's additive structure, any enforceable relationship can be implemented using a two-point reactive strategy, significantly simplifying the strategy space compared to general memory-one approaches.
  • Figure 5: Enforceability in the asymmetric donation game with $b_{X}=3$, $c_{X}=1$, $b_{Y}=2$, and $c_{Y}=1$. The heatmap displays the minimum discount factor $\lambda_{\textrm{min}}$ for enforcing $\varphi =\kappa -u_{X} -\chi\left(\kappa -u_{Y}\right)\equiv 0$. The natural reference line (indicated by the dashed line in the payoff space) connects mutual defection $\left(0,0\right)$ to mutual cooperation $\left(b_{Y}-c_{X},b_{X}-c_{Y}\right)$, reflecting the asymmetric costs and benefits. Strategies enforcing equality ($\pi_{X}=\pi_{Y}$, corresponding to $\chi =1$) require $\lambda\rightarrow 1$, while "fair" strategies that enforce proportional sharing along the reference line are more readily achievable.
  • ...and 1 more figure
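
The additive case of Figure 1 can be checked numerically with a minimal sketch like the one below. Several things here are assumptions rather than details taken from the paper: the mixture $\left(1-p\right)\textrm{ALLD}+p\textrm{TFT}$ is read as a behavioral mixture (cooperate with probability $p$ in the first round and after the opponent cooperates, never after a defection), opponents are drawn as random memory-one strategies, and the function names are hypothetical. The line being tested, $\left(bs-c\right)\pi_{X}-\left(b-cs\right)\pi_{Y}+a\left(b^{2}-c^{2}\right)=0$ with $s=\lambda p$ and $a=\left(1-\lambda\right)p$, was worked out for this sketch for a donation game with $b=2$ and $c=1$, which reproduces $\left(R,S,T,P\right)=\left(1,-1,2,0\right)$.

```python
import numpy as np


def discounted_payoffs(p0, p, q0, q, R, S, T, P, lam):
    """Exact normalized discounted payoffs (pi_X, pi_Y) for memory-one play.

    p and q give each player's cooperation probability after the outcomes
    (CC, CD, DC, DD), written from that player's own point of view.
    """
    p = np.asarray(p, dtype=float)
    # Re-index Y's strategy to X's state order (X's action listed first).
    q_x = np.array([q[0], q[2], q[1], q[3]], dtype=float)
    # Distribution over (CC, CD, DC, DD) in the first round.
    v0 = np.array([p0 * q0, p0 * (1 - q0), (1 - p0) * q0, (1 - p0) * (1 - q0)])
    # Row i of M is the distribution of the next state given current state i.
    M = np.column_stack(
        [p * q_x, p * (1 - q_x), (1 - p) * q_x, (1 - p) * (1 - q_x)]
    )
    # Discounted state distribution v = (1 - lam) * v0 @ inv(I - lam * M).
    v = np.linalg.solve((np.eye(4) - lam * M).T, (1 - lam) * v0)
    return v @ np.array([R, S, T, P]), v @ np.array([R, T, S, P])


def max_line_residual(p_mix, lam, b=2.0, c=1.0, n_opponents=1000, seed=0):
    """Largest deviation from the candidate line over random opponents."""
    rng = np.random.default_rng(seed)
    # Assumed behavioral reading of (1 - p) ALLD + p TFT.
    p0, p = p_mix, [p_mix, 0.0, p_mix, 0.0]
    s, a = lam * p_mix, (1 - lam) * p_mix
    worst = 0.0
    for _ in range(n_opponents):
        q0, q = rng.random(), rng.random(4)  # random memory-one opponent
        pi_x, pi_y = discounted_payoffs(p0, p, q0, q, b - c, -c, b, 0.0, lam)
        residual = (b * s - c) * pi_x - (b - c * s) * pi_y + a * (b**2 - c**2)
        worst = max(worst, abs(residual))
    return worst
```

Calling `max_line_residual(0.5, lam=0.8)` should return a value at the level of floating-point error if the assumed reading of the mixture is right. Here $\lambda =0.8$ corresponds to $1/\left(1-\lambda\right)=5$ rounds on average, as in panels C and D.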

Theorems & Definitions (58)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Theorem 0: mcavoy:PNAS:2016
  • Example 1: Tit-for-tat in the undiscounted prisoner's dilemma
  • Example 2
  • Lemma 1
  • proof
  • Proposition 1
  • ...and 48 more