Expectation-enforcing strategies for repeated games
Nikos Dimou, Alex McAvoy
TL;DR
This work provides a complete characterization of payoff relationships that a single player can enforce in discounted two-player repeated games. It proves that any enforceable relationship (linear or nonlinear) can be implemented by a simple two-point reactive learning strategy, even when the opponent uses highly sophisticated strategies; longer memory adds no extra enforcement power. A generalized next-round correction condition yields necessary and sufficient criteria for enforceability, and the minimum discount factor for enforcement is computable in polynomial time via linear programming. The results apply to classic settings like the iterated prisoner's dilemma and extend to nonlinear and asymmetric donation games and hawk-dove dynamics, clarifying when equality, fairness, or extortionate constraints can be unilaterally enforced. The findings also suggest practical tools for designing coercive or cooperative incentives in multi-agent learning and coalition settings, with implications for climate policy, algorithmic collusion, and evolutionary dynamics.
Abstract
Originating in evolutionary game theory, the class of "zero-determinant" strategies enables a player to unilaterally enforce linear payoff relationships in simple repeated games. An upshot of this kind of payoff constraint is that it can shape the incentives for the opponent in a predetermined way. An example is when a player ensures that the agents get equal payoffs. While extensively studied in infinite-horizon games, extensions to discounted games, nonlinear payoff relationships, richer strategic environments, and behaviors with long memory remain incompletely understood. In this paper, we provide necessary and sufficient conditions for a player to enforce arbitrary payoff relationships (linear or nonlinear), in expectation, in discounted games. These conditions characterize precisely which payoff relationships are enforceable using strategies of arbitrary complexity. Our main result establishes that any such enforceable relationship can actually be implemented using a simple two-point reactive learning strategy, which conditions on the opponent's most recent action and the player's own previous mixed action, using information from only one round into the past. For additive payoff constraints, we show that enforcement is possible using even simpler (reactive) strategies that depend solely on the opponent's last move. In other words, this tractable class is universal within expectation-enforcing strategies. As examples, we apply these results to characterize extortionate, generous, equalizer, and fair strategies in the iterated prisoner's dilemma, asymmetric donation game, nonlinear donation game, and the hawk-dove game, identifying precisely when each class of strategy is enforceable and with what minimum discount factor.
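As a rough illustration of what unilateral enforcement by a reactive strategy can look like, the sketch below works through a standard equalizer calculation in the discounted donation game. It is not the paper's construction. The assumptions are ours: cooperation costs c and gives the opponent b, payoffs are normalized as (1 - λ) Σ λ^t (round-t payoff), and player X uses a reactive strategy (p0, pC, pD) with pC - pD = c / (bλ). Under these assumptions the opponent's expected payoff works out to b((1 - λ) p0 + λ pD) whatever the opponent does. The function name, parameter values, and opponent list are hypothetical.

```python
from itertools import product


def discounted_payoffs(x_strat, y_strat, b, c, lam, rounds=2000):
    """Expected normalized discounted payoffs (pi_x, pi_y) in a donation game.

    x_strat = (p0, pC, pD): X's cooperation probability in round 0, after Y
        cooperated, and after Y defected (a reactive strategy).
    y_strat = (q0, qCC, qCD, qDC, qDD): Y's cooperation probability in round 0
        and after each outcome (X's action, Y's action); a memory-one strategy.
    The infinite sum is truncated after `rounds` terms (an approximation).
    """
    p0, pC, pD = x_strat
    q0, qCC, qCD, qDC, qDD = y_strat
    q_next = {(1, 1): qCC, (1, 0): qCD, (0, 1): qDC, (0, 0): qDD}

    # v[(x, y)]: probability that X plays x and Y plays y (1 = cooperate).
    v = {
        (x, y): (p0 if x else 1 - p0) * (q0 if y else 1 - q0)
        for x, y in product((1, 0), repeat=2)
    }

    pi_x = pi_y = 0.0
    weight = 1 - lam  # normalization (1 - lam) * lam**t
    for _ in range(rounds):
        for (x, y), prob in v.items():
            pi_x += weight * prob * (b * y - c * x)
            pi_y += weight * prob * (b * x - c * y)

        # Advance the action distribution by one round.
        new_v = dict.fromkeys(v, 0.0)
        for (x, y), prob in v.items():
            px = pC if y else pD   # X reacts only to Y's last action
            qy = q_next[(x, y)]    # Y reacts to the full last outcome
            for nx, ny in new_v:
                new_v[(nx, ny)] += (
                    prob * (px if nx else 1 - px) * (qy if ny else 1 - qy)
                )
        v = new_v
        weight *= lam
    return pi_x, pi_y


# Hypothetical parameter choices for the illustration.
B, C, LAM = 3.0, 1.0, 0.9
P_C = 0.9
X_STRAT = (0.5, P_C, P_C - C / (B * LAM))  # pC - pD = c / (b * lam)

OPPONENTS = [
    (1.0, 1.0, 1.0, 1.0, 1.0),  # always cooperate
    (0.0, 0.0, 0.0, 0.0, 0.0),  # always defect
    (1.0, 1.0, 1.0, 0.0, 0.0),  # tit-for-tat (copies X's last action)
    (0.3, 0.8, 0.1, 0.6, 0.2),  # arbitrary memory-one strategy
]

if __name__ == "__main__":
    for opp in OPPONENTS:
        pi_x, pi_y = discounted_payoffs(X_STRAT, opp, B, C, LAM)
        print(f"opponent {opp}: pi_x = {pi_x:.6f}, pi_y = {pi_y:.6f}")
```

With these parameter values the formula above gives an opponent payoff of 1.58 against every listed opponent, while X's own payoff varies. The check only covers memory-one opponents; the claim for arbitrary opponents rests on the expectation argument, not on this code.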
