Properties of Winning Iterated Prisoner's Dilemma Strategies

Nikoleta E. Glynatsi, Vincent Knight, Marc Harper

TL;DR

This study evaluates 195 IPD strategies across four tournament types to test claims of a single best strategy. Using large-scale, diverse tournaments and a normalization scheme for ranking, it shows that no strategy dominates all settings; performance correlates with environment-sensitive features such as provocation, cooperation relative to the population mean, and adaptability. The results refine Axelrod's classic guidance by showing that clever, adaptive, and population-aware strategies—especially those trained for noise—achieve robust success, while zero-determinant strategies generally underperform in population-level tournaments. The findings have practical implications for training autonomous agents: exposing strategies to diverse opponents and environments yields more generalizable behavior and highlights key features to encode in agents' decision policies.

Abstract

Researchers have explored the performance of Iterated Prisoner's Dilemma strategies for decades, from the celebrated performance of Tit for Tat to the introduction of the zero-determinant strategies and the use of sophisticated learning structures such as neural networks. Many new strategies have been introduced and tested in a variety of tournaments and population dynamics. Typical results in the literature, however, rely on performance against a small number of somewhat arbitrarily selected strategies in a small number of tournaments, casting doubt on the generalizability of conclusions. In this work, we analyze a large collection of 195 strategies in thousands of computer tournaments, present the top-performing strategies across multiple tournament types, and distill their salient features. The results show that there is not yet a single strategy that performs well in diverse Iterated Prisoner's Dilemma scenarios; nevertheless, there are several properties that heavily influence the best-performing strategies. This refines the properties described by Axelrod in light of recent and more diverse opponent populations to: be nice, be provocable and generous, be a little envious, be clever, and adapt to the environment. More precisely, we find that strategies perform best when their probability of cooperation matches the total tournament population's aggregate cooperation probabilities. The features of high-performing strategies help cast some light on why strategies such as Tit for Tat historically performed well in tournaments and why zero-determinant strategies typically do not fare well in tournament settings. Furthermore, our findings have implications for the future training of autonomous agents, as understanding the crucial features for incorporation into these agents becomes essential.

Paper Structure

This paper contains 9 sections, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Examples of normalized rank distributions for two strategies, TFT and Gradual. We plot the distributions of $r$ for the two strategies in the four tournament types. As a reminder, lower values of $r$ correspond to better performances. The top left quadrant of each plot shows the distribution for standard tournaments (fixed number of turns and no noise). The top right quadrant shows the distribution for noisy tournaments (fixed number of turns and noise). The bottom left quadrant shows the distribution for probabilistic ending tournaments (no noise and probabilistic ending). Finally, the bottom right quadrant shows the distribution for noisy probabilistic ending tournaments (noise and probabilistic ending). In each quadrant, we also show the number of data points. Both strategies participated in a similar number of tournaments. Based on the median rank, which we use in this work to define overall performance, TFT performs best in probabilistic ending tournaments, whereas Gradual performs best in standard tournaments.
  • Figure 2: $r$ distributions of the top 15 strategies in different environments. A lower value of $\bar{r}$ corresponds to a more successful performance. A strategy's $r$ distribution skewed towards zero indicates that the strategy ranked highly in most tournaments it participated in. Most distributions are skewed towards zero.
  • Figure 3: Distributions of $C_r$ and $C_r / C_{\text{mean}}$ for the winners of tournaments. A value of $C_r / C_{\text{mean}} = 1$ implies that the cooperating ratio of the winner was the same as the mean cooperating ratio of the tournament.
  • Figure 4: Distributions of SSE error for the winners of tournaments. As a reminder, the SSE error shows how close a strategy is to behaving as a ZD strategy and, consequently, in an extortionate way. An SSE value of 1 indicates no extortionate behaviour at all, whereas a value of 0 indicates that a strategy is behaving as a ZD strategy.
  • Figure 5: Distributions of rates $CC$ to $C$, $CD$ to $C$, $DC$ to $C$, and $DD$ to $C$ for the winners of tournaments.
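
The two quantities that recur in the captions above, the normalized rank $r$ and the ratio $C_r / C_{\text{mean}}$, can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in this summary: that $r$ is a strategy's zero-based rank divided by $N - 1$ for a tournament of $N$ strategies, and that $C_{\text{mean}}$ is the plain average of the participants' cooperating ratios. The function names, scores, and cooperating ratios are hypothetical and chosen only for illustration.

```python
from statistics import median


def normalized_ranks(scores):
    """Map each strategy to r = rank / (N - 1); rank 0 is the top scorer.

    Assumed normalization: r = 0 is first place and r = 1 is last place,
    so lower values of r correspond to better performance.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {name: position / (n - 1) for position, name in enumerate(ordered)}


def cooperation_ratio_vs_mean(coop_ratios, winner):
    """Return C_r / C_mean for the winner (assumed: C_mean is a simple average)."""
    c_mean = sum(coop_ratios.values()) / len(coop_ratios)
    return coop_ratios[winner] / c_mean


# Hypothetical average scores per turn in three small tournaments.
tournaments = [
    {"TFT": 2.6, "Gradual": 2.8, "Defector": 1.9},
    {"TFT": 2.7, "Gradual": 2.5, "Defector": 2.0},
    {"TFT": 2.4, "Gradual": 2.9, "Defector": 2.1},
]

# Normalized rank of every strategy in every tournament.
ranks = [normalized_ranks(t) for t in tournaments]

# Median normalized rank per strategy, the summary statistic used for overall performance.
median_rank = {name: median(r[name] for r in ranks) for name in tournaments[0]}

# Hypothetical cooperating ratios for one tournament won by "Gradual".
coop_ratios = {"TFT": 0.75, "Gradual": 0.75, "Defector": 0.0}
ratio = cooperation_ratio_vs_mean(coop_ratios, "Gradual")

print(median_rank)
print(ratio)
```

Dividing by $N - 1$ puts tournaments of different sizes on the same $[0, 1]$ scale, which is what makes a median across many tournaments meaningful. A ratio above 1 for a winner means it cooperated more than the tournament average; a ratio below 1 means it cooperated less.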