Properties of Winning Iterated Prisoner's Dilemma Strategies
Nikoleta E. Glynatsi, Vincent Knight, Marc Harper
TL;DR
This study evaluates 195 Iterated Prisoner's Dilemma (IPD) strategies across four tournament types to test claims of a single best strategy. Using large-scale, diverse tournaments and a normalization scheme for ranking, it shows that no strategy dominates all settings; performance correlates with environment-sensitive features such as provocation, cooperation relative to the population mean, and adaptability. The results refine Axelrod's classic guidance by showing that clever, adaptive, and population-aware strategies, especially those trained for noise, achieve robust success, while zero-determinant strategies generally underperform in population-level tournaments. The findings have practical implications for training autonomous agents: exposing strategies to diverse opponents and environments yields more generalizable behavior and highlights key features to encode in agents' decision policies.
Abstract
Researchers have explored the performance of Iterated Prisoner's Dilemma strategies for decades, from the celebrated performance of Tit for Tat to the introduction of the zero-determinant strategies and the use of sophisticated learning structures such as neural networks. Many new strategies have been introduced and tested in a variety of tournaments and population dynamics. Typical results in the literature, however, rely on performance against a small number of somewhat arbitrarily selected strategies in a small number of tournaments, casting doubt on the generalizability of conclusions. In this work, we analyze a large collection of 195 strategies in thousands of computer tournaments, present the top-performing strategies across multiple tournament types, and distill their salient features. The results show that there is not yet a single strategy that performs well in diverse Iterated Prisoner's Dilemma scenarios; nevertheless, several properties heavily influence the best-performing strategies. This refines the properties described by Axelrod in light of recent and more diverse opponent populations to: be nice, be provocable and generous, be a little envious, be clever, and adapt to the environment. More precisely, we find that strategies perform best when their probability of cooperation matches the total tournament population's aggregate cooperation probabilities. The features of high-performing strategies help shed light on why strategies such as Tit for Tat historically performed well in tournaments and why zero-determinant strategies typically do not fare well in tournament settings. Furthermore, our findings have implications for the future training of autonomous agents, as understanding the crucial features for incorporation into these agents becomes essential.
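To make the quantities in the abstract concrete, the following is a minimal sketch of a round-robin tournament that records each strategy's total score and its cooperation rate, then compares that rate with the population average. It is not the authors' implementation. The payoff values (3, 0, 5, 1), the three strategies, the 200-turn match length, and all function names are assumptions chosen for illustration.

```python
from itertools import combinations

# Conventional Prisoner's Dilemma payoffs (assumed here, not taken from the text):
# mutual cooperation 3, sucker 0, temptation 5, mutual defection 1.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, opp_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opp_history[-1] if opp_history else "C"


def cooperator(own_history, opp_history):
    """Always cooperate."""
    return "C"


def defector(own_history, opp_history):
    """Always defect."""
    return "D"


def play_match(strat_a, strat_b, turns):
    """Play one match; return both scores and both cooperation counts."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(turns):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b, hist_a.count("C"), hist_b.count("C")


def round_robin(strategies, turns=200):
    """Every strategy meets every other once; no self-play, no noise."""
    scores = dict.fromkeys(strategies, 0)
    coops = dict.fromkeys(strategies, 0)
    for (name_a, strat_a), (name_b, strat_b) in combinations(strategies.items(), 2):
        score_a, score_b, coop_a, coop_b = play_match(strat_a, strat_b, turns)
        scores[name_a] += score_a
        scores[name_b] += score_b
        coops[name_a] += coop_a
        coops[name_b] += coop_b
    moves_per_player = turns * (len(strategies) - 1)
    rates = {name: coops[name] / moves_per_player for name in strategies}
    return scores, rates


strategies = {
    "Tit for Tat": tit_for_tat,
    "Cooperator": cooperator,
    "Defector": defector,
}
scores, rates = round_robin(strategies)
population_rate = sum(rates.values()) / len(rates)

for name in sorted(scores, key=scores.get, reverse=True):
    gap = rates[name] - population_rate
    print(f"{name}: score={scores[name]}, "
          f"cooperation rate={rates[name]:.3f}, gap to population={gap:+.3f}")
```

In this three-strategy population the always-defecting player scores highest (1204 against 799 for Tit for Tat and 600 for the cooperator). That ranking is an artifact of the tiny, hand-picked opponent set, which is the kind of sensitivity to population that motivates evaluating strategies across many diverse tournaments.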
