Evolution of noisy learning in games

Marta C. Couto, Fernando P. Santos, Christian Hilbe

Abstract

People make strategic decisions many times a day - during negotiations, when coordinating actions with others, or when choosing partners for cooperation. The resulting dynamics can be studied with learning theory and evolutionary game theory. These frameworks explore how people adapt their decisions over time, in light of how effective their strategies have been. The outcomes of such learning processes depend on how sensitive individuals are to the performance of their strategies. When they are more sensitive, they systematically favor strategies they deem more successful. When they are less sensitive, their learning process is noisier and more erratic. Traditionally, most models treat this sensitivity as a fixed parameter - like the "selection strength" parameter in evolutionary models. Instead, we study how strategies and sensitivities co-evolve. We find that the co-evolutionary endpoints depend on both the type of strategic interaction and the learning rule employed. In prisoner's dilemmas, sensitivities often increase indefinitely. But in snowdrift and stag-hunt games, sensitivities often converge to a finite value, or we observe evolutionary branching. These results shed light on how evolution might shape learning mechanisms for social behavior. They suggest that noisy learning does not need to be a by-product of cognitive constraints. Instead, it can serve as a means to gain strategic advantages.

Paper Structure

This paper contains 10 sections, 8 equations, 4 figures.

Figures (4)

  • Figure 1: An overview of the model. A, In stochastic models of evolutionary game theory, individuals continually get a chance to revise their strategy. We model this revision process with introspection dynamics [Couto:NJP:2022, Couto:DGAA:2023, Hauser:Nature:2019, McAvoy:PNASnexus:2022, Wang:PTRSB:2023, huebner:PNAS:2024, Schmid:PlosCB:2022, Ramirez:SciRep:2023]. Here, a player compares its current strategy to a randomly chosen alternative. If $\pi$ is the payoff of the current strategy, and $\pi'$ the payoff of the alternative, the player's switching probability is given by a Fermi function $\phi(\pi' \!-\! \pi)$. This implies players are more likely to switch if the alternative strategy is more beneficial. B, The exact shape of the switching probability depends on a parameter $\beta$. This parameter is often referred to as the strength of selection. Because here we interpret $\beta$ as an individual (and evolvable) trait, we refer to it as the player's payoff sensitivity. For low $\beta$, the player's learning process is noisier, and strategy changes are mostly driven by chance. As $\beta$ becomes larger, updating decisions become more deterministic, and individuals increasingly favor alternative strategies with high payoffs. We are interested in the evolution of this parameter $\beta$. Our process unfolds on two timescales. C, In the short run, each player's payoff sensitivity $\beta$ is fixed. Given their sensitivity, they choose between different strategies in a stage game. The depicted example shows a round in which player 1 switches from strategy C (cooperation) to strategy D (defection). We iterate this learning process for many rounds. Based on these iterations, we compute the average frequency of each possible game outcome (illustrated as a black-and-white gradient). This allows us to compute the players' expected payoff as a function of their $\beta$ values. D, In the long run, we let the players' payoff sensitivity $\beta$ evolve. We model this long-run process with adaptive dynamics [Geritz:PRL:1997, geritz:EER:1998, Hofbauer:book:1998, Brannstrom:Games:2013].
  • Figure 2: Introspection dynamics among players with different payoff sensitivities. To illustrate the impact of payoff sensitivity on the learning dynamics, we consider two stage games, a prisoner's dilemma (A--C) and a snowdrift game (D--F). In each case, we assume payoff sensitivities may take two possible values. They can either be comparably low (bright avatar) or high (dark avatar). For all possible combinations of payoff sensitivities, we depict how often the respective players would obtain one of the four possible payoffs of the respective stage game. Based on this stationary distribution, we compute the players' expected payoffs. For the prisoner's dilemma, we observe that the higher payoff sensitivity dominates the lower payoff sensitivity (indicated by blue arrows in panel D). In contrast, for the snowdrift game, no payoff sensitivity value is dominant. Instead, each player prefers to have the opposite payoff sensitivity value of their opponent.
  • Figure 3: Adaptive dynamics of payoff sensitivity for three different social dilemmas. We explore the adaptive dynamics for three different social dilemmas: the prisoner's dilemma, the snowdrift game, and the stag-hunt game. For each case, we depict a realization of an individual-based simulation (upper panels) and a pairwise invasibility plot (lower panels). Each simulation is initialized at a monomorphic population with a payoff sensitivity of $\beta\!=\!0$. Over time, mutations introduce variation in the players' payoff sensitivity. Individuals obtain payoffs by randomly interacting with other population members and learning strategies through introspection dynamics. They reproduce according to their payoff. For details, see Material and Methods. Pairwise invasibility plots illustrate the dynamics of monomorphic resident populations. They display (in color) which rare mutant traits ($y$-axis) have a positive invasion fitness $s_x(y)$, given the current resident ($x$-axis). Colored regions above the diagonal indicate that mutants with a higher trait value than the resident are favored to invade. Colored regions below the diagonal indicate evolution towards smaller trait values. Dashed lines indicate the position of singular trait values. A,B, For the depicted prisoner's dilemma, we observe ever-increasing payoff sensitivities. C,D, In the snowdrift game, the evolving $\beta$ values converge to a finite value. E,F, In the stag-hunt game, we observe evolutionary branching. After branching occurs, there are two subpopulations. Members of the first subpopulation have low $\beta$ values, and hence they choose strategies mostly at random. Members of the second subpopulation exhibit high $\beta$ values; they tend to best respond to their respective opponent.
  • Figure 4: Evolutionary dynamics across all social dilemmas. So far, we described three special instances of games; here we systematically repeat this analysis for all $2\!\times\!2$ games. For each pair of game parameters $A$ and $B$, we numerically check whether there exists a finite singular trait value $\beta^*$. If it exists, we explore whether the respective trait value is evolutionarily stable (orange) or a branching point (purple). The color gradient represents the position of the singular trait: the lighter, the larger the value of $\beta^*$. Regions without a finite singular trait are left white. Black symbols indicate the position of the three examples displayed in Fig. 3. Note that here we report the lowest singular point (the one which is reached when the process starts at $\beta=0$). However, there are games that permit two singular points. We show those in SI Section 2E.
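
The short-run learning process described in the captions of Figures 1 and 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the Fermi function is taken in its standard form $\phi(\Delta) = 1/(1+e^{-\beta\Delta})$, exactly one of the two players is picked uniformly at random to revise in each step, and the payoff matrix `DONATION_GAME` is a hypothetical prisoner's dilemma chosen only for illustration. The function and variable names are likewise invented here.

```python
import numpy as np

# Hypothetical prisoner's dilemma (a donation game with benefit 3, cost 1).
# PAYOFF[own, other] with index 0 = C (cooperate), 1 = D (defect).
DONATION_GAME = np.array([[2.0, -1.0],
                          [3.0,  0.0]])

# The four possible outcomes (strategy of player 1, strategy of player 2).
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def fermi(delta, beta):
    """Switching probability for a payoff difference delta (assumed form)."""
    return 1.0 / (1.0 + np.exp(-beta * delta))


def stationary_distribution(payoff, beta1, beta2):
    """Long-run frequency of each outcome in STATES for a symmetric 2x2 game.

    Assumption: in every step one player is chosen with probability 1/2 and
    compares its current strategy with the only alternative strategy.
    """
    T = np.zeros((4, 4))
    for i, (s1, s2) in enumerate(STATES):
        alt1, alt2 = 1 - s1, 1 - s2
        # Player 1 switches with probability phi(pi' - pi) under its own beta.
        p1 = fermi(payoff[alt1, s2] - payoff[s1, s2], beta1)
        # Player 2 does the same under its own beta.
        p2 = fermi(payoff[alt2, s1] - payoff[s2, s1], beta2)
        T[i, STATES.index((alt1, s2))] += 0.5 * p1
        T[i, STATES.index((s1, alt2))] += 0.5 * p2
        T[i, i] += 1.0 - 0.5 * (p1 + p2)
    # Solve v T = v together with the normalization sum(v) = 1.
    A = np.vstack([T.T - np.eye(4), np.ones(4)])
    b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v


def expected_payoffs(payoff, beta1, beta2):
    """Expected payoffs of both players under the stationary distribution."""
    v = stationary_distribution(payoff, beta1, beta2)
    pi1 = sum(v[i] * payoff[s1, s2] for i, (s1, s2) in enumerate(STATES))
    pi2 = sum(v[i] * payoff[s2, s1] for i, (s1, s2) in enumerate(STATES))
    return pi1, pi2


if __name__ == "__main__":
    for b1, b2 in [(0.0, 0.0), (2.0, 0.5), (5.0, 5.0)]:
        print(b1, b2, expected_payoffs(DONATION_GAME, b1, b2))
```

For $\beta_1 = \beta_2 = 0$ every switch happens with probability 1/2, so all four outcomes are equally likely. In this particular donation game, defecting always yields the same payoff gain, so the player with the larger $\beta$ defects more often and earns the higher payoff, in line with the dominance pattern described for the prisoner's dilemma in Figure 2. Other payoff matrices, such as a snowdrift game, can be substituted for `DONATION_GAME` to explore cases in which no sensitivity value dominates.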