A brief review of evolutionary game dynamics in the reinforcement learning paradigm

Guozhong Zheng; Xin Ou; Shengfeng Deng; Jiqiang Zhang; Li Chen

TL;DR

The paper addresses the mismatch between imitation-based evolutionary game models and observed behavior by advocating reinforcement learning (RL) as a unifying, introspective learning paradigm. It surveys how RL, through mechanisms such as Q-learning and policy-based methods, can explain the emergence of cooperation, trust, fairness, efficient resource allocation, and ecological coexistence, with agents optimizing long-term payoffs rather than copying successful peers. The studies it reviews report RL-driven cooperation in pairwise and multi-player games, endogenous trust dynamics, fairness arising from historical experience and foresight, improved resource coordination with phase-transition-like behavior, and RL-based insights into predator–prey dynamics and biodiversity. These findings suggest that RL provides a cohesive framework for understanding complex social and ecological phenomena, although empirical validation against human and animal behavior remains essential for a full theoretical synthesis.

Abstract

Cooperation, fairness, trust, and resource coordination are cornerstones of modern civilization, yet their emergence remains inadequately explained, as persistent discrepancies between theoretical predictions and behavioral experiments show. Part of this gap may arise from the imitation learning paradigm commonly used in prior theoretical models, which assumes that individuals merely copy successful neighbors according to predetermined, fixed rules. This review examines recent advances in evolutionary game dynamics that employ reinforcement learning (RL) as an alternative paradigm. In RL, individuals learn through trial and error and introspectively refine their strategies based on environmental feedback. We begin by introducing key concepts in evolutionary game theory and the two learning paradigms, then synthesize progress in applying RL to elucidate cooperation, trust, fairness, optimal resource coordination, and ecological dynamics. Collectively, these studies indicate that RL offers a promising unified framework for understanding the diverse social and ecological phenomena observed in human and natural systems.

Paper Structure (13 sections, 3 equations, 7 figures)

Introduction
Fundamentals
Evolutionary game theory
The paradigm of imitation learning
The paradigm of reinforcement learning
Cooperation
The pairwise game
The multi-player game
Trust
Fairness
Resource allocation
Ecological systems
Concluding remarks

Figures (7)

Figure 1: Two paradigms for game evolution. In imitation learning, players take rewards as utilities, compare them with those of their neighbors, and adopt the strategy of a neighbor whose utility is higher. In reinforcement learning, by contrast, players score the available actions, choose an action probabilistically based on these scores each time, and continuously revise the scores according to the outcome.
Figure 2: Emergence of cooperation in the prisoner’s dilemma game. (a) The phase diagram of cooperation level within the space of learning parameters $(\alpha, \gamma)$ when two players play the game, which can be divided into three regions: high cooperation (I), full defection (II), and low cooperation (III). (b) The corresponding reward difference between two players, which is visible around the boundaries between Regions I--II and I--III. $b = 0.2$ in Eq. \ref{eq:PDG_payoff} and $\epsilon = 0.01$ (Adapted from ding2023emergence).
Figure 3: Schematics of three model setups. (Left) Public goods game (PGG) with a Fermi-function update rule. (Middle) PGG with Q-learning. (Right) Volunteer public goods game (VPGG) with Q-learning. The focal individuals are indicated by gray and are surrounded by neighbors that could be cooperators (red), defectors (blue), and loners (pink). Below each panel are the corresponding strategy-update components: the Fermi function (left) and the Q-tables (middle and right). (Adapted from zheng2024evolution).
Figure 4: Emergence of trust. Fractions of four strategies in the trust game within the parameter space $(\gamma, \alpha)\in (0,1)$. For instance, TB denotes a player who trusts as a trustor but betrays as a trustee. High levels of trust (TR) emerge at large $\gamma$ and small $\alpha$ (bottom-right corner), indicating that appreciating both historical experience and long-term vision promotes trust and trustworthiness. (Adapted from Zheng2024Decoding).
Figure 5: Emergence of fairness. As is common in behavioral experiments, proposers $p$ are offered three options: mean $(l<0.5)$, fair $(m=0.5)$, and overgenerous $(h>0.5)$; responders $q$ choose their acceptance threshold from the same set $\{l, m, h\}$. The two subplots show how fairness depends on $l$ and $h$. The fractions of the rational options, $p_l$ and $q_l$, rise as $l$ increases, whereas $h$ has only a marginal impact. In all cases, the density of overgenerous options vanishes. Parameters: $\epsilon=0.01$, $\alpha=0.1$, $\gamma=0.9$, $h=0.8$ in (a) and $l=0.3$ in (b). (Adapted from Zheng2025Decoding).
...and 2 more figures
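
The two update mechanisms named in the captions of Figures 1 and 3 can be contrasted in a minimal Python sketch. It is an illustration under stated assumptions, not the models of the reviewed papers. The Fermi form with noise parameter `K`, the list-of-lists Q-table layout, and all function names are assumptions introduced here. Only the symbols $\alpha$ (learning rate), $\gamma$ (discount factor), and $\epsilon$ (exploration rate) come from the captions.

```python
import math
import random


def fermi_adoption_probability(u_self, u_neighbor, K=0.1):
    """Imitation learning: probability of copying a neighbor's strategy.

    Assumed standard Fermi form; K (noise level) is a hypothetical parameter.
    The probability grows as the neighbor's utility exceeds one's own.
    """
    return 1.0 / (1.0 + math.exp((u_self - u_neighbor) / K))


def epsilon_greedy(q_row, epsilon, rng):
    """Reinforcement learning: choose an action from its scores.

    With probability epsilon explore at random; otherwise take the
    action with the highest score in the current state's Q-table row.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Standard Q-learning revision of one score after observing a reward.

    alpha weights new information against history; gamma weights the
    best expected future score against the immediate reward.
    """
    best_next = max(q_table[next_state])
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += alpha * td_error
    return q_table[state][action]


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[0.0, 0.0], [0.0, 1.0]]  # 2 states x 2 actions, illustrative values
    a = epsilon_greedy(q[1], epsilon=0.01, rng=rng)
    q_update(q, state=0, action=a, reward=1.0, next_state=1, alpha=0.1, gamma=0.9)
    print(q)
    print(fermi_adoption_probability(u_self=0.0, u_neighbor=1.0))
```

The imitation rule needs only a comparison with a neighbor, whereas the Q-learning player keeps and revises its own table of scores. That difference is the sense in which the review calls RL introspective.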
