Enhancing Cooperation through Selective Interaction and Long-term Experiences in Multi-Agent Reinforcement Learning
Tianyu Ren, Xiao-Jun Zeng
TL;DR
This paper studies how cooperation emerges in social dilemmas by letting agents learn both their dilemma strategies and their choice of neighbours in a spatial Prisoner's Dilemma, using multi-agent reinforcement learning (MARL). It introduces a MARL framework with two Q-networks per agent that uses long-term experience to distinguish cooperative from non-cooperative neighbours, which promotes network reciprocity and the clustering of like strategies. Empirically, the framework achieves higher cooperation levels and payoffs than evolutionary game theory baselines; cooperation remains robust up to a dilemma strength of $b\approx 1.2$ and improves with longer memory. The work provides a scalable framework with an explicit interaction network for studying how cooperation and interaction coevolve, with implications for designing cooperative artificial and human-AI systems.
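To make the role of the dilemma strength $b$ concrete, here is a minimal sketch of payoff accumulation in a spatial Prisoner's Dilemma. The summary does not state the payoff parameterization or the neighbourhood, so both are assumptions: the sketch uses the "weak" Prisoner's Dilemma common in spatial studies (reward R = 1, temptation T = b, sucker's payoff S = 0, punishment P = 0) on a periodic square lattice with a fixed four-neighbour (von Neumann) neighbourhood. In the paper, agents choose their partners, so the fixed neighbourhood here is for illustration only; `lattice_payoffs` is a hypothetical helper.

```python
import numpy as np

C, D = 1, 0  # 1 = cooperate, 0 = defect

def lattice_payoffs(actions: np.ndarray, b: float) -> np.ndarray:
    """Accumulated payoff of every site on a periodic square lattice.

    Assumptions (not from the source): weak PD with R = 1, T = b, S = P = 0,
    and a fixed von Neumann neighbourhood (four nearest neighbours).
    """
    total = np.zeros(actions.shape, dtype=float)
    for shift, axis in ((1, 0), (-1, 0), (1, 1), (-1, 1)):
        # np.roll wraps around the edges, giving periodic boundaries.
        other = np.roll(actions, shift, axis=axis)
        # C meets C earns R = 1; D meets C earns T = b; anything against D earns 0.
        total += actions * other * 1.0 + (1 - actions) * other * b
    return total

grid = np.ones((5, 5), dtype=int)  # a lattice of cooperators
grid[2, 2] = D                     # one defector in the centre
payoffs = lattice_payoffs(grid, b=1.2)
```

With $b = 1.2$, the lone defector collects $4 \times 1.2 = 4.8$, each cooperator next to it collects 3, and cooperators elsewhere collect 4. Defection among cooperators pays more than cooperation, and the gap widens as $b$ grows. This is why larger $b$ makes the dilemma harder and why partner choice, which can cut defectors off, matters.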
Abstract
The significance of network structures in promoting group cooperation within social dilemmas has been widely recognized. Prior studies attribute this facilitation to the assortment of strategies driven by spatial interactions. Although reinforcement learning has been used to investigate how dynamic interaction affects the evolution of cooperation, it remains unclear how agents develop neighbour-selection behaviours and how strategic assortment forms within an explicit interaction structure. To address this, our study introduces a computational framework based on multi-agent reinforcement learning in the spatial Prisoner's Dilemma game. Unlike existing research that relies on preset social norms or external incentives, this framework lets agents select both their dilemma strategies and their interaction partners based on long-term experience. By modelling each agent with two distinct Q-networks, we disentangle the coevolutionary dynamics of cooperation and interaction. The results indicate that long-term experience enables agents to identify non-cooperative neighbours and to prefer interacting with cooperative ones. This emergent self-organizing behaviour leads agents with similar strategies to cluster together, thereby increasing network reciprocity and enhancing group cooperation.
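The abstract describes each agent as having two separate value functions: one for the dilemma strategy and one for neighbour selection, with the latter conditioned on long-term experience. The following is a minimal sketch of that separation under stated assumptions, not the authors' method. It swaps the paper's Q-networks for plain tabular Q-learning and assumes the weak Prisoner's Dilemma payoffs (R = 1, T = b, S = P = 0). It also uses two hypothetical state encodings: the strategy table sees the agent's own previous action, and the selection table sees how many times a neighbour cooperated in its last `memory` rounds. The class `DualQAgent` and all its parameters are illustrative names.

```python
import random
from collections import deque

C, D = 1, 0            # dilemma actions: 1 = cooperate, 0 = defect
SKIP, INTERACT = 0, 1  # neighbour-selection actions

def pd_payoff(me: int, other: int, b: float) -> float:
    """Assumed weak Prisoner's Dilemma: R = 1, T = b, S = P = 0."""
    if other == D:
        return 0.0
    return 1.0 if me == C else b

class DualQAgent:
    """Hypothetical tabular stand-in for an agent with two separate Q-functions."""

    def __init__(self, neighbours, memory=5, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
        self.rng = random.Random(seed)
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        # Long-term experience: the last `memory` actions seen from each neighbour.
        self.history = {n: deque(maxlen=memory) for n in neighbours}
        # Strategy table: state = own previous action, actions = {D, C}.
        self.q_strategy = [[0.0, 0.0] for _ in range(2)]
        # Selection table: state = cooperations in a neighbour's remembered
        # history (0..memory), actions = {SKIP, INTERACT}.
        self.q_select = [[0.0, 0.0] for _ in range(memory + 1)]
        self.last_action = C

    def _choose(self, row):
        # Epsilon-greedy choice over one row of Q-values.
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(row))
        return max(range(len(row)), key=lambda a: row[a])

    def _update(self, table, s, a, r, s_next):
        # Standard one-step Q-learning update.
        target = r + self.gamma * max(table[s_next])
        table[s][a] += self.alpha * (target - table[s][a])

    def state_of(self, n):
        return sum(self.history[n])  # number of C actions remembered for n

    def step(self, neighbour_actions, b):
        """Play one round; neighbour_actions maps neighbour id -> C or D."""
        s_strat = self.last_action
        action = self._choose(self.q_strategy[s_strat])
        total = 0.0
        for n, their_action in neighbour_actions.items():
            s = self.state_of(n)
            choice = self._choose(self.q_select[s])
            r = pd_payoff(action, their_action, b) if choice == INTERACT else 0.0
            # Assumption: a neighbour's action is observed even when skipped.
            self.history[n].append(their_action)
            self._update(self.q_select, s, choice, r, self.state_of(n))
            total += r
        self._update(self.q_strategy, s_strat, action, total, action)
        self.last_action = action
        return total

    def preferred_neighbour(self):
        # Neighbour whose remembered behaviour maps to the highest interaction value.
        return max(self.history, key=lambda n: self.q_select[self.state_of(n)][INTERACT])

agent = DualQAgent(["coop", "defect"], memory=5)
for _ in range(2000):
    agent.step({"coop": C, "defect": D}, b=1.2)
```

Against one unconditional cooperator and one unconditional defector, interacting with a neighbour whose memory shows five cooperations keeps paying off, while interacting with a neighbour whose memory shows none never does. After training, `preferred_neighbour()` should therefore return `"coop"`. Keeping the two tables separate mirrors the abstract's point: what to play and whom to play with are learned from different signals, so their dynamics can be inspected independently.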
