Learning Fair Cooperation in Mixed-Motive Games with Indirect Reciprocity
Martin Smit, Fernando P. Santos
TL;DR
This work investigates how indirect reciprocity and reputation-based social norms can sustain cooperation and fairness in a population split into two groups, using a donation game with $b>c>0$. It combines an analytical evolutionary game-theory framework, which identifies stable norm–strategy configurations (NSS), with a learning-based approach in which independent Q-learning agents are used to assess convergence to those equilibria. A defecting majority can induce the minority to defect, but not the reverse, and the norms that judge in-group and out-group interactions can steer the system toward fair or unfair cooperation. Convergence under RL is sensitive to the choice of norm, the $b/c$ ratio, and the initial Q-values, with norms such as Stern Judging giving robust outcomes under certain conditions. The results highlight that, in heterogeneous populations with reputations, the choice of interaction norms is crucial for addressing both cooperation and fairness, which offers guidance for designing fair multi-agent systems and informs research on norm emergence.
Abstract
Altruistic cooperation is costly yet socially desirable. As a result, agents struggle to learn cooperative policies through independent reinforcement learning (RL). Indirect reciprocity, where agents consider their interaction partner's reputation, has been shown to stabilise cooperation in homogeneous, idealised populations. However, more realistic settings comprise heterogeneous agents with different characteristics and group-based social identities. We study cooperation when agents are stratified into two such groups, and allow reputation updates and actions to depend on group information. We consider two modelling approaches: evolutionary game theory, where we comprehensively search for social norms (i.e., rules to assign reputations) leading to cooperation and fairness; and RL, where we consider how the stochastic dynamics of policy learning affect the analytically identified equilibria. We observe that a defecting majority leads the minority group to defect, but not the inverse. Moreover, changing the norms that judge in-group and out-group interactions can steer a system towards either fair or unfair cooperation. This is made clearer when moving beyond equilibrium analysis to independent RL agents, where convergence to fair cooperation occurs with a narrower set of norms. Our results highlight that, in heterogeneous populations with reputations, carefully defining interaction norms is fundamental to tackle both dilemmas of cooperation and of fairness.
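To make the two basic ingredients concrete, the following minimal Python sketch shows a single donation-game interaction and a Stern Judging reputation update. It is an illustration under stated assumptions, not the model used in the paper: the function names, the "C"/"D" action labels and the "G"/"B" reputation labels are hypothetical choices, and the sketch ignores group membership, assessment errors and learning.

```python
from typing import Literal, Tuple

Action = Literal["C", "D"]        # cooperate or defect (labels are an assumption)
Reputation = Literal["G", "B"]    # good or bad (labels are an assumption)


def donation_payoffs(action: Action, b: float, c: float) -> Tuple[float, float]:
    """Return (donor_payoff, recipient_payoff) for one donation-game round.

    Cooperating costs the donor c and gives the recipient b, with b > c > 0.
    Defecting leaves both payoffs at zero.
    """
    if not b > c > 0:
        raise ValueError("the donation game requires b > c > 0")
    return (-c, b) if action == "C" else (0.0, 0.0)


def stern_judging(action: Action, recipient_reputation: Reputation) -> Reputation:
    """Assign the donor's new reputation under the Stern Judging norm.

    The donor is judged good when cooperating with a good recipient or
    defecting against a bad one, and bad in the two remaining cases.
    """
    cooperated_with_good = action == "C" and recipient_reputation == "G"
    defected_against_bad = action == "D" and recipient_reputation == "B"
    return "G" if (cooperated_with_good or defected_against_bad) else "B"


if __name__ == "__main__":
    # Illustrative values only; b = 5 and c = 1 are not taken from the paper.
    print(donation_payoffs("C", b=5.0, c=1.0))
    print(stern_judging("D", "B"))
```

In a two-group setting such as the one studied here, one could imagine applying a different norm function depending on whether donor and recipient share a group; that extension is left out of the sketch.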
