Fairness over Equality: Correcting Social Incentives in Asymmetric Sequential Social Dilemmas

Alper Demir, Hüseyin Aydın, Kale-ab Abebe Tessera, David Abel, Stefano V. Albrecht

TL;DR

This work introduces asymmetric variants of well-known SSD environments, examines how natural differences between agents influence cooperation dynamics, and proposes three modifications that foster faster emergence of cooperative policies than existing approaches, without sacrificing scalability or practicality.

Abstract

Sequential Social Dilemmas (SSDs) provide a key framework for studying how cooperation emerges when individual incentives conflict with collective welfare. In Multi-Agent Reinforcement Learning, these problems are often addressed by incorporating intrinsic drives that encourage prosocial or fair behavior. However, most existing methods assume that agents face identical incentives in the dilemma and require continuous access to global information about other agents to assess fairness. In this work, we introduce asymmetric variants of well-known SSD environments and examine how natural differences between agents influence cooperation dynamics. Our findings reveal that existing fairness-based methods struggle to adapt under asymmetric conditions because they enforce raw equality, which wrongly incentivizes defection. To address this, we propose three modifications: (i) redefining fairness by accounting for agents' reward ranges, (ii) introducing an agent-based weighting mechanism to better handle inherent asymmetries, and (iii) localizing social feedback to make the methods effective under partial observability without requiring global information sharing. Experimental results show that in asymmetric scenarios, our method fosters faster emergence of cooperative policies compared to existing approaches, without sacrificing scalability or practicality.
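
To illustrate why raw equality can penalize cooperation under asymmetry, the sketch below compares a Fehr-Schmidt style inequity penalty computed on raw rewards with the same penalty computed on range-normalized rewards. It is a minimal illustration only, not the paper's formulation: the function names, the division by each agent's maximum attainable reward, the `alpha` and `beta` values, and the two-agent reward numbers are all assumptions made for this example, and the temporal smoothing used in practice is omitted.

```python
def inequity_penalty(rewards, i, alpha=5.0, beta=0.05):
    """Fehr-Schmidt style penalty for agent i (simplified, no temporal smoothing).

    alpha weights disadvantageous inequity (others earn more than i),
    beta weights advantageous inequity (i earns more than others).
    Both defaults are illustrative assumptions.
    """
    n = len(rewards)
    others = [r for j, r in enumerate(rewards) if j != i]
    envy = sum(max(r - rewards[i], 0.0) for r in others)
    guilt = sum(max(rewards[i] - r, 0.0) for r in others)
    return (alpha * envy + beta * guilt) / (n - 1)


def normalize(rewards, max_rewards):
    """Hypothetical range normalization: scale each reward by that agent's maximum."""
    return [r / m for r, m in zip(rewards, max_rewards)]


# Assumed scenario: both agents cooperate fully, but agent 0 can earn
# at most 1.0 per step while agent 1 can earn at most 3.0.
rewards = [1.0, 3.0]
max_rewards = [1.0, 3.0]

# Raw equality: both agents are penalized although both cooperate.
raw = [inequity_penalty(rewards, i) for i in range(len(rewards))]

# Range-aware comparison: each agent achieves its full range, so no penalty.
fair = [inequity_penalty(normalize(rewards, max_rewards), i)
        for i in range(len(rewards))]

print("raw penalties:       ", raw)
print("normalized penalties:", fair)
```

In this assumed scenario the raw comparison penalizes the low-reward agent most heavily even though it is cooperating, whereas the normalized comparison assigns no penalty to either agent.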

Paper Structure (22 sections, 10 equations, 13 figures, 5 tables, 1 algorithm)

Figures (13)

  • Figure 1: Schelling diagrams for example asymmetric Prisoner's Dilemma games from the literature, between agent $i$ (in blue) and agent $j$ (in red). For each agent, defective strategies lead to higher rewards.
  • Figure 2: (a) Coins environment with 2 standard (1 red and 1 blue) agents; (b) Harvest environment with 10 standard agents.
  • Figure 3: Performance of IA, SVO, and their Fair&Local versions in Coins with asymmetry in coin rewards. Fair&LocalIA and Fair&LocalSVO show faster and more stable convergence to prosocial behavior with higher returns than their counterparts.
  • Figure 4: Performance of IA, SVO, and their Fair&Local versions in Harvest environment with asymmetry in apple rewards. Fair&Local versions lead to higher returns and better sustainability by eliminating the gap in peace between agent types.
  • Figure 5: Performance of IA, SVO, and their Fair&Local versions in Coins environment with asymmetry in coin spawn. The proposed approach shows faster convergence to more cooperative policies.
  • ...and 8 more figures

Theorems & Definitions (2)

  • Definition 1: Sequential Social Dilemma
  • Definition 2: Asymmetric Sequential Social Dilemma