Altruism and Fair Objective in Mixed-Motive Markov games

Yao-hua Franck Xu, Tayeb Lemlouma, Arnaud Braud, Jean-Marie Bonnin

TL;DR

This paper tackles fairness in mixed-motive Markov games, where purely utilitarian objectives can yield highly unequal outcomes. It introduces a fair altruistic utility based on Proportional Fairness (PF) and extends it to sequential decision making through a Fair Altruistic Markov Game, accompanied by novel fair Actor-Critic algorithms. The authors derive a PF-informed policy gradient framework, including Fair MAA2C and Fair MAPPO instantiations, and validate them in the CleanUp social dilemma, demonstrating higher group welfare with substantially lower inequality than traditional utilitarian approaches. The work shows that balancing efficiency and equity in multi-agent learning can prevent rigid role specialization and promote stable, scalable cooperation with heterogeneous agents, offering practical implications for distributed AI systems.

Abstract

Cooperation is fundamental for society's viability, as it enables the emergence of structure within heterogeneous groups that seek collective well-being. However, individuals are inclined to defect in order to benefit from the group's cooperation without bearing the associated costs, thus leading to unfair situations. In game theory, social dilemmas entail this dichotomy between individual interest and collective outcome. The dominant approach to multi-agent cooperation is to maximize utilitarian welfare, which can produce efficient but highly inequitable outcomes. This paper proposes a novel framework to foster fairer cooperation by replacing the standard utilitarian objective with Proportional Fairness. We introduce a fair altruistic utility for each agent, defined on the individual log-payoff space, and derive the analytical conditions required to ensure cooperation in classic social dilemmas. We then extend this framework to sequential settings by defining a Fair Markov Game and deriving novel fair Actor-Critic algorithms to learn fair policies. Finally, we evaluate our method in various social dilemma environments.
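
To make the contrast between the two objectives concrete, here is a minimal sketch using the textbook definitions: utilitarian welfare as the sum of payoffs, and Proportional Fairness as the sum of log-payoffs. The function names and payoff values are illustrative assumptions, and this is not the paper's fair altruistic utility, whose exact form is not given in this summary.

```python
import math


def utilitarian_welfare(payoffs):
    """Sum of individual payoffs (standard utilitarian objective)."""
    return sum(payoffs)


def proportional_fairness(payoffs):
    """Sum of log-payoffs (textbook Proportional Fairness objective).

    Assumes strictly positive payoffs, since log is undefined otherwise.
    """
    if any(p <= 0 for p in payoffs):
        raise ValueError("proportional fairness requires positive payoffs")
    return sum(math.log(p) for p in payoffs)


# Hypothetical two-agent payoff profiles, chosen only for illustration.
unequal = [9.0, 1.0]         # same total as `equal`, very uneven split
equal = [5.0, 5.0]           # same total, even split
smaller_equal = [4.0, 4.0]   # lower total, even split

for name, profile in [("unequal", unequal),
                      ("equal", equal),
                      ("smaller_equal", smaller_equal)]:
    print(name,
          "utilitarian:", utilitarian_welfare(profile),
          "PF:", round(proportional_fairness(profile), 3))
```

The utilitarian sum cannot distinguish the uneven split from the even one, while the log-sum ranks the even split higher. It even prefers the even split with a smaller total over the uneven one, which is the efficiency-equity trade-off the abstract refers to.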

Paper Structure

This paper contains 27 sections, 5 theorems, 24 equations, 4 figures, 2 tables, and 2 algorithms.

Key Result

theorem 1

The altruism level of social dilemma is

Figures (4)

  • Figure 1: The CleanUp environment. Agents' observation window is represented in lighter tiles.
  • Figure 2: Comparison between the Prop. Fairness and Util. Welfare objectives trained with MAPPO in the fully cooperative setting ($\alpha=1$). Solid lines are obtained by averaging the metric over a rolling window of 50 runs. Shaded areas represent the min and max of the respective metric over the same rolling window.
  • Figure 3: Performance of varying $\alpha$ in the CleanUp environment trained with Fair-MAPPO.
  • Figure 4: Performance of varying $\alpha$ in the CleanUp environment trained with Fair-MAA2C.

Theorems & Definitions (10)

  • definition 1
  • definition 2
  • theorem 1
  • definition 3
  • theorem 2
  • definition 4
  • definition 5
  • theorem 3
  • theorem 4
  • theorem 5