Altruism and Fair Objective in Mixed-Motive Markov games
Yao-hua Franck Xu, Tayeb Lemlouma, Arnaud Braud, Jean-Marie Bonnin
TL;DR
This paper tackles fairness in mixed-motive Markov games, where purely utilitarian objectives can yield highly unequal outcomes. It introduces a fair altruistic utility based on Proportional Fairness (PF) and extends it to sequential decision making through a Fair Altruistic Markov Game, accompanied by novel fair Actor-Critic algorithms. The authors derive a PF-informed policy gradient framework, including Fair MAA2C and Fair MAPPO instantiations, and validate them in the CleanUp social dilemma, demonstrating higher group welfare with substantially lower inequality than traditional utilitarian approaches. The work shows that balancing efficiency and equity in multi-agent learning can prevent rigid role specialization and promote stable, scalable cooperation among heterogeneous agents, offering practical implications for distributed AI systems.
Abstract
Cooperation is fundamental to society's viability, as it enables the emergence of structure within heterogeneous groups that seek collective well-being. However, individuals are inclined to defect in order to benefit from the group's cooperation without bearing the associated costs, which leads to unfair situations. In game theory, social dilemmas capture this tension between individual interest and collective outcome. The dominant approach to multi-agent cooperation is to maximize utilitarian welfare, which can produce efficient but highly inequitable outcomes. This paper proposes a novel framework to foster fairer cooperation by replacing the standard utilitarian objective with Proportional Fairness. We introduce a fair altruistic utility for each agent, defined on the individual log-payoff space, and derive the analytical conditions required to ensure cooperation in classic social dilemmas. We then extend this framework to sequential settings by defining a Fair Markov Game and deriving novel fair Actor-Critic algorithms to learn fair policies. Finally, we evaluate our method in various social dilemma environments.
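To make the contrast between the two objectives concrete, the following minimal sketch compares utilitarian welfare (the sum of payoffs) with the standard Proportional Fairness objective (the sum of log-payoffs) on two hypothetical payoff profiles. The function names and the payoff values are illustrative assumptions, and the code does not reproduce the paper's fair altruistic utility, whose exact form is not given in this summary.

```python
import math

def utilitarian_welfare(payoffs):
    """Sum of individual payoffs (the standard utilitarian objective)."""
    return sum(payoffs)

def proportional_fairness(payoffs):
    """Sum of log-payoffs (the standard Proportional Fairness objective).

    Assumes strictly positive payoffs, since log is undefined otherwise.
    """
    if any(p <= 0 for p in payoffs):
        raise ValueError("Proportional Fairness requires strictly positive payoffs")
    return sum(math.log(p) for p in payoffs)

# Hypothetical two-agent payoff profiles (illustrative values only).
unequal = [9.0, 1.0]  # higher total, very unequal split
equal = [4.5, 4.5]    # slightly lower total, equal split

# The utilitarian objective ranks the unequal profile higher,
# while Proportional Fairness ranks the equal profile higher.
prefers_unequal_utilitarian = utilitarian_welfare(unequal) > utilitarian_welfare(equal)
prefers_equal_pf = proportional_fairness(equal) > proportional_fairness(unequal)
```

Because the logarithm is concave, shifting payoff from a well-off agent to a worse-off one raises the PF objective even when the total drops slightly, which is the sense in which PF trades some efficiency for equity.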
