Bottom-Up Reputation Promotes Cooperation with Multi-Agent Reinforcement Learning
Tianyu Ren, Xuan Yao, Yang Li, Xiao-Jun Zeng
TL;DR
The paper tackles cooperation in multi-agent reinforcement learning when agents form reputations privately. It introduces Learning with Reputation Reward (LR2), in which each agent learns a dilemma policy $\pi^i$ for action selection and an evaluation policy $\eta^i$ for assigning reputations; the assigned reputations are then used to reshape neighbors' rewards. The dilemma payoffs are constrained to $R=1$, $P=0$, $0\le T\le 2$, $-1\le S\le 1$. Experiments on spatial social dilemmas played on a lattice show that LR2 yields stronger cooperation and emergent strategy clustering, outperforming baselines and ablations. Key insights include LR2’s robustness to strong dilemmas, the formation of cooperative clusters, and the benefit of balanced reputation alignment over strict enforcement of consensus.
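To make the payoff constraints above concrete, the following is a minimal sketch of a two-action dilemma game played on a lattice. Only the values $R=1$, $P=0$ and the ranges for $T$ and $S$ come from the text; the four-neighbor (von Neumann) neighborhood, the periodic boundary, and the names `payoff_matrix` and `lattice_payoffs` are assumptions made for illustration and are not taken from the paper. The sketch covers the raw game payoffs only, not LR2's reputation-based reward reshaping.

```python
import numpy as np

R, P = 1.0, 0.0  # reward and punishment payoffs, fixed as in the text


def payoff_matrix(T, S):
    """Row player's payoffs; index 0 = cooperate, 1 = defect."""
    if not (0.0 <= T <= 2.0 and -1.0 <= S <= 1.0):
        raise ValueError("expected 0 <= T <= 2 and -1 <= S <= 1")
    return np.array([[R, S],
                     [T, P]])


def lattice_payoffs(actions, T, S):
    """Sum each agent's payoff against its four lattice neighbors.

    `actions` is a 2-D integer array of 0s (cooperate) and 1s (defect).
    The von Neumann neighborhood and wrap-around edges are assumptions.
    """
    M = payoff_matrix(T, S)
    total = np.zeros(actions.shape)
    for axis in (0, 1):
        for shift in (1, -1):
            neighbor = np.roll(actions, shift, axis=axis)
            total += M[actions, neighbor]  # payoff of own action vs. neighbor's
    return total


# Usage: a single defector surrounded by cooperators on a 3x3 lattice.
grid = np.zeros((3, 3), dtype=int)
grid[1, 1] = 1
payoffs = lattice_payoffs(grid, T=1.5, S=-0.5)
```

With $T > R$, the lone defector in this example earns more than any cooperator, which is the short-term incentive that a reputation mechanism has to counteract.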
Abstract
Reputation serves as a powerful mechanism for promoting cooperation in multi-agent systems, as agents are more inclined to cooperate with those of good social standing. While existing multi-agent reinforcement learning methods typically rely on predefined social norms to assign reputations, the question of how a population reaches a consensus on judgement when agents hold private, independent views remains unresolved. In this paper, we propose a novel bottom-up reputation learning method, Learning with Reputation Reward (LR2), designed to promote cooperative behaviour through reward shaping based on assigned reputation. Our agent architecture includes a dilemma policy that decides whether to cooperate by considering the impact on neighbours, and an evaluation policy that assigns reputations to influence the actions of neighbours while optimizing the agent's own objectives. The method operates using local observations and interaction-based rewards, without relying on centralized modules or predefined norms. Our findings demonstrate the effectiveness and adaptability of LR2 across various spatial social dilemma scenarios. Interestingly, we find that LR2 stabilizes and enhances cooperation not only through reward reshaping from bottom-up reputation but also by fostering strategy clustering in structured populations, thereby creating environments conducive to sustained cooperation.
