A Historical Interaction-Enhanced Shapley Policy Gradient Algorithm for Multi-Agent Credit Assignment
Ao Ding, Licheng Sun, Yongjie Hou, Huaqing Zhang, Hongbin Ma
TL;DR
This work tackles credit assignment in cooperative multi-agent reinforcement learning by introducing HIS, a hybrid mechanism that blends a global baseline reward with Shapley-based incentives derived from historical interactions. The mechanism comes with theoretical guarantees of efficiency and core stability. It estimates Shapley values sample-efficiently through Approximate Marginal Contribution and applies Box-Cox normalization to stabilize learning. Empirically, HIS outperforms state-of-the-art baselines on three continuous-action benchmarks (MPE, Bi-DexHands, MAMuJoCo), especially in strongly coupled tasks, and ablations confirm the importance of both the hybrid payoff structure and the history-based credit signals. Together, these results indicate that combining explicit Shapley-based credit with a stable global reward supports robust, scalable multi-agent collaboration.
Abstract
Multi-agent reinforcement learning (MARL) has demonstrated remarkable performance in multi-agent collaboration problems and has become a prominent topic in artificial intelligence research in recent years. However, traditional credit assignment schemes in MARL cannot reliably capture individual contributions in strongly coupled tasks while maintaining training stability, which limits generalization and hinders algorithm performance. To address these challenges, we propose a Historical Interaction-Enhanced Shapley Policy Gradient Algorithm (HIS) for Multi-Agent Credit Assignment, which employs a hybrid credit assignment mechanism to balance base rewards with individual contribution incentives. By using historical interaction data to calculate the Shapley value in a sample-efficient manner, HIS enhances each agent's ability to perceive its own contribution, while retaining the global reward to maintain training stability. Additionally, we provide theoretical guarantees for the hybrid credit assignment mechanism, ensuring that the assignments it generates are both efficient and stable. We evaluate the proposed algorithm in three widely used continuous-action benchmark environments: Multi-Agent Particle Environment (MPE), Multi-Agent MuJoCo (MAMuJoCo), and Bi-DexHands. Experimental results demonstrate that HIS outperforms state-of-the-art methods, particularly in strongly coupled, complex collaborative tasks.
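To make the notions of Shapley-based credit and the efficiency property concrete, the following is a minimal sketch rather than the HIS algorithm itself. It computes exact Shapley values for a three-agent toy game by enumerating all agent orderings, whereas HIS estimates them from historical interaction data. The function names, the toy value function, and the mixing weight `alpha` are hypothetical, and the convex blend with the global reward is only an assumed stand-in for the hybrid mechanism described above.

```python
from itertools import permutations
from math import factorial


def shapley_values(n_agents, value_fn):
    """Exact Shapley values: average each agent's marginal contribution
    over every ordering in which agents can join the coalition."""
    totals = [0.0] * n_agents
    for order in permutations(range(n_agents)):
        coalition = frozenset()
        for agent in order:
            with_agent = coalition | {agent}
            totals[agent] += value_fn(with_agent) - value_fn(coalition)
            coalition = with_agent
    n_orders = factorial(n_agents)
    return [t / n_orders for t in totals]


def hybrid_credit(global_reward, shapley, alpha=0.5):
    """Hypothetical blend (an assumption, not the paper's formula):
    a shared global reward mixed with each agent's Shapley credit."""
    return [(1 - alpha) * global_reward + alpha * phi for phi in shapley]


def toy_value(coalition):
    """Toy coalition value: agents 0 and 1 are strongly coupled and earn 10
    only when both are present; agent 2 adds 2 on its own."""
    value = 0.0
    if 0 in coalition and 1 in coalition:
        value += 10.0
    if 2 in coalition:
        value += 2.0
    return value


phi = shapley_values(3, toy_value)                # [5.0, 5.0, 2.0]
credits = hybrid_credit(12.0, phi, alpha=0.5)     # [8.5, 8.5, 7.0]
```

In this toy game the Shapley values sum to the value of the full coalition (12), which is the efficiency property, and the coupled agents 0 and 1 split their joint payoff equally. Exact enumeration grows factorially with the number of agents, which is why a sample-efficient estimate is needed in practice.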
