PPS-QMIX: Periodically Parameter Sharing for Accelerating Convergence of Multi-Agent Reinforcement Learning
Ke Zhang, DanDan Zhu, Qiuhan Xu, Hao Zhou, Ce Zheng
TL;DR
The paper addresses slow convergence in multi-agent reinforcement learning caused by distribution drift among agents. It introduces three periodically parameter sharing variants—A-PPS, RS-PPS, and PP-PPS—that periodically share parts or whole components of the QMIX value network across agents, leveraging ideas from Federated Learning to mitigate non-IID exploration. Empirical results in StarCraft SMAC show average performance gains of 10–30% and enable solving tasks that QMIX cannot, with RS-PPS delivering the strongest results in high-dimensional scenarios. The proposed methods are compatible with existing value-function factorization approaches and offer a practical pathway to faster, more robust MARL training without sharing raw trajectories.
Abstract
Training in multi-agent reinforcement learning (MARL) is time-consuming because of the distribution shift of each agent. One drawback is that each agent's strategy is learned independently even though the agents actually cooperate. Thus, a vital issue in MARL is how to efficiently accelerate the training process. To address this problem, current research has leveraged a centralized function (CF) across multiple agents to learn each agent's contribution to the team reward. However, CF-based methods introduce joint error from other agents into the estimation of the value network. Inspired by federated learning, we therefore propose three simple, novel mechanisms to accelerate MARL training: Average Periodically Parameter Sharing (A-PPS), Reward-Scalability Periodically Parameter Sharing (RS-PPS), and Partial Personalized Periodically Parameter Sharing (PP-PPS). Agents periodically share their Q-value networks during training. Agents with the same identity use their collected reward as a scaling factor and update part of the neural network during each period, so that different parameters are shared. We apply our approaches to the classical MARL method QMIX and evaluate them on various tasks in the StarCraft Multi-Agent Challenge (SMAC) environment. Numerical experiments show substantial performance gains, with an average improvement of 10%-30%, and the approaches win tasks that QMIX cannot. Our code can be downloaded from https://github.com/ColaZhang22/PPS-QMIX
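To make the idea of periodic sharing concrete, the following is a minimal sketch and not the authors' implementation. It assumes each agent's Q-network can be represented as a dictionary mapping layer names to flat weight lists; the names `share_parameters`, `maybe_share`, `weights`, `shared_keys`, and `period` are hypothetical. Uniform weights give a plain average (in the spirit of A-PPS), reward-derived weights illustrate the idea behind RS-PPS, and restricting the shared layers illustrates the idea behind PP-PPS. The exact weighting and layer selection used in the paper are not specified in the abstract.

```python
from typing import Dict, List, Optional

# Hypothetical layout: layer name -> flat list of weights for that layer.
Params = Dict[str, List[float]]


def share_parameters(
    agents: List[Params],
    weights: Optional[List[float]] = None,
    shared_keys: Optional[List[str]] = None,
) -> None:
    """Overwrite the selected layers of every agent with a weighted average.

    weights=None       -> plain average over agents (A-PPS-like, assumption).
    weights=rewards    -> reward-scaled average (RS-PPS-like, assumption).
    shared_keys=[...]  -> only these layers are shared; the rest stay
                          personal to each agent (PP-PPS-like, assumption).
    """
    n = len(agents)
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]

    keys = shared_keys if shared_keys is not None else list(agents[0].keys())
    for key in keys:
        size = len(agents[0][key])
        averaged = [
            sum(norm[i] * agents[i][key][j] for i in range(n))
            for j in range(size)
        ]
        # Each agent receives its own copy of the averaged layer.
        for agent in agents:
            agent[key] = list(averaged)


def maybe_share(step: int, period: int, agents: List[Params], **kwargs) -> bool:
    """Share only every `period` training steps; return True if sharing ran."""
    if step > 0 and step % period == 0:
        share_parameters(agents, **kwargs)
        return True
    return False
```

In a training loop, `maybe_share(step, period, agents)` would be called once per update step, so that agents train independently between sharing points and are synchronized only at multiples of `period`.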
