GHQ: Grouped Hybrid Q Learning for Heterogeneous Cooperative Multi-agent Reinforcement Learning
Xiaoyang Yu, Youfang Lin, Xiangsen Wang, Sheng Han, Kai Lv
TL;DR
This work formalizes Local Transition Heterogeneity (LTH) in cooperative heterogeneous MARL and introduces Grouped Hybrid Q-Learning (GHQ) to address it. GHQ combines Grouped IGM Consistency (GIGM), Ideal Object Grouping (IOG), and Inter-Group Mutual Information (IGMI) to coordinate multiple agent groups using group-specific networks and a hybrid value factorization structure. Empirical results on original and newly designed asymmetric SMAC maps show that GHQ outperforms state-of-the-art baselines, with reduced variance and clearer inter-group coordination, highlighting the importance of heterogeneity-aware grouping. The approach offers a principled framework for tackling LTH in MARL and suggests avenues for scaling to larger, more complex heterogeneous environments.
Abstract
Previous deep multi-agent reinforcement learning (MARL) algorithms have achieved impressive results, typically in homogeneous scenarios. However, heterogeneous scenarios are also very common and usually harder to solve. In this paper, we focus on cooperative heterogeneous MARL problems in the StarCraft Multi-Agent Challenge (SMAC) environment. We first define and describe the heterogeneous problems in SMAC. To reveal and study the problem comprehensively, we design new maps that extend the original SMAC map set. We find that baseline algorithms fail to perform well on these heterogeneous maps. To address this issue, we propose Grouped Individual-Global-Max Consistency (GIGM) and a novel MARL algorithm, Grouped Hybrid Q Learning (GHQ). GHQ separates agents into several groups and keeps separate parameters for each group, along with a novel hybrid structure for value factorization. To enhance coordination between groups, we maximize the Inter-group Mutual Information (IGMI) between the groups' trajectories. Experiments on the original and new heterogeneous maps show that GHQ outperforms other state-of-the-art algorithms.
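For readers unfamiliar with the consistency condition that GIGM builds on, the standard Individual-Global-Max (IGM) condition from the value-factorization literature is written out below, followed by one possible grouped analogue. The grouped form is an illustrative assumption, not necessarily the paper's exact definition: the partition into groups G_1, ..., G_m, the group-level values Q_{G_j}, and the notation for joint histories and actions are all introduced here for illustration only.

```latex
% Standard IGM: the greedy joint action of Q_tot coincides with
% the tuple of each agent's greedy action on its own utility Q_i.
\[
\arg\max_{\boldsymbol{u}} Q_{tot}(\boldsymbol{\tau}, \boldsymbol{u})
=
\Bigl(
  \arg\max_{u^{1}} Q_{1}(\tau^{1}, u^{1}),\;
  \dots,\;
  \arg\max_{u^{n}} Q_{n}(\tau^{n}, u^{n})
\Bigr)
\]

% Assumed grouped analogue (hypothetical notation): agents are
% partitioned into groups G_1, ..., G_m, and consistency is required
% between Q_tot and the group-level values Q_{G_j}.
\[
\arg\max_{\boldsymbol{u}} Q_{tot}(\boldsymbol{\tau}, \boldsymbol{u})
=
\Bigl(
  \arg\max_{\boldsymbol{u}_{G_1}} Q_{G_1}(\boldsymbol{\tau}_{G_1}, \boldsymbol{u}_{G_1}),\;
  \dots,\;
  \arg\max_{\boldsymbol{u}_{G_m}} Q_{G_m}(\boldsymbol{\tau}_{G_m}, \boldsymbol{u}_{G_m})
\Bigr)
\]
```

Under this reading, each group can select its greedy joint action from its own value function during decentralized execution while remaining consistent with the global value used in training.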
