Subgoal-based Hierarchical Reinforcement Learning for Multi-Agent Collaboration
Cheng Xu, Changtian Zhang, Yuchen Shi, Ran Wang, Shihong Duan, Yadong Wan, Xiaotong Zhang
TL;DR
This work addresses stability, exploration, and credit-assignment challenges in multi-agent reinforcement learning by introducing GMAH, a subgoal-based hierarchical framework in which each agent uses a high-level policy to select subgoals from a task tree and a low-level policy, guided by intrinsic rewards, to pursue them. An adaptive goal-generation strategy, powered by a lightweight auto-encoder and successor-feature-like signals, updates subgoals dynamically in response to environmental changes. A novel goal-mixing network extends the hierarchy to multi-agent settings: following a QMIX-style mixer, it combines the agents' individual high-level values $Q_i$ into a joint value function $Q_{tot}$ that is constrained to be monotonic in each $Q_i$. Empirical results on Mini-Grid and Trash-Grid show faster convergence and richer exploration than baselines such as PPO, A2C, MAPPO, and QMIX, demonstrating the approach’s effectiveness and flexibility on complex cooperative tasks. The open-source implementation and the discussion of limitations and future work provide a practical pathway toward deploying hierarchical, cooperative RL in real-world multi-agent systems.
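To make the monotonic mixing constraint concrete, here is a minimal NumPy sketch of a generic QMIX-style mixer. It is not GMAH's goal-mixing network: the class name `MonotonicMixer`, the single-linear-layer hypernetworks, the embedding size, and the linear state bias are illustrative assumptions. Hypernetworks map the global state to the mixing weights, and taking their absolute value keeps every weight non-negative. Because ELU is increasing, $Q_{tot}$ cannot decrease when any $Q_i$ increases, so each agent greedily maximizing its own $Q_i$ stays consistent with maximizing $Q_{tot}$.

```python
import numpy as np


def elu(x):
    # ELU is monotonically increasing; clipping avoids overflow warnings in expm1
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


class MonotonicMixer:
    """Illustrative QMIX-style mixer: Q_tot is non-decreasing in every Q_i."""

    def __init__(self, n_agents, state_dim, embed_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.n, self.e = n_agents, embed_dim
        # Assumed hypernetworks: single linear maps from global state to mixer parameters
        self.hw1 = rng.normal(size=(state_dim, n_agents * embed_dim))
        self.hb1 = rng.normal(size=(state_dim, embed_dim))
        self.hw2 = rng.normal(size=(state_dim, embed_dim))
        self.hv = rng.normal(size=state_dim)

    def __call__(self, agent_qs, state):
        # abs() keeps the mixing weights non-negative, so dQ_tot/dQ_i >= 0
        w1 = np.abs(state @ self.hw1).reshape(self.n, self.e)
        b1 = state @ self.hb1
        w2 = np.abs(state @ self.hw2)
        v = state @ self.hv  # state-dependent bias; its sign is unconstrained
        hidden = elu(agent_qs @ w1 + b1)
        return float(hidden @ w2 + v)


mixer = MonotonicMixer(n_agents=3, state_dim=5)
state = np.ones(5)
q = np.array([0.2, -0.1, 0.5])
q_up = q.copy()
q_up[1] += 1.0  # raise agent 1's individual value
print(mixer(q, state), mixer(q_up, state))
```

Only the mixing weights are forced non-negative; the biases may take any sign, which keeps the mixer expressive without breaking monotonicity.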
Abstract
Recent advances in reinforcement learning have had significant impact across many domains, yet these methods often struggle in complex multi-agent environments because of algorithm instability, low sample efficiency, difficult exploration, and dimensionality explosion. Hierarchical reinforcement learning (HRL) offers a structured way to decompose complex tasks into simpler sub-tasks, which makes it promising for multi-agent settings. This paper advances the field by introducing a hierarchical architecture that autonomously generates effective subgoals without explicit constraints, enhancing both flexibility and stability in training. We propose a dynamic goal-generation strategy that adapts to environmental changes, which significantly improves the adaptability and sample efficiency of the learning process. Furthermore, we address the critical issue of credit assignment in multi-agent systems by combining our hierarchical architecture with a modified QMIX network, improving overall coordination and efficiency. Comparative experiments with mainstream reinforcement learning algorithms demonstrate the superior convergence speed and performance of our approach in both single-agent and multi-agent environments, confirming its effectiveness and flexibility in complex scenarios. Our code is open-sourced at: \url{https://github.com/SICC-Group/GMAH}.
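The abstract does not state how an environmental change is detected. One common heuristic, assumed here purely for illustration and not taken from the GMAH code, treats a high auto-encoder reconstruction error on the current state as a novelty signal and regenerates subgoals when that error crosses a threshold. In the NumPy sketch below, `LinearAutoEncoder`, `should_regenerate_subgoals`, the linear architecture, and the fixed threshold of 0.05 are all hypothetical.

```python
import numpy as np


class LinearAutoEncoder:
    """Hypothetical lightweight auto-encoder: linear encoder/decoder, full-batch GD on MSE."""

    def __init__(self, in_dim, code_dim, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.enc = rng.normal(scale=0.5, size=(in_dim, code_dim))
        self.dec = rng.normal(scale=0.5, size=(code_dim, in_dim))
        self.lr = lr

    def error(self, x):
        # Mean squared reconstruction error of one state (1-D) or a batch (2-D)
        recon = (x @ self.enc) @ self.dec
        return float(np.mean((recon - x) ** 2))

    def fit_step(self, batch):
        # One gradient-descent step on the batch's mean squared reconstruction error
        code = batch @ self.enc
        diff = code @ self.dec - batch
        scale = 2.0 / batch.size
        grad_dec = scale * (code.T @ diff)
        grad_enc = scale * (batch.T @ (diff @ self.dec.T))
        self.dec -= self.lr * grad_dec
        self.enc -= self.lr * grad_enc


def should_regenerate_subgoals(ae, state, threshold):
    # Hypothetical trigger: a poorly reconstructed state is treated as a sign
    # that the environment has changed, so the subgoals should be refreshed.
    return ae.error(state) > threshold


# Usage: "familiar" states vary only in their first two features.
rng = np.random.default_rng(1)
familiar = np.zeros((64, 6))
familiar[:, :2] = rng.normal(size=(64, 2))
ae = LinearAutoEncoder(in_dim=6, code_dim=2)
before = ae.error(familiar)
for _ in range(2000):
    ae.fit_step(familiar)
novel = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])  # features never seen in training
print(should_regenerate_subgoals(ae, familiar[0], 0.05))
print(should_regenerate_subgoals(ae, novel, 0.05))
```

A threshold tied to recent error statistics would likely be more robust than a fixed constant, and the successor-feature-like signals that the TL;DR also mentions are not modeled here.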
