Subgoal-based Hierarchical Reinforcement Learning for Multi-Agent Collaboration

Cheng Xu; Changtian Zhang; Yuchen Shi; Ran Wang; Shihong Duan; Yadong Wan; Xiaotong Zhang

TL;DR

This work addresses stability, exploration, and credit-assignment challenges in multi-agent reinforcement learning by introducing GMAH, a subgoal-based hierarchical framework in which each agent uses a high-level policy to select subgoals from a task tree and a low-level policy guided by intrinsic rewards. An adaptive goal-generation strategy, powered by a lightweight auto-encoder and successor-feature-like signals, enables dynamic subgoal updates in response to environmental changes. A goal-mixing network extends the hierarchy to multi-agent settings by combining high-level subgoals into a joint value function $Q_{tot}$ that is constrained to be monotonic in each individual $Q_i$, following a QMIX-style mixer. Empirical results on Mini-Grid and Trash-Grid show faster convergence and richer exploration than baselines such as PPO, A2C, MAPPO, and QMIX, indicating that the approach is effective and flexible for complex cooperative tasks. The implementation is open source, and the paper discusses limitations and future work relevant to deploying hierarchical, cooperative RL in real-world multi-agent systems.
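The monotonic constraint mentioned above can be illustrated with a minimal sketch of a QMIX-style mixer. This is not the GMAH goal-mixing network itself, whose details are not given here; it only shows the standard trick of forcing the mixing weights to be non-negative so that $Q_{tot}$ never decreases when any single $Q_i$ increases. The function names `monotonic_mix` and `random_mixer_params`, the hidden size, and the random weights are all hypothetical; in QMIX the weights would be produced by hypernetworks conditioned on the global state.

```python
import math
import random


def elu(x):
    # ELU is monotonically increasing, so it preserves the ordering of its inputs.
    return x if x > 0 else math.exp(x) - 1.0


def monotonic_mix(agent_qs, w1, b1, w2, b2):
    """Combine per-agent values Q_i into a joint value Q_tot.

    Taking abs() of every mixing weight keeps dQ_tot/dQ_i >= 0 for all i,
    which is the monotonic constraint used by QMIX-style mixers.
    """
    hidden = []
    for row, bias in zip(w1, b1):
        pre = sum(abs(w) * q for w, q in zip(row, agent_qs)) + bias
        hidden.append(elu(pre))
    return sum(abs(w) * h for w, h in zip(w2, hidden)) + b2


def random_mixer_params(n_agents, hidden, rng):
    # Assumption: random stand-ins for the hypernetwork outputs.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_agents)] for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = rng.uniform(-1, 1)
    return w1, b1, w2, b2


rng = random.Random(0)
params = random_mixer_params(n_agents=3, hidden=4, rng=rng)

qs = [0.2, -0.5, 1.0]
q_tot = monotonic_mix(qs, *params)

# Raising one agent's value must not lower the joint value.
bumped = [qs[0] + 0.3, qs[1], qs[2]]
q_tot_bumped = monotonic_mix(bumped, *params)
print(q_tot_bumped >= q_tot)
```

Because the weights are non-negative and ELU is increasing, each agent can pick the action that maximizes its own $Q_i$ and the joint choice still maximizes $Q_{tot}$, which is what makes decentralized execution consistent with centralized training in this family of mixers.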

Abstract

Recent advancements in reinforcement learning have made significant impacts across various domains, yet they often struggle in complex multi-agent environments due to issues like algorithm instability, low sampling efficiency, and the challenges of exploration and dimensionality explosion. Hierarchical reinforcement learning (HRL) offers a structured approach to decompose complex tasks into simpler sub-tasks, which is promising for multi-agent settings. This paper advances the field by introducing a hierarchical architecture that autonomously generates effective subgoals without explicit constraints, enhancing both flexibility and stability in training. We propose a dynamic goal generation strategy that adapts based on environmental changes. This method significantly improves the adaptability and sample efficiency of the learning process. Furthermore, we address the critical issue of credit assignment in multi-agent systems by synergizing our hierarchical architecture with a modified QMIX network, thus improving overall strategy coordination and efficiency. Comparative experiments with mainstream reinforcement learning algorithms demonstrate the superior convergence speed and performance of our approach in both single-agent and multi-agent environments, confirming its effectiveness and flexibility in complex scenarios. Our code is open-sourced at: https://github.com/SICC-Group/GMAH

Paper Structure (23 sections, 9 equations, 17 figures, 1 algorithm)

Introduction
Related Work
Multi-agent Reinforcement Learning
Hierarchical Reinforcement Learning
Subgoal-based HRL
The General Architecture
Task-Tree Style Subgoal Generation
Adaptive Goal Generation Strategy
Flexible Update Intervals
Proactive Goal Updating
Fine-Tuning of Goal Mixing Network
Experiments and Discussion
Mini-Grid: Single Agent
Environmental Setup
Comparative Experiments
...and 8 more sections

Figures (17)

Figure 1: The overall GMAH structure.
Figure 2: A typical diagram of task tree.
Figure 3: (a) High-Policy Network Model. (b) Hierarchical architecture applied by single agent in GMAH algorithm. (c) Network Model of GMAH.
Figure 4: (a) Trajectory example of agent c-step interaction. (b) Adjacency Constraint on Goal Space of HRAC. Goal Relabel on Trajectory of agent c-step interaction: (c) relabel the abstract subgoal, and (d) relabel subgoal of GMAH.
Figure 5: Auto-Encoder with Successor Feature Correction.
...and 12 more figures
