
Self-Clustering Hierarchical Multi-Agent Reinforcement Learning with Extensible Cooperation Graph

Qingxu Fu, Tenghai Qiu, Jianqiang Yi, Zhiqiang Pu, Xiaolin Ai

TL;DR

HCGL addresses the challenges of large-scale, interpretable cooperative MARL by introducing the Extensible Cooperation Graph (ECG), a three-layer graph that directly guides agent behavior through topology. Four graph operators dynamically rewire ECG, while a MAPPO-based trainer optimizes the operators and integrates hierarchical knowledge via primitive and cooperative actions. The framework yields strong performance on sparse-reward tasks, demonstrates notable transferability to larger scales through curriculum learning, and provides interpretable insights by visualizing ECG topology. These contributions enable scalable, knowledge-infused, and transferable multi-agent coordination with improved interpretability.
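The idea that agent behavior follows directly from graph topology can be made concrete with a small data structure. The sketch below assumes that every agent node has exactly one edge to a cluster node and every cluster node has exactly one edge to a target node. The class `ECG` and the methods `rewire_agent` and `rewire_cluster` are hypothetical names chosen for illustration. They stand in for the kind of edge edits the graph operators perform; they are not the paper's four operators, whose exact definitions are not given here. In HCGL the edits are chosen by learned operator policies trained with a MAPPO-based trainer, whereas here they are called by hand.

```python
from typing import List


class ECG:
    """Illustrative three-layer graph: agent nodes -> cluster nodes -> target nodes.

    Assumption for this sketch: each agent has exactly one edge to a cluster,
    and each cluster has exactly one edge to a target.
    """

    def __init__(self, n_agents: int, n_clusters: int, n_targets: int):
        self.n_clusters = n_clusters
        self.n_targets = n_targets
        # Round-robin initial assignment of agents to clusters.
        self.agent_to_cluster = [a % n_clusters for a in range(n_agents)]
        # Every cluster initially points at target node 0.
        self.cluster_to_target = [0] * n_clusters

    def rewire_agent(self, agent: int, cluster: int) -> None:
        """Hypothetical edge edit: reconnect an agent node to another cluster."""
        if not 0 <= cluster < self.n_clusters:
            raise ValueError(f"no cluster {cluster}")
        self.agent_to_cluster[agent] = cluster

    def rewire_cluster(self, cluster: int, target: int) -> None:
        """Hypothetical edge edit: reconnect a cluster node to another target."""
        if not 0 <= target < self.n_targets:
            raise ValueError(f"no target {target}")
        self.cluster_to_target[cluster] = target

    def members(self, cluster: int) -> List[int]:
        """Agents currently connected to the given cluster."""
        return [a for a, c in enumerate(self.agent_to_cluster) if c == cluster]

    def agent_target(self, agent: int) -> int:
        """Target reached by following the agent -> cluster -> target edges."""
        return self.cluster_to_target[self.agent_to_cluster[agent]]


if __name__ == "__main__":
    g = ECG(n_agents=4, n_clusters=2, n_targets=3)
    g.rewire_agent(0, 1)    # move agent 0 into cluster 1
    g.rewire_cluster(1, 2)  # point cluster 1 at target 2
    print(g.members(1), g.agent_target(0))  # [0, 1, 3] 2
```

Because an agent's target is read off the edges rather than produced by a per-agent policy network, a single cluster-level edit changes the behavior of every member at once. This is the self-clustering effect the TL;DR describes.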

Abstract

Multi-Agent Reinforcement Learning (MARL) has been successful in solving many cooperative challenges. However, classic non-hierarchical MARL algorithms still cannot address many complex multi-agent problems that require hierarchical cooperative behaviors. The cooperative knowledge and policies learned by non-hierarchical algorithms are implicit and not interpretable, which restricts the integration of existing knowledge. This paper proposes a novel hierarchical MARL model called Hierarchical Cooperation Graph Learning (HCGL) for solving general multi-agent problems. HCGL has three components: a dynamic Extensible Cooperation Graph (ECG) for achieving self-clustering cooperation; a group of graph operators for adjusting the topology of ECG; and an MARL optimizer for training these graph operators. HCGL's key distinction from other MARL models is that agent behaviors are guided by the topology of ECG rather than by policy neural networks. ECG is a three-layer graph consisting of an agent node layer, a cluster node layer, and a target node layer. To manipulate the ECG topology in response to changing environmental conditions, four graph operators are trained to adjust the edge connections of ECG dynamically. The hierarchical structure of ECG provides a unique way to merge primitive actions (actions executed by individual agents) and cooperative actions (actions executed by clusters) into a unified action space, allowing fundamental cooperative knowledge to be integrated through an extensible interface. In our experiments, HCGL shows outstanding performance on multi-agent benchmarks with sparse rewards. We also verify that HCGL can easily be transferred to large-scale scenarios with high zero-shot transfer success rates.
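To illustrate how one action space can hold both primitive and cooperative actions, the sketch below treats each target node as either a primitive action name, which every agent in the cluster executes as-is, or a cooperative action, which expands into one primitive action per cluster member. The cooperative rule `spread_out`, the direction names, the target list, and the function `resolve_actions` are all hypothetical. They show where hand-written cooperative knowledge could plug in; they are not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Union

# A cooperative action maps a cluster's member list to one primitive action per member.
CoopRule = Callable[[List[int]], Dict[int, str]]
# A target node is either a primitive action name or a cooperative rule.
Target = Union[str, CoopRule]


def spread_out(members: List[int]) -> Dict[int, str]:
    """Hypothetical cooperative action: members fan out in different directions."""
    dirs = ["north", "east", "south", "west"]
    return {a: dirs[i % len(dirs)] for i, a in enumerate(members)}


def resolve_actions(agent_to_cluster: Sequence[int],
                    cluster_to_target: Sequence[int],
                    targets: Sequence[Target]) -> Dict[int, str]:
    """Give every agent a primitive action by following agent -> cluster -> target edges."""
    actions: Dict[int, str] = {}
    for cluster, t in enumerate(cluster_to_target):
        members = [a for a, c in enumerate(agent_to_cluster) if c == cluster]
        if not members:
            continue  # an empty cluster issues no actions
        target = targets[t]
        if callable(target):
            # Cooperative action: the cluster-level rule assigns per-agent actions.
            actions.update(target(members))
        else:
            # Primitive action: every member executes the same action.
            actions.update({a: target for a in members})
    return actions


if __name__ == "__main__":
    targets = ["stay", "attack", spread_out]
    print(resolve_actions([0, 0, 1, 1, 1], [1, 2], targets))
    # {0: 'attack', 1: 'attack', 2: 'north', 3: 'east', 4: 'south'}
```

In this arrangement, adding a new piece of cooperative knowledge only means appending another rule to the target list. The graph operators then learn when to route clusters to it, which is one reading of the "extensible interface" mentioned in the abstract.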

Paper Structure (22 sections, 10 figures, 1 table)

Figures (10)

  • Figure 1: An illustration of the Extensible Cooperation Graph (ECG). From the bottom up, ECG is a three-layer hierarchical graph structure that includes agent nodes, cluster nodes, and target nodes, connected by edges. Within each episode, the graph is dynamically controlled by four virtual agents referred to as operators.
  • Figure 2: Training the policy network of graph operators.
  • Figure 3: ECG has excellent scaling capability and has significant advantages when used together with curriculum learning techniques.
  • Figure 4: Cooperative Swarm Interception (Initial State).
  • Figure 5: Cooperative Swarm Interception (Expelling invaders).
  • ...and 5 more figures