Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks

Pu Feng, Junkang Liang, Size Wang, Xin Yu, Xin Ji, Yiting Chen, Kui Zhang, Rongye Shi, Wenjun Wu

TL;DR

The paper tackles the CTDE gap in multi-agent cooperation by deriving a global consensus from local observations through contrastive learning, enabling coordinated behavior without explicit inter-agent communication. It introduces HC-MARL, a framework comprising a Consensus Builder and a Hierarchical Consensus Mechanism that generates short-term and long-term consensus layers, dynamically weighted by an adaptive attention mechanism. The approach is integrated into MAPPO-like architectures and validated through three cooperative tasks in simulation and real-world E-puck experiments, showing improved efficiency and task performance over baselines. Ablation studies identify optimal numbers of consensus categories and layers, and real-world tests confirm practical viability, underscoring HC-MARL’s potential for scalable, communication-free coordination in robotic teams.

Abstract

In multi-agent reinforcement learning (MARL), the Centralized Training with Decentralized Execution (CTDE) framework is pivotal but suffers from a gap: training is guided by the global state, whereas execution relies solely on local observations and lacks global signals. Inspired by human societal consensus mechanisms, we introduce the Hierarchical Consensus-based Multi-Agent Reinforcement Learning (HC-MARL) framework to address this limitation. HC-MARL employs contrastive learning to foster a global consensus among agents, enabling cooperative behavior without direct communication. Agents form this global consensus from their local observations and use it as an additional piece of information to guide collaborative actions during execution. To cater to the dynamic requirements of various tasks, consensus is divided into multiple layers, encompassing both short-term and long-term considerations. Short-term observations prompt the creation of an immediate, low-layer consensus, while long-term observations contribute to the formation of a strategic, high-layer consensus. This process is further refined through an adaptive attention mechanism that dynamically adjusts the influence of each consensus layer, balancing immediate reactions against strategic planning according to the demands of the task at hand. Extensive experiments and real-world applications in multi-robot systems showcase our framework's superior performance, marking significant advancements over baselines.
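To make the layer weighting concrete, here is a minimal sketch of how short-term and long-term consensus vectors might be blended by attention. It is an illustration under stated assumptions, not the authors' implementation: it uses single-head scaled dot-product attention in place of the paper's attention mechanism, and the names `attention_weighted_consensus`, `layer_consensus`, and `query`, as well as the example vectors, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weighted_consensus(layer_consensus, query):
    """Blend per-layer consensus vectors with scaled dot-product attention.

    layer_consensus: list of equal-length vectors, one per consensus layer
                     (assumed order: short-term first, long-term last).
    query:           vector of the same length, for example an embedding of
                     the agent's current observation (an assumption).
    Returns the blended consensus vector and the per-layer weights.
    """
    dim = len(query)
    # One similarity score per layer, scaled by sqrt(dim).
    scores = [
        sum(q * c for q, c in zip(query, layer)) / math.sqrt(dim)
        for layer in layer_consensus
    ]
    weights = softmax(scores)
    # Weighted sum of the layer vectors, computed per coordinate.
    blended = [
        sum(w * layer[k] for w, layer in zip(weights, layer_consensus))
        for k in range(dim)
    ]
    return blended, weights


# Hypothetical one-hot consensus vectors for two layers.
short_term = [1.0, 0.0, 0.0, 0.0]
long_term = [0.0, 0.0, 1.0, 0.0]
query = [2.0, 0.0, 0.5, 0.0]

c_att, weights = attention_weighted_consensus([short_term, long_term], query)
```

In this toy input the query aligns more closely with the short-term vector, so the short-term layer receives the larger weight. In a learned system the query and the layer embeddings would be trained jointly with the policy, so the weighting would shift with the task.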

Paper Structure (20 sections, 10 equations, 10 figures, 1 table)

This paper contains 20 sections, 10 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: The relationship between the environmental state and local observations in the CTDE framework. Despite differing local observations, they all correspond to the same environmental state at each timestep, providing diverse perspectives of a unified global state. In traditional CTDE approaches, agents rely solely on these local observations for decision-making during execution.
  • Figure 2: Importance of Dynamic State Information: The diagram illustrates agents as green triangles and neighbors as blue triangles. The orientation of agents is indicated by the vertical position of the triangles, and their motion direction is shown by the arrows. The left side displays the environmental state, while the right side shows information usable in execution within CTDE. Static environmental information provides position and orientation, whereas dynamic information additionally offers speed data.
  • Figure 3: An overview of the Hierarchical Consensus Mechanism. $x^m_i$ and $x^m_j$ represent different local observations from the same environmental state for the $m$-th layer, which are used to derive a global consensus classification through the teacher-student network. Consensus from different layers is aggregated into an attention-weighted consensus through multi-head attention.
  • Figure 4: Overview of the HC-MARL framework. Sequentially, from left to right: Agents initially acquire local observations from the environment. These observations are subsequently processed by the hierarchical consensus builder, yielding the current consensus class. This derived consensus, denoted as $c^{att}_i$, enriches the agents' observational or state data. It is then incorporated into both policy and critic networks, thereby steering agent actions in alignment with the collectively determined global consensus.
  • Figure 5: Learning curves of HC-MARL, MAPPO, and HAPPO on the Predator-Prey task. Each experiment was run 5 times with different random seeds.
  • ...and 5 more figures
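
The teacher-student consensus classification described in the Figure 3 caption can be sketched roughly as follows. This is a hedged illustration only: the captions do not state the loss, so the sketch assumes a self-distillation style cross-entropy between a sharpened teacher output for one agent's observation and the student output for another agent's observation of the same state. The linear projections, the temperatures, and all function names are assumptions. Teacher updates such as a moving average of student weights are omitted.

```python
import math


def softmax(logits, temperature):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def project(weights, observation):
    """Linear map from an observation to K consensus-class logits."""
    return [sum(w * x for w, x in zip(row, observation)) for row in weights]


def consensus_loss(teacher_w, student_w, obs_i, obs_j,
                   t_teacher=0.04, t_student=0.1):
    """Cross-entropy between teacher(obs_i) and student(obs_j).

    obs_i and obs_j are assumed to be two agents' local observations of the
    same environmental state, so a low loss means both map to one class.
    """
    target = softmax(project(teacher_w, obs_i), t_teacher)
    pred = softmax(project(student_w, obs_j), t_student)
    return -sum(t * math.log(p + 1e-12) for t, p in zip(target, pred))


def consensus_class(student_w, observation):
    """Index of the most likely consensus class for one observation."""
    logits = project(student_w, observation)
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical setup: K = 3 consensus classes, 2-dimensional observations.
shared_w = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
obs_i = [0.9, 0.1]   # agent i's view of the state
obs_j = [0.8, 0.2]   # agent j's view of the same state
obs_k = [0.1, 0.9]   # a view of a different state

loss_same = consensus_loss(shared_w, shared_w, obs_i, obs_j)
loss_diff = consensus_loss(shared_w, shared_w, obs_i, obs_k)
```

With these toy weights, `obs_i` and `obs_j` fall into the same consensus class and yield a small loss, while pairing `obs_i` with `obs_k` yields a much larger one. Minimizing such a loss over pairs drawn from the same timestep is one way agents could agree on a class without communicating.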